From 43c26a1a45938624fb9301e8bf7dfabbed293619 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Thu, 18 May 2017 15:44:41 +0200 Subject: net: more accurate checksumming in validate_xmit_skb() skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to determine if skb needs software computation of Internet Checksum or crc32c (or nothing, if this computation can be done by the hardware). Use it in place of skb_checksum_help() in validate_xmit_skb() to avoid corruption of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL. While at it, remove references to skb_csum_off_chk* functions, since they are not present anymore in Linux _ see commit cf53b1da73bd ("Revert "net: Add driver helper functions to determine checksum offloadability""). Signed-off-by: Davide Caratti Signed-off-by: David S. Miller --- Documentation/networking/checksum-offloads.txt | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt index 56e36861245f..d52d191bbb0c 100644 --- a/Documentation/networking/checksum-offloads.txt +++ b/Documentation/networking/checksum-offloads.txt @@ -35,6 +35,9 @@ This interface only allows a single checksum to be offloaded. Where encapsulation is used, the packet may have multiple checksum fields in different header layers, and the rest will have to be handled by another mechanism such as LCO or RCO. +CRC32c can also be offloaded using this interface, by means of filling + skb->csum_start and skb->csum_offset as described above, and setting + skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. No offloading of the IP header checksum is performed; it is always done in software. This is OK because when we build the IP header, we obviously have it in cache, so summing it isn't expensive. It's also rather short. @@ -49,9 +52,9 @@ A driver declares its offload capabilities in netdev->hw_features; see and csum_offset given in the SKB; if it tries to deduce these itself in hardware (as some NICs do) the driver should check that the values in the SKB match those which the hardware will deduce, and if not, fall back to - checksumming in software instead (with skb_checksum_help or one of the - skb_csum_off_chk* functions as mentioned in include/linux/skbuff.h). This - is a pain, but that's what you get when hardware tries to be clever. + checksumming in software instead (with skb_csum_hwoffload_help() or one of + the skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in + include/linux/skbuff.h). The stack should, for the most part, assume that checksum offload is supported by the underlying device. The only place that should check is @@ -60,7 +63,7 @@ The stack should, for the most part, assume that checksum offload is may include other offloads besides TX Checksum Offload) and, if they are not supported or enabled on the device (determined by netdev->features), performs the corresponding offload in software. In the case of TX - Checksum Offload, that means calling skb_checksum_help(skb). + Checksum Offload, that means calling skb_csum_hwoffload_help(skb, features). LCO: Local Checksum Offload -- cgit v1.2.3 From aad9c8c470f2a8321a99eb053630ce0e199558d6 Mon Sep 17 00:00:00 2001 From: Miroslav Lichvar Date: Fri, 19 May 2017 17:52:38 +0200 Subject: net: add new control message for incoming HW-timestamped packets Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message for incoming packets with hardware timestamps. It contains the index of the real interface which received the packet and the length of the packet at layer 2. The index is useful with bonding, bridges and other interfaces, where IP_PKTINFO doesn't allow applications to determine which PHC made the timestamp. With the L2 length (and link speed) it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol. While this information could be provided by two new socket options independently from timestamping, it doesn't look like they would be very useful. With this option any performance impact is limited to hardware timestamping. Use dev_get_by_napi_id() to get the device and its index. On kernels with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero index will be returned in the control message. CC: Richard Cochran Acked-by: Willem de Bruijn Signed-off-by: Miroslav Lichvar Signed-off-by: David S. Miller --- Documentation/networking/timestamping.txt | 10 ++++++++++ include/uapi/asm-generic/socket.h | 2 ++ include/uapi/linux/net_tstamp.h | 11 ++++++++++- net/socket.c | 27 ++++++++++++++++++++++++++- 4 files changed, 48 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 96f50694a748..ce11e3a08c0d 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -193,6 +193,16 @@ SOF_TIMESTAMPING_OPT_STATS: the transmit timestamps, such as how long a certain block of data was limited by peer's receiver window. +SOF_TIMESTAMPING_OPT_PKTINFO: + + Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming + packets with hardware timestamps. The message contains struct + scm_ts_pktinfo, which supplies the index of the real interface which + received the packet and its length at layer 2. A valid (non-zero) + interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is + enabled and the driver is using NAPI. The struct contains also two + other fields, but they are reserved and undefined. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 2b488565599d..a5f6e819fafd 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -100,4 +100,6 @@ #define SO_COOKIE 57 +#define SCM_TIMESTAMPING_PKTINFO 58 + #endif /* __ASM_GENERIC_SOCKET_H */ diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index 0749fb13e517..dee74d39da94 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -9,6 +9,7 @@ #ifndef _NET_TIMESTAMPING_H #define _NET_TIMESTAMPING_H +#include #include /* for SO_TIMESTAMPING */ /* SO_TIMESTAMPING gets an integer bit field comprised of these values */ @@ -26,8 +27,9 @@ enum { SOF_TIMESTAMPING_OPT_CMSG = (1<<10), SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), SOF_TIMESTAMPING_OPT_STATS = (1<<12), + SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS, + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | SOF_TIMESTAMPING_LAST }; @@ -130,4 +132,11 @@ enum hwtstamp_rx_filters { HWTSTAMP_FILTER_NTP_ALL, }; +/* SCM_TIMESTAMPING_PKTINFO control message */ +struct scm_ts_pktinfo { + __u32 if_index; + __u32 pkt_length; + __u32 reserved[2]; +}; + #endif /* _NET_TIMESTAMPING_H */ diff --git a/net/socket.c b/net/socket.c index c2564eb25c6b..67db7d8a3b81 100644 --- a/net/socket.c +++ b/net/socket.c @@ -662,6 +662,27 @@ static bool skb_is_err_queue(const struct sk_buff *skb) return skb->pkt_type == PACKET_OUTGOING; } +static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb) +{ + struct scm_ts_pktinfo ts_pktinfo; + struct net_device *orig_dev; + + if (!skb_mac_header_was_set(skb)) + return; + + memset(&ts_pktinfo, 0, sizeof(ts_pktinfo)); + + rcu_read_lock(); + orig_dev = dev_get_by_napi_id(skb_napi_id(skb)); + if (orig_dev) + ts_pktinfo.if_index = orig_dev->ifindex; + rcu_read_unlock(); + + ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, + sizeof(ts_pktinfo), &ts_pktinfo); +} + /* * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) */ @@ -699,8 +720,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, empty = 0; if (shhwtstamps && (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && - ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) + ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) { empty = 0; + if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) && + !skb_is_err_queue(skb)) + put_ts_pktinfo(msg, skb); + } if (!empty) { put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING, sizeof(tss), &tss); -- cgit v1.2.3 From 67953d47bb24e63d209705f745a0de411a4c6578 Mon Sep 17 00:00:00 2001 From: Miroslav Lichvar Date: Fri, 19 May 2017 17:52:39 +0200 Subject: net: fix documentation of struct scm_timestamping The scm_timestamping struct may return multiple non-zero fields, e.g. when both software and hardware RX timestamping is enabled, or when the SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false software timestamp is generated in the recvmsg() call in order to always return a SCM_TIMESTAMP(NS) message. CC: Richard Cochran CC: Willem de Bruijn Signed-off-by: Miroslav Lichvar Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/timestamping.txt | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index ce11e3a08c0d..50eb0e554778 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -322,7 +322,7 @@ struct scm_timestamping { }; The structure can return up to three timestamps. This is a legacy -feature. Only one field is non-zero at any time. Most timestamps +feature. At least one field is non-zero at any time. Most timestamps are passed in ts[0]. Hardware timestamps are passed in ts[2]. ts[1] used to hold hardware timestamps converted to system time. @@ -331,6 +331,12 @@ a HW PTP clock source, to allow time conversion in userspace and optionally synchronize system time with a userspace PTP stack such as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt. +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false +software timestamp will be generated in the recvmsg() call and passed +in ts[0] when a real software timestamp is missing. This happens also +on hardware transmit timestamps. + 2.1.1 Transmit timestamps with MSG_ERRQUEUE For transmit timestamps the outgoing packet is looped back to the -- cgit v1.2.3 From b50a5c70ffa4fd6b6da324ab54c84adf48fb17d9 Mon Sep 17 00:00:00 2001 From: Miroslav Lichvar Date: Fri, 19 May 2017 17:52:40 +0200 Subject: net: allow simultaneous SW and HW transmit timestamping Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to be looped to the socket's error queue with a software timestamp even when a hardware transmit timestamp is expected to be provided by the driver. Applications using this option will receive two separate messages from the error queue, one with a software timestamp and the other with a hardware timestamp. As the hardware timestamp is saved to the shared skb info, which may happen before the first message with software timestamp is received by the application, the hardware timestamp is copied to the SCM_TIMESTAMPING control message only when the skb has no software timestamp or it is an incoming packet. While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as there are no other users. CC: Richard Cochran CC: Willem de Bruijn Signed-off-by: Miroslav Lichvar Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/timestamping.txt | 8 ++++++++ include/linux/skbuff.h | 10 ++-------- include/uapi/linux/net_tstamp.h | 3 ++- net/core/skbuff.c | 4 ++++ net/socket.c | 20 ++++++++++++++++++-- 5 files changed, 34 insertions(+), 11 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 50eb0e554778..196ba17cc344 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -203,6 +203,14 @@ SOF_TIMESTAMPING_OPT_PKTINFO: enabled and the driver is using NAPI. The struct contains also two other fields, but they are reserved and undefined. +SOF_TIMESTAMPING_OPT_TX_SWHW: + + Request both hardware and software timestamps for outgoing packets + when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE + are enabled at the same time. If both timestamps are generated, + two separate messages will be looped to the socket's error queue, + each containing just one timestamp. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8acce7143f6a..45a59c1e0cc7 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3259,13 +3259,6 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, void skb_tstamp_tx(struct sk_buff *orig_skb, struct skb_shared_hwtstamps *hwtstamps); -static inline void sw_tx_timestamp(struct sk_buff *skb) -{ - if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP && - !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) - skb_tstamp_tx(skb, NULL); -} - /** * skb_tx_timestamp() - Driver hook for transmit timestamping * @@ -3281,7 +3274,8 @@ static inline void sw_tx_timestamp(struct sk_buff *skb) static inline void skb_tx_timestamp(struct sk_buff *skb) { skb_clone_tx_timestamp(skb); - sw_tx_timestamp(skb); + if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP) + skb_tstamp_tx(skb, NULL); } /** diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index dee74d39da94..3d421d912193 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -28,8 +28,9 @@ enum { SOF_TIMESTAMPING_OPT_TSONLY = (1<<11), SOF_TIMESTAMPING_OPT_STATS = (1<<12), SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13), + SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14), - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO, + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TX_SWHW, SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) | SOF_TIMESTAMPING_LAST }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d5c98117cbce..780b7c1563d0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3901,6 +3901,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, if (!sk) return; + if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) && + skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS) + return; + tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY; if (!skb_may_tx_timestamp(sk, tsonly)) return; diff --git a/net/socket.c b/net/socket.c index 67db7d8a3b81..cb355a7ef135 100644 --- a/net/socket.c +++ b/net/socket.c @@ -662,6 +662,19 @@ static bool skb_is_err_queue(const struct sk_buff *skb) return skb->pkt_type == PACKET_OUTGOING; } +/* On transmit, software and hardware timestamps are returned independently. + * As the two skb clones share the hardware timestamp, which may be updated + * before the software timestamp is received, a hardware TX timestamp may be + * returned only if there is no software TX timestamp. Ignore false software + * timestamps, which may be made in the __sock_recv_timestamp() call when the + * option SO_TIMESTAMP(NS) is enabled on the socket, even when the skb has a + * hardware timestamp. + */ +static bool skb_is_swtx_tstamp(const struct sk_buff *skb, int false_tstamp) +{ + return skb->tstamp && !false_tstamp && skb_is_err_queue(skb); +} + static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb) { struct scm_ts_pktinfo ts_pktinfo; @@ -691,14 +704,16 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, { int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); struct scm_timestamping tss; - int empty = 1; + int empty = 1, false_tstamp = 0; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); /* Race occurred between timestamp enabling and packet receiving. Fill in the current time for now. */ - if (need_software_tstamp && skb->tstamp == 0) + if (need_software_tstamp && skb->tstamp == 0) { __net_timestamp(skb); + false_tstamp = 1; + } if (need_software_tstamp) { if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { @@ -720,6 +735,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, empty = 0; if (shhwtstamps && (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && + !skb_is_swtx_tstamp(skb, false_tstamp) && ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) { empty = 0; if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) && -- cgit v1.2.3 From 85cfa71764cab95228e0abebdd77e0382c3c34be Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Thu, 11 May 2017 11:23:21 -0700 Subject: i40evf: update i40evf.txt with new content The addition of the AVF and virtchnl code to the i40evf driver means we should update the i40evf.txt file with the most up to date information. It seems this file hasn't been updated in a while, so the changes cover a little more than just AVF, but it's all only in the i40evf.txt. Signed-off-by: Jesse Brandeburg Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- Documentation/networking/i40evf.txt | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/i40evf.txt b/Documentation/networking/i40evf.txt index 21e41271af79..e9b3035b95d0 100644 --- a/Documentation/networking/i40evf.txt +++ b/Documentation/networking/i40evf.txt @@ -1,8 +1,8 @@ Linux* Base Driver for Intel(R) Network Connection ================================================== -Intel XL710 X710 Virtual Function Linux driver. -Copyright(c) 2013 Intel Corporation. +Intel Ethernet Adaptive Virtual Function Linux driver. +Copyright(c) 2013-2017 Intel Corporation. Contents ======== @@ -11,19 +11,26 @@ Contents - Known Issues/Troubleshooting - Support -This file describes the i40evf Linux* Base Driver for the Intel(R) XL710 -X710 Virtual Function. +This file describes the i40evf Linux* Base Driver. -The i40evf driver supports XL710 and X710 virtual function devices that -can only be activated on kernels with CONFIG_PCI_IOV enabled. +The i40evf driver supports the below mentioned virtual function +devices and can only be activated on kernels running the i40e or +newer Physical Function (PF) driver compiled with CONFIG_PCI_IOV. +The i40evf driver requires CONFIG_PCI_MSI to be enabled. The guest OS loading the i40evf driver must support MSI-X interrupts. +Supported Hardware +================== +Intel XL710 X710 Virtual Function +Intel Ethernet Adaptive Virtual Function +Intel X722 Virtual Function + Identifying Your Adapter ======================== -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: +For more information on how to identify your adapter, go to the +Adapter & Driver ID Guide at: http://support.intel.com/support/go/network/adapter/idguide.htm -- cgit v1.2.3 From 28036f44851e2515aa91b547b45cefddcac52ff6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 5 Jun 2017 14:30:49 +0100 Subject: rxrpc: Permit multiple service binding Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictions: (1) All bind() calls involved must have a non-zero service ID. (2) The service IDs must all be different. (3) The rest of the address (notably the transport part) must be the same in all (a single UDP socket is shared). (4) This must be done before listen() or sendmsg() is called. This allows someone to connect to the service socket with different service IDs and lays the foundation for service upgrading. The service ID used by an incoming call can be extracted from the msg_name returned by recvmsg(). Signed-off-by: David Howells --- Documentation/networking/rxrpc.txt | 4 +++ net/rxrpc/af_rxrpc.c | 62 ++++++++++++++++++++++++-------------- net/rxrpc/ar-internal.h | 2 ++ net/rxrpc/call_accept.c | 3 +- net/rxrpc/local_object.c | 1 + net/rxrpc/security.c | 3 +- 6 files changed, 51 insertions(+), 24 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 1b63bbc6b94f..b7115ec55e04 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -600,6 +600,10 @@ A server would be set up to accept operations in the following manner: }; bind(server, &srx, sizeof(srx)); + More than one service ID may be bound to a socket, provided the transport + parameters are the same. The limit is currently two. To do this, bind() + should be called twice. + (3) The server is then set to listen out for incoming calls: listen(server, 100); diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 1e4ac889ec00..3b982bca7d22 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -144,31 +144,48 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len) lock_sock(&rx->sk); - if (rx->sk.sk_state != RXRPC_UNBOUND) { - ret = -EINVAL; - goto error_unlock; - } - - memcpy(&rx->srx, srx, sizeof(rx->srx)); + switch (rx->sk.sk_state) { + case RXRPC_UNBOUND: + rx->srx = *srx; + local = rxrpc_lookup_local(sock_net(&rx->sk), &rx->srx); + if (IS_ERR(local)) { + ret = PTR_ERR(local); + goto error_unlock; + } - local = rxrpc_lookup_local(sock_net(&rx->sk), &rx->srx); - if (IS_ERR(local)) { - ret = PTR_ERR(local); - goto error_unlock; - } + if (service_id) { + write_lock(&local->services_lock); + if (rcu_access_pointer(local->service)) + goto service_in_use; + rx->local = local; + rcu_assign_pointer(local->service, rx); + write_unlock(&local->services_lock); + + rx->sk.sk_state = RXRPC_SERVER_BOUND; + } else { + rx->local = local; + rx->sk.sk_state = RXRPC_CLIENT_BOUND; + } + break; - if (service_id) { - write_lock(&local->services_lock); - if (rcu_access_pointer(local->service)) - goto service_in_use; - rx->local = local; - rcu_assign_pointer(local->service, rx); - write_unlock(&local->services_lock); + case RXRPC_SERVER_BOUND: + ret = -EINVAL; + if (service_id == 0) + goto error_unlock; + ret = -EADDRINUSE; + if (service_id == rx->srx.srx_service) + goto error_unlock; + ret = -EINVAL; + srx->srx_service = rx->srx.srx_service; + if (memcmp(srx, &rx->srx, sizeof(*srx)) != 0) + goto error_unlock; + rx->second_service = service_id; + rx->sk.sk_state = RXRPC_SERVER_BOUND2; + break; - rx->sk.sk_state = RXRPC_SERVER_BOUND; - } else { - rx->local = local; - rx->sk.sk_state = RXRPC_CLIENT_BOUND; + default: + ret = -EINVAL; + goto error_unlock; } release_sock(&rx->sk); @@ -205,6 +222,7 @@ static int rxrpc_listen(struct socket *sock, int backlog) ret = -EADDRNOTAVAIL; break; case RXRPC_SERVER_BOUND: + case RXRPC_SERVER_BOUND2: ASSERT(rx->local != NULL); max = READ_ONCE(rxrpc_max_backlog); ret = -EINVAL; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index de98a49adb35..781fbc253b5a 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -61,6 +61,7 @@ enum { RXRPC_CLIENT_UNBOUND, /* Unbound socket used as client */ RXRPC_CLIENT_BOUND, /* client local address bound */ RXRPC_SERVER_BOUND, /* server local address bound */ + RXRPC_SERVER_BOUND2, /* second server local address bound */ RXRPC_SERVER_LISTENING, /* server listening for connections */ RXRPC_SERVER_LISTEN_DISABLED, /* server listening disabled */ RXRPC_CLOSE, /* socket is being closed */ @@ -142,6 +143,7 @@ struct rxrpc_sock { u32 min_sec_level; /* minimum security level */ #define RXRPC_SECURITY_MAX RXRPC_SECURITY_ENCRYPT bool exclusive; /* Exclusive connection for a client socket */ + u16 second_service; /* Additional service bound to the endpoint */ sa_family_t family; /* Protocol family created with */ struct sockaddr_rxrpc srx; /* local address */ struct sockaddr_rxrpc connect_srx; /* Default client address from connect() */ diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index a8515b0d4717..544df53ccf79 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -341,7 +341,8 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local, /* Get the socket providing the service */ rx = rcu_dereference(local->service); - if (rx && service_id == rx->srx.srx_service) + if (rx && (service_id == rx->srx.srx_service || + service_id == rx->second_service)) goto found_service; trace_rxrpc_abort("INV", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 17d79fd73ade..38b99db30e54 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -94,6 +94,7 @@ static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet, rwlock_init(&local->services_lock); local->debug_id = atomic_inc_return(&rxrpc_debug_id); memcpy(&local->srx, srx, sizeof(*srx)); + local->srx.srx_service = 0; } _leave(" = %p", local); diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c index b9f5dbbe0b8b..e9f428351293 100644 --- a/net/rxrpc/security.c +++ b/net/rxrpc/security.c @@ -133,7 +133,8 @@ int rxrpc_init_server_conn_security(struct rxrpc_connection *conn) read_lock(&local->services_lock); rx = rcu_dereference_protected(local->service, lockdep_is_held(&local->services_lock)); - if (rx && rx->srx.srx_service == conn->service_id) + if (rx && (rx->srx.srx_service == conn->service_id || + rx->second_service == conn->service_id)) goto found_service; /* the service appears to have died */ -- cgit v1.2.3 From 4722974d90e06d0164ca1b73a6b34cec6bdb64ad Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 5 Jun 2017 14:30:49 +0100 Subject: rxrpc: Implement service upgrade Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with: (1) Various of the standard AFS RPC calls have IPv4 addresses in their requests and/or replies - but there's no room for including IPv6 addresses. (2) Definition of IPv6-specific RPC operations in the standard operation sets has not yet been achieved. (3) One could envision the creation a new service on the same port that as the original service. The new service could implement improved operations - and the client could try this first, falling back to the original service if it's not there. Unfortunately, certain servers ignore packets addressed to a service they don't implement and don't respond in any way - not even with an ABORT. This means that the client must then wait for the call timeout to occur. What service upgrade does is to see if the connection is marked as being 'upgradeable' and if so, change the service ID in the server and thus the request and reply formats. Note that the upgrade isn't mandatory - a server that supports only the original call set will ignore the upgrade request. In the protocol, the procedure is then as follows: (1) To request an upgrade, the first DATA packet in a new connection must have the userStatus set to 1 (this is normally 0). The userStatus value is normally ignored by the server. (2) If the server doesn't support upgrading, the reply packets will contain the same service ID as for the first request packet. (3) If the server does support upgrading, all future reply packets on that connection will contain the new service ID and the new service ID will be applied to *all* further calls on that connection as well. (4) The RPC op used to probe the upgrade must take the same request data as the shadow call in the upgrade set (but may return a different reply). GetCapability RPC ops were added to all standard sets for just this purpose. Ops where the request formats differ cannot be used for probing. (5) The client must wait for completion of the probe before sending any further RPC ops to the same destination. It should then use the service ID that recvmsg() reported back in all future calls. (6) The shadow service must have call definitions for all the operation IDs defined by the original service. To support service upgrading, a server should: (1) Call bind() twice on its AF_RXRPC socket before calling listen(). Each bind() should supply a different service ID, but the transport addresses must be the same. This allows the server to receive requests with either service ID. (2) Enable automatic upgrading by calling setsockopt(), specifying RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of unsigned shorts as the argument: unsigned short optval[2]; This specifies a pair of service IDs. They must be different and must match the service IDs bound to the socket. Member 0 is the service ID to upgrade from and member 1 is the service ID to upgrade to. Signed-off-by: David Howells --- Documentation/networking/rxrpc.txt | 34 ++++++++++++++++++++++++++-------- include/linux/rxrpc.h | 1 + include/rxrpc/packet.h | 2 ++ net/rxrpc/af_rxrpc.c | 23 +++++++++++++++++++++++ net/rxrpc/ar-internal.h | 10 ++++++++-- net/rxrpc/call_accept.c | 2 +- net/rxrpc/conn_service.c | 11 ++++++++++- 7 files changed, 71 insertions(+), 12 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index b7115ec55e04..2a1662760450 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -433,6 +433,13 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: Encrypted checksum plus entire packet padded and encrypted, including actual packet length. + (*) RXRPC_UPGRADEABLE_SERVICE + + This is used to indicate that a service socket with two bindings may + upgrade one bound service to the other if requested by the client. optval + must point to an array of two unsigned short ints. The first is the + service ID to upgrade from and the second the service ID to upgrade to. + ======== SECURITY @@ -588,7 +595,7 @@ A server would be set up to accept operations in the following manner: The keyring can be manipulated after it has been given to the socket. This permits the server to add more keys, replace keys, etc. whilst it is live. - (2) A local address must then be bound: + (3) A local address must then be bound: struct sockaddr_rxrpc srx = { .srx_family = AF_RXRPC, @@ -604,11 +611,22 @@ A server would be set up to accept operations in the following manner: parameters are the same. The limit is currently two. To do this, bind() should be called twice. - (3) The server is then set to listen out for incoming calls: + (4) If service upgrading is required, first two service IDs must have been + bound and then the following option must be set: + + unsigned short service_ids[2] = { from_ID, to_ID }; + setsockopt(server, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE, + service_ids, sizeof(service_ids)); + + This will automatically upgrade connections on service from_ID to service + to_ID if they request it. This will be reflected in msg_name obtained + through recvmsg() when the request data is delivered to userspace. + + (5) The server is then set to listen out for incoming calls: listen(server, 100); - (4) The kernel notifies the server of pending incoming connections by sending + (6) The kernel notifies the server of pending incoming connections by sending it a message for each. This is received with recvmsg() on the server socket. It has no data, and has a single dataless control message attached: @@ -620,13 +638,13 @@ A server would be set up to accept operations in the following manner: the time it is accepted - in which case the first call still on the queue will be accepted. - (5) The server then accepts the new call by issuing a sendmsg() with two + (7) The server then accepts the new call by issuing a sendmsg() with two pieces of control data and no actual data: RXRPC_ACCEPT - indicate connection acceptance RXRPC_USER_CALL_ID - specify user ID for this call - (6) The first request data packet will then be posted to the server socket for + (8) The first request data packet will then be posted to the server socket for recvmsg() to pick up. At that point, the RxRPC address for the call can be read from the address fields in the msghdr struct. @@ -638,7 +656,7 @@ A server would be set up to accept operations in the following manner: RXRPC_USER_CALL_ID - specifies the user ID for this call - (8) The reply data should then be posted to the server socket using a series + (9) The reply data should then be posted to the server socket using a series of sendmsg() calls, each with the following control messages attached: RXRPC_USER_CALL_ID - specifies the user ID for this call @@ -646,7 +664,7 @@ A server would be set up to accept operations in the following manner: MSG_MORE should be set in msghdr::msg_flags on all but the last message for a particular call. - (9) The final ACK from the client will be posted for retrieval by recvmsg() +(10) The final ACK from the client will be posted for retrieval by recvmsg() when it is received. It will take the form of a dataless message with two control messages attached: @@ -656,7 +674,7 @@ A server would be set up to accept operations in the following manner: MSG_EOR will be flagged to indicate that this is the final message for this call. -(10) Up to the point the final packet of reply data is sent, the call can be +(11) Up to the point the final packet of reply data is sent, the call can be aborted by calling sendmsg() with a dataless message with the following control messages attached: diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h index c68307bc306f..634116561a6a 100644 --- a/include/linux/rxrpc.h +++ b/include/linux/rxrpc.h @@ -37,6 +37,7 @@ struct sockaddr_rxrpc { #define RXRPC_SECURITY_KEYRING 2 /* [srvr] set ring of server security keys */ #define RXRPC_EXCLUSIVE_CONNECTION 3 /* Deprecated; use RXRPC_EXCLUSIVE_CALL instead */ #define RXRPC_MIN_SECURITY_LEVEL 4 /* minimum security level */ +#define RXRPC_UPGRADEABLE_SERVICE 5 /* Upgrade service[0] -> service[1] */ /* * RxRPC control messages diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h index 703a64b4681a..a2dcfb850b9f 100644 --- a/include/rxrpc/packet.h +++ b/include/rxrpc/packet.h @@ -58,6 +58,8 @@ struct rxrpc_wire_header { #define RXRPC_SLOW_START_OK 0x20 /* [ACK] slow start supported */ uint8_t userStatus; /* app-layer defined status */ +#define RXRPC_USERSTATUS_SERVICE_UPGRADE 0x01 /* AuriStor service upgrade request */ + uint8_t securityIndex; /* security protocol ID */ union { __be16 _rsvd; /* reserved */ diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 3b982bca7d22..0c4dc4a7832c 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -490,6 +490,7 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname, { struct rxrpc_sock *rx = rxrpc_sk(sock->sk); unsigned int min_sec_level; + u16 service_upgrade[2]; int ret; _enter(",%d,%d,,%d", level, optname, optlen); @@ -546,6 +547,28 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname, rx->min_sec_level = min_sec_level; goto success; + case RXRPC_UPGRADEABLE_SERVICE: + ret = -EINVAL; + if (optlen != sizeof(service_upgrade) || + rx->service_upgrade.from != 0) + goto error; + ret = -EISCONN; + if (rx->sk.sk_state != RXRPC_SERVER_BOUND2) + goto error; + ret = -EFAULT; + if (copy_from_user(service_upgrade, optval, + sizeof(service_upgrade)) != 0) + goto error; + ret = -EINVAL; + if ((service_upgrade[0] != rx->srx.srx_service || + service_upgrade[1] != rx->second_service) && + (service_upgrade[0] != rx->second_service || + service_upgrade[1] != rx->srx.srx_service)) + goto error; + rx->service_upgrade.from = service_upgrade[0]; + rx->service_upgrade.to = service_upgrade[1]; + goto success; + default: break; } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 781fbc253b5a..c1ebd886a53f 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -144,8 +144,13 @@ struct rxrpc_sock { #define RXRPC_SECURITY_MAX RXRPC_SECURITY_ENCRYPT bool exclusive; /* Exclusive connection for a client socket */ u16 second_service; /* Additional service bound to the endpoint */ + struct { + /* Service upgrade information */ + u16 from; /* Service ID to upgrade (if not 0) */ + u16 to; /* service ID to upgrade to */ + } service_upgrade; sa_family_t family; /* Protocol family created with */ - struct sockaddr_rxrpc srx; /* local address */ + struct sockaddr_rxrpc srx; /* Primary Service/local addresses */ struct sockaddr_rxrpc connect_srx; /* Default client address from connect() */ }; @@ -861,7 +866,8 @@ static inline void rxrpc_put_connection(struct rxrpc_connection *conn) struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *, struct sk_buff *); struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *, gfp_t); -void rxrpc_new_incoming_connection(struct rxrpc_connection *, struct sk_buff *); +void rxrpc_new_incoming_connection(struct rxrpc_sock *, + struct rxrpc_connection *, struct sk_buff *); void rxrpc_unpublish_service_conn(struct rxrpc_connection *); /* diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 544df53ccf79..0d4d84e8c074 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -296,7 +296,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx, conn->params.local = local; conn->params.peer = peer; rxrpc_see_connection(conn); - rxrpc_new_incoming_connection(conn, skb); + rxrpc_new_incoming_connection(rx, conn, skb); } else { rxrpc_get_connection(conn); } diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c index c7f8682a55b2..e60fcd2a4a02 100644 --- a/net/rxrpc/conn_service.c +++ b/net/rxrpc/conn_service.c @@ -150,7 +150,8 @@ struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *rxn * Set up an incoming connection. This is called in BH context with the RCU * read lock held. */ -void rxrpc_new_incoming_connection(struct rxrpc_connection *conn, +void rxrpc_new_incoming_connection(struct rxrpc_sock *rx, + struct rxrpc_connection *conn, struct sk_buff *skb) { struct rxrpc_skb_priv *sp = rxrpc_skb(skb); @@ -168,6 +169,14 @@ void rxrpc_new_incoming_connection(struct rxrpc_connection *conn, else conn->state = RXRPC_CONN_SERVICE; + /* See if we should upgrade the service. This can only happen on the + * first packet on a new connection. Once done, it applies to all + * subsequent calls on that connection. + */ + if (sp->hdr.userStatus == RXRPC_USERSTATUS_SERVICE_UPGRADE && + conn->service_id == rx->service_upgrade.from) + conn->service_id = rx->service_upgrade.to; + /* Make the connection a target for incoming packets. */ rxrpc_publish_service_conn(conn->params.peer, conn); -- cgit v1.2.3 From 4e255721d1575a766ada06dc7eb03acdcd34eaaf Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 5 Jun 2017 14:30:49 +0100 Subject: rxrpc: Add service upgrade support for client connections Make it possible for a client to use AuriStor's service upgrade facility. The client does this by adding an RXRPC_UPGRADE_SERVICE control message to the first sendmsg() of a call. This takes no parameters. When recvmsg() starts returning data from the call, the service ID field in the returned msg_name will reflect the result of the upgrade attempt. If the upgrade was ignored, srx_service will match what was set in the sendmsg(); if the upgrade happened the srx_service will be altered to indicate the service the server upgraded to. Note that: (1) The choice of upgrade service is up to the server (2) Further client calls to the same server that would share a connection are blocked if an upgrade probe is in progress. (3) This should only be used to probe the service. Clients should then use the returned service ID in all subsequent communications with that server (and not set the upgrade). Note that the kernel will not retain this information should the connection expire from its cache. (4) If a server that supports upgrading is replaced by one that doesn't, whilst a connection is live, and if the replacement is running, say, OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement server will not respond to packets sent to the upgraded connection. At this point, calls will time out and the server must be reprobed. Signed-off-by: David Howells --- Documentation/networking/rxrpc.txt | 30 ++++++++++++++++++++++++++ include/linux/rxrpc.h | 1 + include/trace/events/rxrpc.h | 1 + net/rxrpc/ar-internal.h | 3 +++ net/rxrpc/conn_client.c | 43 +++++++++++++++++++++++++++++++------- net/rxrpc/input.c | 17 +++++++++++++++ net/rxrpc/output.c | 4 ++++ net/rxrpc/sendmsg.c | 19 +++++++++++++---- 8 files changed, 106 insertions(+), 12 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 2a1662760450..18078e630a63 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -325,6 +325,8 @@ calls, to invoke certain actions and to report certain conditions. These are: RXRPC_LOCAL_ERROR -rt error num Local error encountered RXRPC_NEW_CALL -r- n/a New call received RXRPC_ACCEPT s-- n/a Accept new call + RXRPC_EXCLUSIVE_CALL s-- n/a Make an exclusive client call + RXRPC_UPGRADE_SERVICE s-- n/a Client call can be upgraded (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message) @@ -387,6 +389,23 @@ calls, to invoke certain actions and to report certain conditions. These are: return error ENODATA. If the user ID is already in use by another call, then error EBADSLT will be returned. + (*) RXRPC_EXCLUSIVE_CALL + + This is used to indicate that a client call should be made on a one-off + connection. The connection is discarded once the call has terminated. + + (*) RXRPC_UPGRADE_SERVICE + + This is used to make a client call to probe if the specified service ID + may be upgraded by the server. The caller must check msg_name returned to + recvmsg() for the service ID actually in use. The operation probed must + be one that takes the same arguments in both services. + + Once this has been used to establish the upgrade capability (or lack + thereof) of the server, the service ID returned should be used for all + future communication to that server and RXRPC_UPGRADE_SERVICE should no + longer be set. + ============== SOCKET OPTIONS @@ -566,6 +585,17 @@ A client would issue an operation by: buffer instead, and MSG_EOR will be flagged to indicate the end of that call. +A client may ask for a service ID it knows and ask that this be upgraded to a +better service if one is available by supplying RXRPC_UPGRADE_SERVICE on the +first sendmsg() of a call. The client should then check srx_service in the +msg_name filled in by recvmsg() when collecting the result. srx_service will +hold the same value as given to sendmsg() if the upgrade request was ignored by +the service - otherwise it will be altered to indicate the service ID the +server upgraded to. Note that the upgraded service ID is chosen by the server. +The caller has to wait until it sees the service ID in the reply before sending +any more calls (further calls to the same destination will be blocked until the +probe is concluded). + ==================== EXAMPLE SERVER USAGE diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h index 634116561a6a..707910c6c6c5 100644 --- a/include/linux/rxrpc.h +++ b/include/linux/rxrpc.h @@ -54,6 +54,7 @@ struct sockaddr_rxrpc { #define RXRPC_NEW_CALL 8 /* -r: [Service] new incoming call notification */ #define RXRPC_ACCEPT 9 /* s-: [Service] accept request */ #define RXRPC_EXCLUSIVE_CALL 10 /* s-: Call should be on exclusive connection */ +#define RXRPC_UPGRADE_SERVICE 11 /* s-: Request service upgrade for client call */ /* * RxRPC security levels diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 29a3d53a4015..ebe96796027a 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -233,6 +233,7 @@ enum rxrpc_congest_change { EM(RXRPC_CONN_CLIENT_INACTIVE, "Inac") \ EM(RXRPC_CONN_CLIENT_WAITING, "Wait") \ EM(RXRPC_CONN_CLIENT_ACTIVE, "Actv") \ + EM(RXRPC_CONN_CLIENT_UPGRADE, "Upgd") \ EM(RXRPC_CONN_CLIENT_CULLED, "Cull") \ E_(RXRPC_CONN_CLIENT_IDLE, "Idle") \ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index c1ebd886a53f..e9b536cb0acf 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -320,6 +320,7 @@ struct rxrpc_conn_parameters { struct rxrpc_peer *peer; /* Remote endpoint */ struct key *key; /* Security details */ bool exclusive; /* T if conn is exclusive */ + bool upgrade; /* T if service ID can be upgraded */ u16 service_id; /* Service ID for this connection */ u32 security_level; /* Security level selected */ }; @@ -334,6 +335,7 @@ enum rxrpc_conn_flag { RXRPC_CONN_EXPOSED, /* Conn has extra ref for exposure */ RXRPC_CONN_DONT_REUSE, /* Don't reuse this connection */ RXRPC_CONN_COUNTED, /* Counted by rxrpc_nr_client_conns */ + RXRPC_CONN_PROBING_FOR_UPGRADE, /* Probing for service upgrade */ }; /* @@ -350,6 +352,7 @@ enum rxrpc_conn_cache_state { RXRPC_CONN_CLIENT_INACTIVE, /* Conn is not yet listed */ RXRPC_CONN_CLIENT_WAITING, /* Conn is on wait list, waiting for capacity */ RXRPC_CONN_CLIENT_ACTIVE, /* Conn is on active list, doing calls */ + RXRPC_CONN_CLIENT_UPGRADE, /* Conn is on active list, probing for upgrade */ RXRPC_CONN_CLIENT_CULLED, /* Conn is culled and delisted, doing calls */ RXRPC_CONN_CLIENT_IDLE, /* Conn is on idle list, doing mostly nothing */ RXRPC_CONN__NR_CACHE_STATES diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 3f358bf424ad..dd8bb919c15a 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -36,12 +36,15 @@ * * rxrpc_nr_active_client_conns is held incremented also. * - * (4) CULLED - The connection got summarily culled to try and free up + * (4) UPGRADE - As for ACTIVE, but only one call may be in progress and is + * being used to probe for service upgrade. + * + * (5) CULLED - The connection got summarily culled to try and free up * capacity. Calls currently in progress on the connection are allowed to * continue, but new calls will have to wait. There can be no waiters in * this state - the conn would have to go to the WAITING state instead. * - * (5) IDLE - The connection has no calls in progress upon it and must have + * (6) IDLE - The connection has no calls in progress upon it and must have * been exposed to the world (ie. the EXPOSED flag must be set). When it * expires, the EXPOSED flag is cleared and the connection transitions to * the INACTIVE state. @@ -184,6 +187,8 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp) atomic_set(&conn->usage, 1); if (cp->exclusive) __set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags); + if (cp->upgrade) + __set_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags); conn->params = *cp; conn->out_clientflag = RXRPC_CLIENT_INITIATED; @@ -300,7 +305,8 @@ static int rxrpc_get_client_conn(struct rxrpc_call *call, #define cmp(X) ((long)conn->params.X - (long)cp->X) diff = (cmp(peer) ?: cmp(key) ?: - cmp(security_level)); + cmp(security_level) ?: + cmp(upgrade)); #undef cmp if (diff < 0) { p = p->rb_left; @@ -365,7 +371,8 @@ static int rxrpc_get_client_conn(struct rxrpc_call *call, #define cmp(X) ((long)conn->params.X - (long)candidate->params.X) diff = (cmp(peer) ?: cmp(key) ?: - cmp(security_level)); + cmp(security_level) ?: + cmp(upgrade)); #undef cmp if (diff < 0) { pp = &(*pp)->rb_left; @@ -436,8 +443,13 @@ error: static void rxrpc_activate_conn(struct rxrpc_net *rxnet, struct rxrpc_connection *conn) { - trace_rxrpc_client(conn, -1, rxrpc_client_to_active); - conn->cache_state = RXRPC_CONN_CLIENT_ACTIVE; + if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags)) { + trace_rxrpc_client(conn, -1, rxrpc_client_to_upgrade); + conn->cache_state = RXRPC_CONN_CLIENT_UPGRADE; + } else { + trace_rxrpc_client(conn, -1, rxrpc_client_to_active); + conn->cache_state = RXRPC_CONN_CLIENT_ACTIVE; + } rxnet->nr_active_client_conns++; list_move_tail(&conn->cache_link, &rxnet->active_client_conns); } @@ -461,7 +473,8 @@ static void rxrpc_animate_client_conn(struct rxrpc_net *rxnet, _enter("%d,%d", conn->debug_id, conn->cache_state); - if (conn->cache_state == RXRPC_CONN_CLIENT_ACTIVE) + if (conn->cache_state == RXRPC_CONN_CLIENT_ACTIVE || + conn->cache_state == RXRPC_CONN_CLIENT_UPGRADE) goto out; spin_lock(&rxnet->client_conn_cache_lock); @@ -474,6 +487,7 @@ static void rxrpc_animate_client_conn(struct rxrpc_net *rxnet, switch (conn->cache_state) { case RXRPC_CONN_CLIENT_ACTIVE: + case RXRPC_CONN_CLIENT_UPGRADE: case RXRPC_CONN_CLIENT_WAITING: break; @@ -577,6 +591,9 @@ static void rxrpc_activate_channels_locked(struct rxrpc_connection *conn) case RXRPC_CONN_CLIENT_ACTIVE: mask = RXRPC_ACTIVE_CHANS_MASK; break; + case RXRPC_CONN_CLIENT_UPGRADE: + mask = 0x01; + break; default: return; } @@ -787,6 +804,15 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) spin_lock(&rxnet->client_conn_cache_lock); switch (conn->cache_state) { + case RXRPC_CONN_CLIENT_UPGRADE: + /* Deal with termination of a service upgrade probe. */ + if (test_bit(RXRPC_CONN_EXPOSED, &conn->flags)) { + clear_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags); + trace_rxrpc_client(conn, channel, rxrpc_client_to_active); + conn->cache_state = RXRPC_CONN_CLIENT_ACTIVE; + rxrpc_activate_channels_locked(conn); + } + /* fall through */ case RXRPC_CONN_CLIENT_ACTIVE: if (list_empty(&conn->waiting_calls)) { rxrpc_deactivate_one_channel(conn, channel); @@ -941,7 +967,8 @@ static void rxrpc_cull_active_client_conns(struct rxrpc_net *rxnet) ASSERT(!list_empty(&rxnet->active_client_conns)); conn = list_entry(rxnet->active_client_conns.next, struct rxrpc_connection, cache_link); - ASSERTCMP(conn->cache_state, ==, RXRPC_CONN_CLIENT_ACTIVE); + ASSERTIFCMP(conn->cache_state != RXRPC_CONN_CLIENT_ACTIVE, + conn->cache_state, ==, RXRPC_CONN_CLIENT_UPGRADE); if (list_empty(&conn->waiting_calls)) { trace_rxrpc_client(conn, -1, rxrpc_client_to_culled); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 45dba732a3b4..e56e23ed2229 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1142,6 +1142,13 @@ void rxrpc_data_ready(struct sock *udp_sk) if (sp->hdr.securityIndex != conn->security_ix) goto wrong_security; + if (sp->hdr.serviceId != conn->service_id) { + if (!test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) || + conn->service_id != conn->params.service_id) + goto reupgrade; + conn->service_id = sp->hdr.serviceId; + } + if (sp->hdr.callNumber == 0) { /* Connection-level packet */ _debug("CONN %p {%d}", conn, conn->debug_id); @@ -1194,6 +1201,9 @@ void rxrpc_data_ready(struct sock *udp_sk) rxrpc_input_implicit_end_call(conn, call); call = NULL; } + + if (call && sp->hdr.serviceId != call->service_id) + call->service_id = sp->hdr.serviceId; } else { skew = 0; call = NULL; @@ -1237,11 +1247,18 @@ wrong_security: skb->priority = RXKADINCONSISTENCY; goto post_abort; +reupgrade: + rcu_read_unlock(); + trace_rxrpc_abort("UPG", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, + RX_PROTOCOL_ERROR, EBADMSG); + goto protocol_error; + bad_message_unlock: rcu_read_unlock(); bad_message: trace_rxrpc_abort("BAD", sp->hdr.cid, sp->hdr.callNumber, sp->hdr.seq, RX_PROTOCOL_ERROR, EBADMSG); +protocol_error: skb->priority = RX_PROTOCOL_ERROR; post_abort: skb->mark = RXRPC_SKB_MARK_LOCAL_ABORT; diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 5dab1ff3a6c2..5bd2d0fa4a03 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -292,6 +292,10 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, whdr._rsvd = htons(sp->hdr._rsvd); whdr.serviceId = htons(call->service_id); + if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) && + sp->hdr.seq == 1) + whdr.userStatus = RXRPC_USERSTATUS_SERVICE_UPGRADE; + iov[0].iov_base = &whdr; iov[0].iov_len = sizeof(whdr); iov[1].iov_base = skb->head; diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 96ffa5d5733b..5a4801e7f560 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -366,7 +366,8 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg, unsigned long *user_call_ID, enum rxrpc_command *command, u32 *abort_code, - bool *_exclusive) + bool *_exclusive, + bool *_upgrade) { struct cmsghdr *cmsg; bool got_user_ID = false; @@ -429,6 +430,13 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg, if (len != 0) return -EINVAL; break; + + case RXRPC_UPGRADE_SERVICE: + *_upgrade = true; + if (len != 0) + return -EINVAL; + break; + default: return -EINVAL; } @@ -447,7 +455,8 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg, */ static struct rxrpc_call * rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, - unsigned long user_call_ID, bool exclusive) + unsigned long user_call_ID, bool exclusive, + bool upgrade) __releases(&rx->sk.sk_lock.slock) { struct rxrpc_conn_parameters cp; @@ -472,6 +481,7 @@ rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, cp.key = rx->key; cp.security_level = rx->min_sec_level; cp.exclusive = rx->exclusive | exclusive; + cp.upgrade = upgrade; cp.service_id = srx->srx_service; call = rxrpc_new_client_call(rx, &cp, srx, user_call_ID, GFP_KERNEL); /* The socket is now unlocked */ @@ -493,13 +503,14 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) struct rxrpc_call *call; unsigned long user_call_ID = 0; bool exclusive = false; + bool upgrade = true; u32 abort_code = 0; int ret; _enter(""); ret = rxrpc_sendmsg_cmsg(msg, &user_call_ID, &cmd, &abort_code, - &exclusive); + &exclusive, &upgrade); if (ret < 0) goto error_release_sock; @@ -521,7 +532,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) if (cmd != RXRPC_CMD_SEND_DATA) goto error_release_sock; call = rxrpc_new_client_call_for_sendmsg(rx, msg, user_call_ID, - exclusive); + exclusive, upgrade); /* The socket is now unlocked... */ if (IS_ERR(call)) return PTR_ERR(call); -- cgit v1.2.3 From f8fe99754673719ab791713a676bf27dae616fbc Mon Sep 17 00:00:00 2001 From: "yuval.shaia@oracle.com" Date: Mon, 5 Jun 2017 10:18:40 +0300 Subject: net: phy: Delete unused function phy_ethtool_gset It's unused, so remove it. Signed-off-by: Yuval Shaia Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/networking/phy.txt | 1 - drivers/net/phy/phy.c | 24 ------------------------ include/linux/phy.h | 1 - 3 files changed, 26 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 16f90d817224..bdec0f700bc1 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -295,7 +295,6 @@ Doing it all yourself settings in the PHY. int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); - int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd); Ethtool convenience functions. diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index 82ab8fb82587..40f4c6a2ef6c 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -484,30 +484,6 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev, } EXPORT_SYMBOL(phy_ethtool_ksettings_set); -int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd) -{ - cmd->supported = phydev->supported; - - cmd->advertising = phydev->advertising; - cmd->lp_advertising = phydev->lp_advertising; - - ethtool_cmd_speed_set(cmd, phydev->speed); - cmd->duplex = phydev->duplex; - if (phydev->interface == PHY_INTERFACE_MODE_MOCA) - cmd->port = PORT_BNC; - else - cmd->port = PORT_MII; - cmd->phy_address = phydev->mdio.addr; - cmd->transceiver = phy_is_internal(phydev) ? - XCVR_INTERNAL : XCVR_EXTERNAL; - cmd->autoneg = phydev->autoneg; - cmd->eth_tp_mdix_ctrl = phydev->mdix_ctrl; - cmd->eth_tp_mdix = phydev->mdix; - - return 0; -} -EXPORT_SYMBOL(phy_ethtool_gset); - int phy_ethtool_ksettings_get(struct phy_device *phydev, struct ethtool_link_ksettings *cmd) { diff --git a/include/linux/phy.h b/include/linux/phy.h index 58f1b45a4c44..748e526c0698 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -854,7 +854,6 @@ void phy_start_machine(struct phy_device *phydev); void phy_stop_machine(struct phy_device *phydev); void phy_trigger_machine(struct phy_device *phydev, bool sync); int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); -int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd); int phy_ethtool_ksettings_get(struct phy_device *phydev, struct ethtool_link_ksettings *cmd); int phy_ethtool_ksettings_set(struct phy_device *phydev, -- cgit v1.2.3 From 515559ca21713218595f3a4dad44a4e7eea2fcfb Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 7 Jun 2017 16:27:15 +0100 Subject: rxrpc: Provide a getsockopt call to query what cmsgs types are supported Provide a getsockopt() call that can query what cmsg types are supported by AF_RXRPC. --- Documentation/networking/rxrpc.txt | 9 +++++++++ include/linux/rxrpc.h | 24 ++++++++++++++---------- net/rxrpc/af_rxrpc.c | 30 +++++++++++++++++++++++++++++- 3 files changed, 52 insertions(+), 11 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 18078e630a63..bce8e10a2a8e 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -406,6 +406,10 @@ calls, to invoke certain actions and to report certain conditions. These are: future communication to that server and RXRPC_UPGRADE_SERVICE should no longer be set. +The symbol RXRPC__SUPPORTED is defined as one more than the highest control +message type supported. At run time this can be queried by means of the +RXRPC_SUPPORTED_CMSG socket option (see below). + ============== SOCKET OPTIONS @@ -459,6 +463,11 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: must point to an array of two unsigned short ints. The first is the service ID to upgrade from and the second the service ID to upgrade to. + (*) RXRPC_SUPPORTED_CMSG + + This is a read-only option that writes an int into the buffer indicating + the highest control message type supported. + ======== SECURITY diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h index 707910c6c6c5..bdd3175b9a48 100644 --- a/include/linux/rxrpc.h +++ b/include/linux/rxrpc.h @@ -38,6 +38,7 @@ struct sockaddr_rxrpc { #define RXRPC_EXCLUSIVE_CONNECTION 3 /* Deprecated; use RXRPC_EXCLUSIVE_CALL instead */ #define RXRPC_MIN_SECURITY_LEVEL 4 /* minimum security level */ #define RXRPC_UPGRADEABLE_SERVICE 5 /* Upgrade service[0] -> service[1] */ +#define RXRPC_SUPPORTED_CMSG 6 /* Get highest supported control message type */ /* * RxRPC control messages @@ -45,16 +46,19 @@ struct sockaddr_rxrpc { * - terminal messages mean that a user call ID tag can be recycled * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg() */ -#define RXRPC_USER_CALL_ID 1 /* sr: user call ID specifier */ -#define RXRPC_ABORT 2 /* sr: abort request / notification [terminal] */ -#define RXRPC_ACK 3 /* -r: [Service] RPC op final ACK received [terminal] */ -#define RXRPC_NET_ERROR 5 /* -r: network error received [terminal] */ -#define RXRPC_BUSY 6 /* -r: server busy received [terminal] */ -#define RXRPC_LOCAL_ERROR 7 /* -r: local error generated [terminal] */ -#define RXRPC_NEW_CALL 8 /* -r: [Service] new incoming call notification */ -#define RXRPC_ACCEPT 9 /* s-: [Service] accept request */ -#define RXRPC_EXCLUSIVE_CALL 10 /* s-: Call should be on exclusive connection */ -#define RXRPC_UPGRADE_SERVICE 11 /* s-: Request service upgrade for client call */ +enum rxrpc_cmsg_type { + RXRPC_USER_CALL_ID = 1, /* sr: user call ID specifier */ + RXRPC_ABORT = 2, /* sr: abort request / notification [terminal] */ + RXRPC_ACK = 3, /* -r: [Service] RPC op final ACK received [terminal] */ + RXRPC_NET_ERROR = 5, /* -r: network error received [terminal] */ + RXRPC_BUSY = 6, /* -r: server busy received [terminal] */ + RXRPC_LOCAL_ERROR = 7, /* -r: local error generated [terminal] */ + RXRPC_NEW_CALL = 8, /* -r: [Service] new incoming call notification */ + RXRPC_ACCEPT = 9, /* s-: [Service] accept request */ + RXRPC_EXCLUSIVE_CALL = 10, /* s-: Call should be on exclusive connection */ + RXRPC_UPGRADE_SERVICE = 11, /* s-: Request service upgrade for client call */ + RXRPC__SUPPORTED +}; /* * RxRPC security levels diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 0c4dc4a7832c..44a52b82bb5d 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -581,6 +581,34 @@ error: return ret; } +/* + * Get socket options. + */ +static int rxrpc_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *_optlen) +{ + int optlen; + + if (level != SOL_RXRPC) + return -EOPNOTSUPP; + + if (get_user(optlen, _optlen)) + return -EFAULT; + + switch (optname) { + case RXRPC_SUPPORTED_CMSG: + if (optlen < sizeof(int)) + return -ETOOSMALL; + if (put_user(RXRPC__SUPPORTED - 1, (int __user *)optval) || + put_user(sizeof(int), _optlen)) + return -EFAULT; + return 0; + + default: + return -EOPNOTSUPP; + } +} + /* * permit an RxRPC socket to be polled */ @@ -784,7 +812,7 @@ static const struct proto_ops rxrpc_rpc_ops = { .listen = rxrpc_listen, .shutdown = rxrpc_shutdown, .setsockopt = rxrpc_setsockopt, - .getsockopt = sock_no_getsockopt, + .getsockopt = rxrpc_getsockopt, .sendmsg = rxrpc_sendmsg, .recvmsg = rxrpc_recvmsg, .mmap = sock_no_mmap, -- cgit v1.2.3 From e754eba685aac2a9b5538176fa2d254ad25f464d Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 7 Jun 2017 12:40:03 +0100 Subject: rxrpc: Provide a cmsg to specify the amount of Tx data for a call Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call. Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold. By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet). An error will be returned if too little or too much data is presented in the Tx phase. Signed-off-by: David Howells --- Documentation/networking/rxrpc.txt | 34 ++++++++++++++++++++++++ fs/afs/rxrpc.c | 18 ++++++++++++- include/linux/rxrpc.h | 1 + include/net/af_rxrpc.h | 2 ++ net/rxrpc/af_rxrpc.c | 5 +++- net/rxrpc/ar-internal.h | 3 ++- net/rxrpc/call_object.c | 3 +++ net/rxrpc/sendmsg.c | 54 ++++++++++++++++++++++++++++++++++++-- 8 files changed, 115 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index bce8e10a2a8e..8c70ba5dee4d 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -327,6 +327,7 @@ calls, to invoke certain actions and to report certain conditions. These are: RXRPC_ACCEPT s-- n/a Accept new call RXRPC_EXCLUSIVE_CALL s-- n/a Make an exclusive client call RXRPC_UPGRADE_SERVICE s-- n/a Client call can be upgraded + RXRPC_TX_LENGTH s-- data len Total length of Tx data (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message) @@ -406,6 +407,19 @@ calls, to invoke certain actions and to report certain conditions. These are: future communication to that server and RXRPC_UPGRADE_SERVICE should no longer be set. + (*) RXRPC_TX_LENGTH + + This is used to inform the kernel of the total amount of data that is + going to be transmitted by a call (whether in a client request or a + service response). If given, it allows the kernel to encrypt from the + userspace buffer directly to the packet buffers, rather than copying into + the buffer and then encrypting in place. This may only be given with the + first sendmsg() providing data for a call. EMSGSIZE will be generated if + the amount of data actually given is different. + + This takes a parameter of __s64 type that indicates how much will be + transmitted. This may not be less than zero. + The symbol RXRPC__SUPPORTED is defined as one more than the highest control message type supported. At run time this can be queried by means of the RXRPC_SUPPORTED_CMSG socket option (see below). @@ -577,6 +591,9 @@ A client would issue an operation by: MSG_MORE should be set in msghdr::msg_flags on all but the last part of the request. Multiple requests may be made simultaneously. + An RXRPC_TX_LENGTH control message can also be specified on the first + sendmsg() call. + If a call is intended to go to a destination other than the default specified through connect(), then msghdr::msg_name should be set on the first request message of that call. @@ -764,6 +781,7 @@ The kernel interface functions are as follows: struct sockaddr_rxrpc *srx, struct key *key, unsigned long user_call_ID, + s64 tx_total_len, gfp_t gfp); This allocates the infrastructure to make a new RxRPC call and assigns @@ -780,6 +798,11 @@ The kernel interface functions are as follows: control data buffer. It is entirely feasible to use this to point to a kernel data structure. + tx_total_len is the amount of data the caller is intending to transmit + with this call (or -1 if unknown at this point). Setting the data size + allows the kernel to encrypt directly to the packet buffers, thereby + saving a copy. The value may not be less than -1. + If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. @@ -931,6 +954,17 @@ The kernel interface functions are as follows: This is used to find the remote peer address of a call. + (*) Set the total transmit data size on a call. + + void rxrpc_kernel_set_tx_length(struct socket *sock, + struct rxrpc_call *call, + s64 tx_total_len); + + This sets the amount of data that the caller is intending to transmit on a + call. It's intended to be used for setting the reply size as the request + size should be set when the call is begun. tx_total_len may not be less + than zero. + ======================= CONFIGURABLE PARAMETERS diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index d5990eb160bd..02781e78ffb6 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -341,6 +341,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp, struct msghdr msg; struct kvec iov[1]; size_t offset; + s64 tx_total_len; u32 abort_code; int ret; @@ -364,9 +365,20 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp, srx.transport.sin.sin_port = call->port; memcpy(&srx.transport.sin.sin_addr, addr, 4); + /* Work out the length we're going to transmit. This is awkward for + * calls such as FS.StoreData where there's an extra injection of data + * after the initial fixed part. + */ + tx_total_len = call->request_size; + if (call->send_pages) { + tx_total_len += call->last_to - call->first_offset; + tx_total_len += (call->last - call->first) * PAGE_SIZE; + } + /* create a call */ rxcall = rxrpc_kernel_begin_call(afs_socket, &srx, call->key, - (unsigned long) call, gfp, + (unsigned long)call, + tx_total_len, gfp, (async ? afs_wake_up_async_call : afs_wake_up_call_waiter)); @@ -738,6 +750,8 @@ void afs_send_empty_reply(struct afs_call *call) _enter(""); + rxrpc_kernel_set_tx_length(afs_socket, call->rxcall, 0); + msg.msg_name = NULL; msg.msg_namelen = 0; iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, NULL, 0, 0); @@ -772,6 +786,8 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len) _enter(""); + rxrpc_kernel_set_tx_length(afs_socket, call->rxcall, len); + iov[0].iov_base = (void *) buf; iov[0].iov_len = len; msg.msg_name = NULL; diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h index bdd3175b9a48..7343f71783dc 100644 --- a/include/linux/rxrpc.h +++ b/include/linux/rxrpc.h @@ -57,6 +57,7 @@ enum rxrpc_cmsg_type { RXRPC_ACCEPT = 9, /* s-: [Service] accept request */ RXRPC_EXCLUSIVE_CALL = 10, /* s-: Call should be on exclusive connection */ RXRPC_UPGRADE_SERVICE = 11, /* s-: Request service upgrade for client call */ + RXRPC_TX_LENGTH = 12, /* s-: Total length of Tx data */ RXRPC__SUPPORTED }; diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h index b5f5187f488c..c172709787af 100644 --- a/include/net/af_rxrpc.h +++ b/include/net/af_rxrpc.h @@ -33,6 +33,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *, struct sockaddr_rxrpc *, struct key *, unsigned long, + s64, gfp_t, rxrpc_notify_rx_t); int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *, @@ -46,5 +47,6 @@ void rxrpc_kernel_get_peer(struct socket *, struct rxrpc_call *, struct sockaddr_rxrpc *); int rxrpc_kernel_charge_accept(struct socket *, rxrpc_notify_rx_t, rxrpc_user_attach_call_t, unsigned long, gfp_t); +void rxrpc_kernel_set_tx_length(struct socket *, struct rxrpc_call *, s64); #endif /* _NET_RXRPC_H */ diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 44a52b82bb5d..58ae0db52ea1 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -262,6 +262,7 @@ static int rxrpc_listen(struct socket *sock, int backlog) * @srx: The address of the peer to contact * @key: The security context to use (defaults to socket setting) * @user_call_ID: The ID to use + * @tx_total_len: Total length of data to transmit during the call (or -1) * @gfp: The allocation constraints * @notify_rx: Where to send notifications instead of socket queue * @@ -276,6 +277,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, struct sockaddr_rxrpc *srx, struct key *key, unsigned long user_call_ID, + s64 tx_total_len, gfp_t gfp, rxrpc_notify_rx_t notify_rx) { @@ -303,7 +305,8 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, cp.security_level = 0; cp.exclusive = false; cp.service_id = srx->srx_service; - call = rxrpc_new_client_call(rx, &cp, srx, user_call_ID, gfp); + call = rxrpc_new_client_call(rx, &cp, srx, user_call_ID, tx_total_len, + gfp); /* The socket has been unlocked. */ if (!IS_ERR(call)) call->notify_rx = notify_rx; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index e9b536cb0acf..adbf37946450 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -528,6 +528,7 @@ struct rxrpc_call { struct rb_node sock_node; /* Node in rx->calls */ struct sk_buff *tx_pending; /* Tx socket buffer being filled */ wait_queue_head_t waitq; /* Wait queue for channel or Tx */ + s64 tx_total_len; /* Total length left to be transmitted (or -1) */ __be32 crypto_buf[2]; /* Temporary packet crypto buffer */ unsigned long user_call_ID; /* user-defined call ID */ unsigned long flags; @@ -683,7 +684,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t); struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *, struct rxrpc_conn_parameters *, struct sockaddr_rxrpc *, - unsigned long, gfp_t); + unsigned long, s64, gfp_t); void rxrpc_incoming_call(struct rxrpc_sock *, struct rxrpc_call *, struct sk_buff *); void rxrpc_release_call(struct rxrpc_sock *, struct rxrpc_call *); diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 692110808baa..423030fd93be 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -127,6 +127,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp) rwlock_init(&call->state_lock); atomic_set(&call->usage, 1); call->debug_id = atomic_inc_return(&rxrpc_debug_id); + call->tx_total_len = -1; memset(&call->sock_node, 0xed, sizeof(call->sock_node)); @@ -201,6 +202,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, struct rxrpc_conn_parameters *cp, struct sockaddr_rxrpc *srx, unsigned long user_call_ID, + s64 tx_total_len, gfp_t gfp) __releases(&rx->sk.sk_lock.slock) { @@ -219,6 +221,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, return call; } + call->tx_total_len = tx_total_len; trace_rxrpc_call(call, rxrpc_call_new_client, atomic_read(&call->usage), here, (const void *)user_call_ID); diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index d939a5b1abc3..2e636a525a65 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -29,6 +29,7 @@ enum rxrpc_command { }; struct rxrpc_send_params { + s64 tx_total_len; /* Total Tx data length (if send data) */ unsigned long user_call_ID; /* User's call ID */ u32 abort_code; /* Abort code to Tx (if abort) */ enum rxrpc_command command : 8; /* The command to implement */ @@ -207,6 +208,13 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, more = msg->msg_flags & MSG_MORE; + if (call->tx_total_len != -1) { + if (len > call->tx_total_len) + return -EMSGSIZE; + if (!more && len != call->tx_total_len) + return -EMSGSIZE; + } + skb = call->tx_pending; call->tx_pending = NULL; rxrpc_see_skb(skb, rxrpc_skb_tx_seen); @@ -299,6 +307,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, sp->remain -= copy; skb->mark += copy; copied += copy; + if (call->tx_total_len != -1) + call->tx_total_len -= copy; } /* check for the far side aborting the call or a network error @@ -436,6 +446,14 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg, struct rxrpc_send_params *p) return -EINVAL; break; + case RXRPC_TX_LENGTH: + if (p->tx_total_len != -1 || len != sizeof(__s64)) + return -EINVAL; + p->tx_total_len = *(__s64 *)CMSG_DATA(cmsg); + if (p->tx_total_len < 0) + return -EINVAL; + break; + default: return -EINVAL; } @@ -443,6 +461,8 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg, struct rxrpc_send_params *p) if (!got_user_ID) return -EINVAL; + if (p->tx_total_len != -1 && p->command != RXRPC_CMD_SEND_DATA) + return -EINVAL; _leave(" = 0"); return 0; } @@ -481,7 +501,8 @@ rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, cp.exclusive = rx->exclusive | p->exclusive; cp.upgrade = p->upgrade; cp.service_id = srx->srx_service; - call = rxrpc_new_client_call(rx, &cp, srx, p->user_call_ID, GFP_KERNEL); + call = rxrpc_new_client_call(rx, &cp, srx, p->user_call_ID, + p->tx_total_len, GFP_KERNEL); /* The socket is now unlocked */ _leave(" = %p\n", call); @@ -501,6 +522,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) int ret; struct rxrpc_send_params p = { + .tx_total_len = -1, .user_call_ID = 0, .abort_code = 0, .command = RXRPC_CMD_SEND_DATA, @@ -555,6 +577,15 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) ret = -ERESTARTSYS; goto error_put; } + + if (p.tx_total_len != -1) { + ret = -EINVAL; + if (call->tx_total_len != -1 || + call->tx_pending || + call->tx_top != 0) + goto error_put; + call->tx_total_len = p.tx_total_len; + } } state = READ_ONCE(call->state); @@ -672,5 +703,24 @@ bool rxrpc_kernel_abort_call(struct socket *sock, struct rxrpc_call *call, mutex_unlock(&call->user_mutex); return aborted; } - EXPORT_SYMBOL(rxrpc_kernel_abort_call); + +/** + * rxrpc_kernel_set_tx_length - Set the total Tx length on a call + * @sock: The socket the call is on + * @call: The call to be informed + * @tx_total_len: The amount of data to be transmitted for this call + * + * Allow a kernel service to set the total transmit length on a call. This + * allows buffer-to-packet encrypt-and-copy to be performed. + * + * This function is primarily for use for setting the reply length since the + * request length can be set when beginning the call. + */ +void rxrpc_kernel_set_tx_length(struct socket *sock, struct rxrpc_call *call, + s64 tx_total_len) +{ + WARN_ON(call->tx_total_len != -1); + call->tx_total_len = tx_total_len; +} +EXPORT_SYMBOL(rxrpc_kernel_set_tx_length); -- cgit v1.2.3 From 99c195fb4eea405160ade58f74f62aed19b1822c Mon Sep 17 00:00:00 2001 From: Dave Watson Date: Wed, 14 Jun 2017 11:37:51 -0700 Subject: tls: Documentation Add documentation for the tcp ULP tls interface. Signed-off-by: Boris Pismenny Signed-off-by: Dave Watson Signed-off-by: David S. Miller --- Documentation/networking/tls.txt | 135 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+) create mode 100644 Documentation/networking/tls.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt new file mode 100644 index 000000000000..77ed00631c12 --- /dev/null +++ b/Documentation/networking/tls.txt @@ -0,0 +1,135 @@ +Overview +======== + +Transport Layer Security (TLS) is a Upper Layer Protocol (ULP) that runs over +TCP. TLS provides end-to-end data integrity and confidentiality. + +User interface +============== + +Creating a TLS connection +------------------------- + +First create a new TCP socket and set the TLS ULP. + + sock = socket(AF_INET, SOCK_STREAM, 0); + setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); + +Setting the TLS ULP allows us to set/get TLS socket options. Currently +only the symmetric encryption is handled in the kernel. After the TLS +handshake is complete, we have all the parameters required to move the +data-path to the kernel. There is a separate socket option for moving +the transmit and the receive into the kernel. + + /* From linux/tls.h */ + struct tls_crypto_info { + unsigned short version; + unsigned short cipher_type; + }; + + struct tls12_crypto_info_aes_gcm_128 { + struct tls_crypto_info info; + unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE]; + unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE]; + unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE]; + unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE]; + }; + + + struct tls12_crypto_info_aes_gcm_128 crypto_info; + + crypto_info.info.version = TLS_1_2_VERSION; + crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128; + memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE); + memcpy(crypto_info.rec_seq, seq_number_write, + TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); + memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE); + memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE); + + setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)); + +Sending TLS application data +---------------------------- + +After setting the TLS_TX socket option all application data sent over this +socket is encrypted using TLS and the parameters provided in the socket option. +For example, we can send an encrypted hello world record as follows: + + const char *msg = "hello world\n"; + send(sock, msg, strlen(msg)); + +send() data is directly encrypted from the userspace buffer provided +to the encrypted kernel send buffer if possible. + +The sendfile system call will send the file's data over TLS records of maximum +length (2^14). + + file = open(filename, O_RDONLY); + fstat(file, &stat); + sendfile(sock, file, &offset, stat.st_size); + +TLS records are created and sent after each send() call, unless +MSG_MORE is passed. MSG_MORE will delay creation of a record until +MSG_MORE is not passed, or the maximum record size is reached. + +The kernel will need to allocate a buffer for the encrypted data. +This buffer is allocated at the time send() is called, such that +either the entire send() call will return -ENOMEM (or block waiting +for memory), or the encryption will always succeed. If send() returns +-ENOMEM and some data was left on the socket buffer from a previous +call using MSG_MORE, the MSG_MORE data is left on the socket buffer. + +Send TLS control messages +------------------------- + +Other than application data, TLS has control messages such as alert +messages (record type 21) and handshake messages (record type 22), etc. +These messages can be sent over the socket by providing the TLS record type +via a CMSG. For example the following function sends @data of @length bytes +using a record of type @record_type. + +/* send TLS control message using record_type */ + static int klts_send_ctrl_message(int sock, unsigned char record_type, + void *data, size_t length) + { + struct msghdr msg = {0}; + int cmsg_len = sizeof(record_type); + struct cmsghdr *cmsg; + char buf[CMSG_SPACE(cmsg_len)]; + struct iovec msg_iov; /* Vector of data to send/receive into. */ + + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_TLS; + cmsg->cmsg_type = TLS_SET_RECORD_TYPE; + cmsg->cmsg_len = CMSG_LEN(cmsg_len); + *CMSG_DATA(cmsg) = record_type; + msg.msg_controllen = cmsg->cmsg_len; + + msg_iov.iov_base = data; + msg_iov.iov_len = length; + msg.msg_iov = &msg_iov; + msg.msg_iovlen = 1; + + return sendmsg(sock, &msg, 0); + } + +Control message data should be provided unencrypted, and will be +encrypted by the kernel. + +Integrating in to userspace TLS library +--------------------------------------- + +At a high level, the kernel TLS ULP is a replacement for the record +layer of a userspace TLS library. + +A patchset to OpenSSL to use ktls as the record layer is here: + +https://github.com/Mellanox/tls-openssl + +An example of calling send directly after a handshake using +gnutls. Since it doesn't implement a full record layer, control +messages are not supported: + +https://github.com/Mellanox/tls-af_ktls_tool -- cgit v1.2.3 From c017ce0a9ad753e3835d7f8f5fca89960380e816 Mon Sep 17 00:00:00 2001 From: Vincent Bernat Date: Tue, 27 Jun 2017 15:42:57 +0200 Subject: net: remove policy-routing.txt documentation It dates back from 2.1.16 and is obsolete since 2.1.68 when the current rule system has been introduced. Signed-off-by: Vincent Bernat Signed-off-by: David S. Miller --- Documentation/networking/policy-routing.txt | 150 ---------------------------- 1 file changed, 150 deletions(-) delete mode 100644 Documentation/networking/policy-routing.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/policy-routing.txt b/Documentation/networking/policy-routing.txt deleted file mode 100644 index 36f6936d7f21..000000000000 --- a/Documentation/networking/policy-routing.txt +++ /dev/null @@ -1,150 +0,0 @@ -Classes -------- - - "Class" is a complete routing table in common sense. - I.e. it is tree of nodes (destination prefix, tos, metric) - with attached information: gateway, device etc. - This tree is looked up as specified in RFC1812 5.2.4.3 - 1. Basic match - 2. Longest match - 3. Weak TOS. - 4. Metric. (should not be in kernel space, but they are) - 5. Additional pruning rules. (not in kernel space). - - We have two special type of nodes: - REJECT - abort route lookup and return an error value. - THROW - abort route lookup in this class. - - - Currently the number of classes is limited to 255 - (0 is reserved for "not specified class") - - Three classes are builtin: - - RT_CLASS_LOCAL=255 - local interface addresses, - broadcasts, nat addresses. - - RT_CLASS_MAIN=254 - all normal routes are put there - by default. - - RT_CLASS_DEFAULT=253 - if ip_fib_model==1, then - normal default routes are put there, if ip_fib_model==2 - all gateway routes are put there. - - -Rules ------ - Rule is a record of (src prefix, src interface, tos, dst prefix) - with attached information. - - Rule types: - RTP_ROUTE - lookup in attached class - RTP_NAT - lookup in attached class and if a match is found, - translate packet source address. - RTP_MASQUERADE - lookup in attached class and if a match is found, - masquerade packet as sourced by us. - RTP_DROP - silently drop the packet. - RTP_REJECT - drop the packet and send ICMP NET UNREACHABLE. - RTP_PROHIBIT - drop the packet and send ICMP COMM. ADM. PROHIBITED. - - Rule flags: - RTRF_LOG - log route creations. - RTRF_VALVE - One way route (used with masquerading) - -Default setup: - -root@amber:/pub/ip-routing # iproute -r -Kernel routing policy rules -Pref Source Destination TOS Iface Cl - 0 default default 00 * 255 - 254 default default 00 * 254 - 255 default default 00 * 253 - - -Lookup algorithm ----------------- - - We scan rules list, and if a rule is matched, apply it. - If a route is found, return it. - If it is not found or a THROW node was matched, continue - to scan rules. - -Applications ------------- - -1. Just ignore classes. All the routes are put into MAIN class - (and/or into DEFAULT class). - - HOWTO: iproute add PREFIX [ tos TOS ] [ gw GW ] [ dev DEV ] - [ metric METRIC ] [ reject ] ... (look at iproute utility) - - or use route utility from current net-tools. - -2. Opposite case. Just forget all that you know about routing - tables. Every rule is supplied with its own gateway, device - info. record. This approach is not appropriate for automated - route maintenance, but it is ideal for manual configuration. - - HOWTO: iproute addrule [ from PREFIX ] [ to PREFIX ] [ tos TOS ] - [ dev INPUTDEV] [ pref PREFERENCE ] route [ gw GATEWAY ] - [ dev OUTDEV ] ..... - - Warning: As of now the size of the routing table in this - approach is limited to 256. If someone likes this model, I'll - relax this limitation. - -3. OSPF classes (see RFC1583, RFC1812 E.3.3) - Very clean, stable and robust algorithm for OSPF routing - domains. Unfortunately, it is not widely used in the Internet. - - Proposed setup: - 255 local addresses - 254 interface routes - 253 ASE routes with external metric - 252 ASE routes with internal metric - 251 inter-area routes - 250 intra-area routes for 1st area - 249 intra-area routes for 2nd area - etc. - - Rules: - iproute addrule class 253 - iproute addrule class 252 - iproute addrule class 251 - iproute addrule to a-prefix-for-1st-area class 250 - iproute addrule to another-prefix-for-1st-area class 250 - ... - iproute addrule to a-prefix-for-2nd-area class 249 - ... - - Area classes must be terminated with reject record. - iproute add default reject class 250 - iproute add default reject class 249 - ... - -4. The Variant Router Requirements Algorithm (RFC1812 E.3.2) - Create 16 classes for different TOS values. - It is a funny, but pretty useless algorithm. - I listed it just to show the power of new routing code. - -5. All the variety of combinations...... - - -GATED ------ - - Gated does not understand classes, but it will work - happily in MAIN+DEFAULT. All policy routes can be set - and maintained manually. - -IMPORTANT NOTE --------------- - route.c has a compilation time switch CONFIG_IP_LOCAL_RT_POLICY. - If it is set, locally originated packets are routed - using all the policy list. This is not very convenient and - pretty ambiguous when used with NAT and masquerading. - I set it to FALSE by default. - - -Alexey Kuznetov -kuznet@ms2.inr.ac.ru -- cgit v1.2.3 From 75674c4cbb03a8b6ed7ef1f72b9e9aecc0bfd9dc Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Fri, 30 Jun 2017 18:21:47 +0200 Subject: Documentation: fix wrong example command In the IPVLAN documentation there is an example command line where the master and slave interface names are inverted. Fix the command line and also add the optional `name' keyword to better describe what the command is doing. v2: added commit message Signed-off-by: Matteo Croce Signed-off-by: David S. Miller --- Documentation/networking/ipvlan.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt index 24196cef7c91..1fe42a874aae 100644 --- a/Documentation/networking/ipvlan.txt +++ b/Documentation/networking/ipvlan.txt @@ -22,9 +22,9 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module There are no module parameters for this driver and it can be configured using IProute2/ip utility. - ip link add link type ipvlan mode { l2 | l3 | l3s } + ip link add link name type ipvlan mode { l2 | l3 | l3s } - e.g. ip link add link ipvl0 eth0 type ipvlan mode l2 + e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2 4. Operating modes: -- cgit v1.2.3