Age | Commit message | Author | Files | Lines
2017-01-09stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform structurejpinto6-75/+70
This patch moves stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to the plat_stmmacenet_data structure. It also moves the initialization of these platform variables to stmmac_platform. This was done for two reasons: a) If PCI is used, platform-related code is executed in stmmac_main, resulting in warnings that make no sense, which is also conceptually wrong. b) stmmac, as the Synopsys reference ethernet driver stack, will be hosting more and more drivers in its structure, like synopsys/dwc_eth_qos.c. These drivers have their own DT bindings that are not compatible with stmmac's. Among the most important are the clock names, so they need to be parsed and initialized in the glue logic, and that is the main reason why the clocks were moved to the platform structure. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com> Reviewed-by: Lars Persson <larper@axis.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09stmmac: adding DT parameter for LPI tx clock gatingjpinto8-4/+20
This patch adds a new parameter to the stmmac DT: snps,en-tx-lpi-clockgating. It was ported from synopsys/dwc_eth_qos.c and it is useful if lpi tx clock gating is needed by stmmac users also. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com> Reviewed-by: Lars Persson <larper@axis.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09alx: add feature flag for rx checksummingTobias Regnery1-0/+1
The code to handle rx checksumming was in the driver since its introduction but for reasons unknown the feature flag was left out. Now it is possible to enable this feature with ethtool. Tested on my AR8161 ethernet card, there are no regressions observed in netperf if this feature is enabled. Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09Merge branch 'act_csum-sctp'David S. Miller3-1/+33
Davide Caratti says: ==================== net/sched: act_csum: add support for SCTP checksum This series extends current act_csum functionality to allow computation of SCTP checksums. Patch 1 ensures LIBCRC32C will be selected if NET_ACT_CSUM is selected. Patch 2 extends act_csum to handle IPPROTO_SCTP protocol in IPv4/IPv6 header, and eventually compute the CRC32c value. v2: - style fix in tc_csum.h - avoid nested if statement in act_csum.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net/sched: act_csum: compute crc32c on SCTP packetsDavide Caratti2-1/+32
modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in their payload, and extend UAPI definitions accordingly. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selectedDavide Caratti1-0/+1
LIBCRC32C is needed to compute crc32c on SCTP packets. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
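For readers unfamiliar with the checksum involved, here is a minimal userspace sketch of CRC32c (the Castagnoli polynomial that SCTP uses). It is a plain bitwise implementation for illustration only, not the kernel's LIBCRC32C code or the act_csum logic; the function names crc32c_update and crc32c are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32c (Castagnoli) using the reflected polynomial 0x82F63B78.
 * Illustrative only: real implementations use lookup tables or CPU
 * instructions for speed.
 */
uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and fold in the polynomial. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

/* One-shot helper: initial value all ones, final value inverted. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    return ~crc32c_update(0xFFFFFFFFu, data, len);
}
```

The standard check input "123456789" should yield 0xE3069283 with this routine.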
2017-01-09Merge branch 'mlxsw-small-driver-update'David S. Miller3-54/+59
Jiri Pirko says: ==================== mlxsw: small driver update This patchset contains various small "non-net" fixes and enhancements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09mlxsw: spectrum: Change ENOTSUPP to EOPNOTSUPPYotam Gigi1-4/+4
As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlxsw driver. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09mlxsw: spectrum: Fix order of commands in port remove functionYotam Gigi1-1/+1
Fix the order of the free directives to match the port init function Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09mlxsw: spectrum: Make the add_matchall_tc_entry symmetricYotam Gigi1-43/+48
Currently, the mlxsw spectrum driver only supports offloading the matchall classifier together with the mirred action. To allow more matchall tc offloads, make the code symmetric so that it can be easily extended later on for other actions. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09mlxsw: cmd: Fix API name comments for event-queuesElad Raz1-5/+5
Probably some copy-paste error from "int_msix" that caused "int_" prefix to appear in the comments for all "eq_" APIs. Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09mlxsw: Fix mlxsw_i2c_write return valueElad Raz1-1/+1
The "err" variable has already been checked at this point, so always return 0. Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: ethernet: ti: cpsw: extend limits for cpsw_get/set_ringparamIvan Khoronzhuk1-4/+3
Allow the number of descriptors to be set closer to the possible limits. The minimum limit equals the number of channels, so that at least one descriptor can be set per channel. For the maximum limit, leave enough descriptors for the tx channels. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09cls_u32: don't bother explicitly initializing ->divisor to zeroAlexandru Moise1-1/+0
This struct member is already initialized to zero upon root_ht's allocation via kzalloc(). Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09Merge branch 'siphash'David S. Miller10-125/+1189
Jason A. Donenfeld says: ==================== Introduce The SipHash PRF This patch series introduces SipHash into the kernel. SipHash is a cryptographically secure PRF, which serves a variety of functions, and is introduced in patch #1. The following patch #2 introduces HalfSipHash, an optimization suitable for hash tables only. Finally, the last two patches in this series show two usages of the introduced siphash function family. It is expected that after this initial introduction, other usages will follow. Please read the extensive descriptions in patch #1 and patch #2 of what these functions do and the various levels of assurances. They're products of intense cryptographic research, and I believe they're suitable for the uses outlined herein. The use of SipHash is not limited to the networking subsystem -- indeed I would like to use it in other places too in the kernel. But after discussing with a few on this list and at Linus' suggestion, the initial import of these functions is coming through the networking tree. After these are merged, it will then be easier to expand use elsewhere. Changes v2->v3: - hsiphash keys now simply use an unsigned long, in order to avoid a cluttered ifdef and make it a bit more clear what's happening. - A typo in the documentation has been fixed. - The documentation has been augmented with an example relating to struct packing and passing. - The net_secret variable is now __read_mostly. Hopefully this is the last of the required revisions, and v3 can be merged into net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09syncookies: use SipHash in place of SHA1Jason A. Donenfeld2-38/+24
SHA1 is slower and less secure than SipHash, and so replacing syncookie generation with SipHash makes natural sense. Some BSDs have been doing this for several years in fact. The speedup should be similar -- and even more impressive -- to the speedup from the sequence number fix in this series. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09secure_seq: use SipHash in place of MD5Jason A. Donenfeld1-82/+63
This gives a clear speed and security improvement. Siphash is both faster and is more solid crypto than the aging MD5. Rather than manually filling MD5 buffers, for IPv6, we simply create a layout by a simple anonymous struct, for which gcc generates rather efficient code. For IPv4, we pass the values directly to the short input convenience functions.

64-bit x86_64:
[ 1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[ 1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[ 1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[ 1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526

32-bit x86:
[ 1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[ 1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[ 1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[ 1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Laight <David.Laight@aculab.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09siphash: implement HalfSipHash1-3 for hash tablesJason A. Donenfeld4-5/+546
HalfSipHash, or hsiphash, is a shortened version of SipHash, which generates 32-bit outputs using a weaker 64-bit key. It has *much* lower security margins, and shouldn't be used for anything too sensitive, but it could be used as a hashtable key function replacement, if the output is never exposed, and if the security requirement is not too high. The goal is to make this something that performance-critical jhash users would be willing to use. On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias SipHash1-3 to HalfSipHash1-3 on those systems.

64-bit x86_64:
[ 0.509409] test_siphash: SipHash2-4 cycles: 4049181
[ 0.510650] test_siphash: SipHash1-3 cycles: 2512884
[ 0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
[ 0.512904] test_siphash: JenkinsHash cycles: 978267
So, we map hsiphash() -> SipHash1-3

32-bit x86:
[ 0.509868] test_siphash: SipHash2-4 cycles: 14812892
[ 0.513601] test_siphash: SipHash1-3 cycles: 9510710
[ 0.515263] test_siphash: HalfSipHash1-3 cycles: 3856157
[ 0.515952] test_siphash: JenkinsHash cycles: 1148567
So, we map hsiphash() -> HalfSipHash1-3

hsiphash() is roughly 3 times slower than jhash(), but comes with a considerable security improvement. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09siphash: add cryptographically secure PRFJason A. Donenfeld7-5/+561
SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining. For the first usage: There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then proceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases. There are a number of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix them. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate. While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk. For the second usage: A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster, more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straightforward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy. 
Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
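To make the primitive concrete, the following is a compact userspace sketch of SipHash-2-4 as published by its designers. It is not the kernel's siphash.c; the function name siphash24 and its signature (two 64-bit key halves passed as integers) are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned int b)
{
    return (x << b) | (x >> (64 - b));
}

/* One SipRound over the four-word internal state. */
static void sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Load eight bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* SipHash-2-4: 2 compression rounds per word, 4 finalization rounds. */
uint64_t siphash24(const uint8_t *in, size_t len, uint64_t k0, uint64_t k1)
{
    uint64_t v[4] = {
        k0 ^ 0x736f6d6570736575ULL,
        k1 ^ 0x646f72616e646f6dULL,
        k0 ^ 0x6c7967656e657261ULL,
        k1 ^ 0x7465646279746573ULL,
    };
    size_t full = len & ~(size_t)7;      /* bytes covered by whole words */
    uint64_t last = (uint64_t)len << 56; /* length byte goes in the top */

    for (size_t i = 0; i < full; i += 8) {
        uint64_t m = load_le64(in + i);
        v[3] ^= m;
        sipround(v);
        sipround(v);
        v[0] ^= m;
    }

    /* Fold the 0..7 trailing bytes into the final word. */
    for (size_t i = full; i < len; i++)
        last |= (uint64_t)in[i] << (8 * (i - full));

    v[3] ^= last;
    sipround(v);
    sipround(v);
    v[0] ^= last;

    v[2] ^= 0xff;
    for (int i = 0; i < 4; i++)
        sipround(v);

    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

With the key bytes 00 through 0f (k0 = 0x0706050403020100, k1 = 0x0f0e0d0c0b0a0908) and the 15-byte input 00 through 0e, this should reproduce the published reference value 0xa129ca6149be45e5. For hash tables, the key would be a per-boot random secret so that an attacker cannot precompute colliding inputs.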
2017-01-09net: ipv4: remove disable of bottom half in inet_rtm_getrouteDavid Ahern1-2/+0
Nothing about the route lookup requires bottom half to be disabled. Remove the local_bh_disable ... local_bh_enable around ip_route_input. This appears to be a vestige of days gone by as it has been there since the beginning of git time. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net/mlx5: Activate support for 4K UARsEli Cohen1-0/+4
Activate 4K UAR support for firmware versions that support it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09IB/mlx5: Support 4k UAR for libmlx5Eli Cohen9-100/+42
Add fields to structs to convey to kernel an indication whether the library supports multi UARs per page and return to the library the size of a UAR based on the queried value. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09IB/mlx5: Allow future extension of libmlx5 input dataEli Cohen6-166/+209
The current check requires that new fields in struct mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero. This was introduced so that new libraries passing additional information to the kernel through struct mlx5_ib_alloc_ucontext_req_v2 are notified by old kernels that do not support their request, by failing the operation. This scheme is problematic since it requires libmlx5 to issue the requests with descending input size for struct mlx5_ib_alloc_ucontext_req_v2. To avoid this, we require that new features obey the following rules: If the feature requires one or more fields in the response, and at least one of the fields can be encoded such that a zero value means the kernel ignored the request, then this field will provide the indication to the library. If no response is required, or if zero is a valid response, a new field should be added that indicates to the library whether its request was processed. Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09IB/mlx5: Use blue flame register allocator in mlx5_ibEli Cohen10-221/+59
Make use of the blue flame register allocator in mlx5_ib. Since blue flame was not really supported, we remove all the code that is related to blue flame and let all consumers use the same blue flame register. Once blue flame is supported we will add the code. As part of this patch we also move the definition of struct mlx5_bf to mlx5_ib.h, as it is only used by mlx5_ib. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09net/mlx5: Add interface to get reference to a UAREli Cohen4-14/+59
A reference to a UAR is required to generate CQ or EQ doorbells. Since CQ or EQ doorbells can all be generated using the same UAR area without any effect on performance, we just get a reference to any available UAR. If one is not available we allocate it, but we don't waste the blue flame registers it can provide and we will use them for subsequent allocations. We get a reference to such a UAR and put it in mlx5_priv so any kernel consumer can make use of it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09net: intel: e100: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-6/+8
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: ibm: ibmvnic: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-13/+18
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: ibm: ibmveth: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-12/+18
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: ibm: emac: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-30/+40
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: ibm: ehea: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes1-21/+30
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: change init_inodecache() return voidyuan linyu1-4/+2
sock_init() calls it but does not check its return value, so change it to return void and add an internal BUG_ON() check. Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09afs: Refcount the afs_call structDavid Howells4-79/+199
A static checker warning occurs in the AFS filesystem: fs/afs/cmservice.c:155 SRXAFSCB_CallBack() error: dereferencing freed memory 'call' due to the reply being sent before we access the server it points to. The act of sending the reply causes the call to be freed if an error occurs (but not if it doesn't). On top of this, the lifetime handling of afs_call structs is fragile because they get passed around through workqueues without any sort of refcounting. Deal with the issues by: (1) Fix the maybe/maybe not nature of the reply sending functions with regards to whether they release the call struct. (2) Refcount the afs_call struct and sort out places that need to get/put references. (3) Pass a ref through the work queue and release (or pass on) that ref in the work function. Care has to be taken because a work queue may already own a ref to the call. (4) Do the cleaning up in the put function only. (5) Simplify module cleanup by always incrementing afs_outstanding_calls whenever a call is allocated. (6) Set the backlog to 0 with kernel_listen() at the beginning of the process of closing the socket to prevent new incoming calls from occurring and to remove the contribution of preallocated calls from afs_outstanding_calls before we wait on it. A tracepoint is also added to monitor the afs_call refcount and lifetime. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Fixes: 08e0e7c82eea: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
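The refcounting pattern described in points (2), (4) and (5) can be sketched in userspace C11 roughly as follows. This is only an illustration of the idea, not the AFS code: struct call, call_alloc, call_get, call_put and calls_outstanding are hypothetical names, and C11 atomics stand in for the kernel's atomic helpers.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Global count of live calls, bumped on every allocation (cf. point 5). */
static atomic_int outstanding_calls;

struct call {
    atomic_int usage; /* reference count */
};

/* Allocate a call holding one reference for the caller. */
struct call *call_alloc(void)
{
    struct call *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    atomic_init(&c->usage, 1);
    atomic_fetch_add(&outstanding_calls, 1);
    return c;
}

/* Take an extra reference, e.g. before handing the call to a work item. */
struct call *call_get(struct call *c)
{
    atomic_fetch_add(&c->usage, 1);
    return c;
}

/* Drop a reference; all cleanup happens here and only here (cf. point 4). */
void call_put(struct call *c)
{
    if (atomic_fetch_sub(&c->usage, 1) == 1) {
        free(c);
        atomic_fetch_sub(&outstanding_calls, 1);
    }
}

int calls_outstanding(void)
{
    return atomic_load(&outstanding_calls);
}
```

A sender that may fail then never frees the call itself; it simply returns, and whoever holds the last reference releases it through call_put(), which removes the "maybe freed, maybe not" ambiguity the static checker flagged.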
2017-01-09rxrpc: Allow listen(sock, 0) to be used to disable listeningDavid Howells3-1/+11
Allow listen() with a backlog of 0 to be used to disable listening on an AF_RXRPC socket. This also releases any preallocation, thereby making it easier for a kernel service to account for all allocated call structures when shutting down the service. The socket cannot thereafter have listening reenabled, but must rather be closed and reopened. Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-09afs: Kill afs_wait_modeDavid Howells7-149/+90
The afs_wait_mode struct isn't really necessary. Client calls only use one of a choice of two (synchronous or the asynchronous) and incoming calls don't use the wait at all. Replace with a boolean parameter. Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-09afs: Add some tracepointsDavid Howells5-15/+144
Add three tracepoints to the AFS filesystem: (1) The afs_recv_data tracepoint logs data segments that are extracted from the data received from the peer through afs_extract_data(). (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data coming in to an asynchronous call. (3) The afs_cb_call tracepoint logs incoming calls that have had their operation ID extracted and mapped into a supported cache manager service call. To make (3) work, the name strings in the afs_call_type struct objects have to be annotated with __tracepoint_string. This is done with the CM_NAME() macro. Further, the AFS call state enum needs a name so that it can be used to declare parameter types. Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-09Merge branch 'bcm_sf2-fixes'David S. Miller1-2/+9
Florian Fainelli says: ==================== net: dsa: bcm_sf2: Couple fixes Here are a couple of fixes for bcm_sf2, please queue these up for -stable as well, thank you very much! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: dsa: bcm_sf2: Utilize nested MDIO read/writeFlorian Fainelli1-2/+2
We are implementing a MDIO bus which is behind another one, so use the nested version of the accessors to get lockdep annotations correct. Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: dsa: bcm_sf2: Do not clobber b53_switch_opsFlorian Fainelli1-0/+7
We make the bcm_sf2 driver override ds->ops, which points to b53_switch_ops since b53_switch_alloc() did the assignment. This is all well and good until a second b53 switch comes in, and ends up using the bcm_sf2 operations. Make a proper local copy, substitute the ds->ops pointer and then override the operations. Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
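The "copy, substitute, then override" approach can be illustrated with a small userspace sketch. All names here (struct sw_ops, struct sw, core_ops, sw_apply_glue_overrides) are hypothetical stand-ins, not the DSA or b53 API; the point is only that overrides are written to a private copy so that other users of the shared table are unaffected.

```c
#include <stdlib.h>
#include <string.h>

struct sw_ops {
    int (*setup)(void);
    int (*port_count)(void);
};

struct sw {
    const struct sw_ops *ops;
};

int core_setup(void)      { return 1; }
int core_port_count(void) { return 8; }
int glue_setup(void)      { return 2; }

/* Shared operations table used by every switch instance by default. */
static struct sw_ops core_ops = {
    .setup      = core_setup,
    .port_count = core_port_count,
};

void sw_init(struct sw *s)
{
    s->ops = &core_ops;
}

/*
 * Give this instance a private copy of the ops table and override only
 * what the glue driver needs. Returns 0 on success, -1 if allocation
 * fails. The caller owns the copy and must free it on teardown.
 */
int sw_apply_glue_overrides(struct sw *s)
{
    struct sw_ops *copy = malloc(sizeof(*copy));

    if (!copy)
        return -1;
    memcpy(copy, s->ops, sizeof(*copy));
    copy->setup = glue_setup; /* override on the copy, not the shared table */
    s->ops = copy;
    return 0;
}
```

Writing glue_setup directly into core_ops instead would silently change the behaviour of every other switch that still points at the shared table, which is the failure mode the commit describes.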
2017-01-09Merge branch 'tc-skb-diet'David S. Miller13-118/+66
Willem de Bruijn says: ==================== convert tc_verd to integer bitfields The skb tc_verd field takes up two bytes but uses far fewer bits. Convert the remaining use cases to bitfields that fit in existing holes (depending on config options) and potentially save the two bytes in struct sk_buff. This patchset is based on an earlier set by Florian Westphal and its discussion (http://www.spinics.net/lists/netdev/msg329181.html). Patches 1 and 2 are low hanging fruit: removing the last traces of data that are no longer stored in tc_verd. Patches 3 and 4 convert tc_verd to individual bitfields (5 bits). Patch 5 reduces TC_AT to a single bitfield, as AT_STACK is not valid here (unlike in the case of TC_FROM). Patch 6 changes TC_FROM to two bitfields with clearly defined purpose. It may be possible to reduce storage further after this initial round. If tc_skip_classify is set only by IFB, testing skb_iif may suffice. The L2 header pushing/popping logic can perhaps be shared with AF_PACKET, which currently not pkt_type for the same purpose. 
Changes: RFC -> v1
 - (patch 3): remove no longer needed label in tcf_action_exec
 - (patch 5): set tc_at_ingress at the same points as existing SET_TC_AT calls

Tested ingress mirred + netem + ifb:

  ip link set dev ifb0 up
  tc qdisc add dev eth0 ingress
  tc filter add dev eth0 parent ffff: \
    u32 match ip dport 8000 0xffff \
    action mirred egress redirect dev ifb0
  tc qdisc add dev ifb0 root netem delay 1000ms
  nc -u -l 8000 &
  ssh $otherhost nc -u $host 8000

Tested egress mirred:

  ip link add veth1 type veth peer name veth2
  ip link set dev veth1 up
  ip link set dev veth2 up
  tcpdump -n -i veth2 udp and dst port 8000 &
  tc qdisc add dev eth0 root handle 1: prio
  tc filter add dev eth0 parent 1:0 \
    u32 match ip dport 8000 0xffff \
    action mirred egress redirect dev veth1
  tc qdisc add dev veth1 root netem delay 1000ms
  nc -u $otherhost 8000

Tested ingress mirred:

  ip link add veth1 type veth peer name veth2
  ip link add veth3 type veth peer name veth4
  ip netns add ns0
  ip netns add ns1
  for i in 1 2 3 4; do \
    NS=ns$((${i}%2)); \
    ip link set dev veth${i} netns ${NS}; \
    ip netns exec ${NS} \
      ip addr add dev veth${i} 192.168.1.${i}/24; \
    ip netns exec ${NS} \
      ip link set dev veth${i} up; \
  done
  ip netns exec ns0 tc qdisc add dev veth2 ingress
  ip netns exec ns0 \
    tc filter add dev veth2 parent ffff: \
    u32 match ip dport 8000 0xffff \
    action mirred ingress redirect dev veth4
  ip netns exec ns0 \
    tcpdump -n -i veth4 udp and dst port 8000 &
  ip netns exec ns1 \
    nc -u 192.168.1.2 8000

====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net-tc: convert tc_from to tc_from_ingress and tc_redirectedWillem de Bruijn6-19/+15
The tc_from field fulfills two roles. It encodes whether a packet was redirected by an act_mirred device and, if so, whether act_mirred was called on ingress or egress. Split it into separate fields. The information is needed by the special IFB loop, where packets are taken out of the normal path by act_mirred, forwarded to IFB, then reinjected at their original location (ingress or egress) by IFB. The IFB device cannot use skb->tc_at_ingress, because that may have been overwritten as the packet travels from act_mirred to ifb_xmit, when it passes through tc_classify on the IFB egress path. Cache this value in skb->tc_from_ingress. That field is valid only if a packet arriving at ifb_xmit came from act_mirred. Other packets can be crafted to reach ifb_xmit. These must be dropped. Set tc_redirected on redirection and drop all packets that do not have this bit set. Both fields are set only on cloned skbs in tc actions, so original packet sources do not have to clear the bit when reusing packets (notably, pktgen and octeon). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
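As a rough illustration of the split described above, the sketch below models the per-packet state as single-bit fields. The field names follow the commit text, but struct pkt_meta and the helpers mark_redirected and ifb_accepts are hypothetical; this is not the sk_buff layout or the IFB driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tc-related bits carried by a packet. */
struct pkt_meta {
    unsigned int tc_skip_classify:1;
    unsigned int tc_at_ingress:1;   /* may change as the packet travels */
    unsigned int tc_redirected:1;   /* set only by the redirect action */
    unsigned int tc_from_ingress:1; /* cached hook at redirect time */
};

/* What a redirect action would record on the cloned packet. */
void mark_redirected(struct pkt_meta *clone, bool at_ingress)
{
    clone->tc_redirected = 1;
    clone->tc_from_ingress = at_ingress;
}

/*
 * What an IFB-like transmit path would check: packets that were not
 * redirected must be dropped, since tc_from_ingress is meaningless
 * for them.
 */
bool ifb_accepts(const struct pkt_meta *pkt)
{
    return pkt->tc_redirected;
}
```

Caching the original hook in tc_from_ingress means later rewrites of tc_at_ingress on the way to the IFB device cannot change where the packet is reinjected.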
2017-01-09net-tc: convert tc_at to tc_at_ingressWillem de Bruijn4-14/+12
Field tc_at is used only within tc actions to distinguish ingress from egress processing. A single bit is sufficient for this purpose. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net-tc: convert tc_verd to integer bitfieldsWillem de Bruijn11-65/+29
Extract the remaining two fields from tc_verd and remove the __u16 completely. TC_AT and TC_FROM are converted to equivalent two-bit integer fields tc_at and tc_from. Where possible, use existing helper skb_at_tc_ingress when reading tc_at. Introduce helper skb_reset_tc to clear fields. Not documenting tc_from and tc_at, because they will be replaced with single bit fields in follow-on patches. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net-tc: extract skip classify bit from tc_verdWillem de Bruijn6-22/+23
Packets sent by the IFB device skip subsequent tc classification. A single bit governs this state. Move it out of tc_verd in anticipation of removing that __u16 completely. The new bitfield tc_skip_classify temporarily uses one bit of a hole, until tc_verd is removed completely in a follow-up patch. Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long. With that many options, little value in documenting it. Introduce a helper function to deduplicate the logic in the two sites that check this bit. The field tc_skip_classify is set only in IFB on skbs cloned in act_mirred, so original packet sources do not have to clear the bit when reusing packets (notably, pktgen and octeon). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net-tc: make MAX_RECLASSIFY_LOOP localWillem de Bruijn2-6/+2
This field is no longer kept in tc_verd. Remove it from the global definition of that struct. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net-tc: remove unused tc_verd fieldsWillem de Bruijn1-7/+0
Remove the last reference to tc_verd's munge and redirect ttl bits. These fields are no longer used. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: stmmac: fix maxmtu assignment to be within valid rangeKweh, Hock Leong2-1/+15
The maxmtu value read from the device tree is not validated. Add a check to ensure that the assignment is only made when the value is within the valid range. Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
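The kind of range check meant here might look like the sketch below. It is an assumption-laden illustration, not the stmmac patch: struct mtu_limits and apply_platform_maxmtu are hypothetical names, and the exact bounds the driver uses are not stated in the commit message.

```c
/* Hypothetical device limits already established by the driver. */
struct mtu_limits {
    int min_mtu;
    int max_mtu;
};

/*
 * Apply a platform-supplied maximum MTU only if it lies inside the
 * range the device already supports. Returns 1 if the value was
 * applied, 0 if it was rejected and the previous limit kept.
 */
int apply_platform_maxmtu(struct mtu_limits *lim, int plat_maxmtu)
{
    if (plat_maxmtu >= lim->min_mtu && plat_maxmtu <= lim->max_mtu) {
        lim->max_mtu = plat_maxmtu;
        return 1;
    }
    return 0;
}
```

A caller would typically log a warning when the function returns 0, so that a bad device tree value is visible rather than silently ignored.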
2017-01-09mdio: Demote print from info to debug in mdio_device_registerFlorian Fainelli1-1/+1
While it is useful to know which MDIO device is being registered, demote the dev_info() to a dev_dbg(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: remove useless memset's in drivers get_stats64stephen hemminger3-4/+0
In dev_get_stats() the statistic structure storage has already been zeroed. Therefore network drivers do not need to call memset() again. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: make ndo_get_stats64 a void functionstephen hemminger82-309/+166
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
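The calling convention after this change can be sketched as follows: the single caller zeroes the storage, and the driver callback fills in what it knows and returns nothing. The types and names (struct stats64, struct netdev_like, read_stats, drv_get_stats64) are hypothetical simplifications, not the kernel's net_device_ops.

```c
#include <stdint.h>
#include <string.h>

struct stats64 {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

struct netdev_like {
    /* Callback returns void: there is no return value to misuse. */
    void (*get_stats64)(struct netdev_like *dev, struct stats64 *st);
    uint64_t rx_seen;
};

/* Example driver callback: fills only the counter it tracks, no memset. */
void drv_get_stats64(struct netdev_like *dev, struct stats64 *st)
{
    st->rx_packets = dev->rx_seen;
}

/* The single call site: zero the storage once, then ask the driver. */
void read_stats(struct netdev_like *dev, struct stats64 *st)
{
    memset(st, 0, sizeof(*st));
    if (dev->get_stats64)
        dev->get_stats64(dev, st);
}
```

Because the caller always clears the structure first, a memset() inside the driver callback is redundant, and any counter the driver leaves untouched reads as zero.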
2017-01-09Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller7-30/+33
Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2017-01-08 This series contains updates to fm10k only. Ngai-Mint changes the driver to use the MAC pointer in the fm10k_mac_info structure for fm10k_get_host_state_generic(). Fixed a race condition where the mailbox interrupt request bits can be cleared before being handled, causing certain mailbox messages from the PF to go unhandled and the PF to enter an inactive state. Jake removes the typecast of u8 to char, and the extra variable that was created for the typecast. Bumps the driver version. Added back the receive descriptor timestamp value so that applications built on top of the IES API can function properly. Cleaned up the debug statistics flag, since debug statistics were removed and the flag was missed in the removal. Scott limits the DMA sync for CPU to the actual length of the packet, instead of the entire buffer, since the DMA sync occurs every time a packet is received. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>