summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2019-04-08xfrm: make xfrm modes builtinFlorian Westphal1-9/+4
after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm*_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove afinfo pointer from xfrm_modeFlorian Westphal1-1/+0
Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from ipv6 in case of CONFIG_IPV6=m. This change has virtually no effect on vmlinux size, but it reduces afinfo size and allows followup patch to make xfrm modes const. v2: mark if (afinfo) tests as likely (Sabrina) re-fetch afinfo according to inner_mode in xfrm_prepare_input(). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove output2 indirection from xfrm_modeFlorian Westphal1-13/+0
similar to previous patch: no external module dependencies, so we can avoid the indirection by placing this in the core. This change removes the last indirection from xfrm_mode and the xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore. Before: text data bss dec hex filename 3957 136 0 4093 ffd net/xfrm/xfrm_output.o 587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o 649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o 599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec hex filename 5359 184 0 5543 15a7 net/xfrm/xfrm_output.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o v2: fold the *encap_add functions into xfrm*_prepare_output preserve (move) output2 comment (Sabrina) use x->outer_mode->encap, not inner fix a build breakage on ppc (kbuild robot) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove input2 indirection from xfrm_modeFlorian Westphal1-13/+0
No external dependencies on any module, place this in the core. Increase is about 1800 byte for xfrm_input.o. The beet helpers get added to internal header, as they can be reused from xfrm_output.c in the next patch (kernel contains several copies of them in the xfrm{4,6}_mode_beet.c files). Before: text data bss dec filename 5578 176 2364 8118 net/xfrm/xfrm_input.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec filename 7373 200 2364 9937 net/xfrm/xfrm_input.o 587 44 0 631 net/ipv4/xfrm4_mode_beet.o 171 32 0 203 net/ipv4/xfrm4_mode_transport.o 649 32 0 681 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 net/ipv6/xfrm6_mode_beet.o 172 32 0 204 net/ipv6/xfrm6_mode_ro.o 172 32 0 204 net/ipv6/xfrm6_mode_transport.o 599 32 0 631 net/ipv6/xfrm6_mode_tunnel.o v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix AF_UNSPEC selector breakage (bisected by Benedict Wong) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove gso_segment indirection from xfrm_modeFlorian Westphal1-5/+0
These functions are small and we only have versions for tunnel and transport mode for ipv4 and ipv6 respectively. Just place the 'transport or tunnel' conditional in the protocol specific function instead of using an indirection. Before: 3226 12 0 3238 net/ipv4/esp4_offload.o 7004 492 0 7496 net/ipv4/ip_vti.o 3339 12 0 3351 net/ipv6/esp6_offload.o 11294 460 0 11754 net/ipv6/ip6_vti.o 1180 72 0 1252 net/ipv4/xfrm4_mode_beet.o 428 48 0 476 net/ipv4/xfrm4_mode_transport.o 1271 48 0 1319 net/ipv4/xfrm4_mode_tunnel.o 1083 60 0 1143 net/ipv6/xfrm6_mode_beet.o 172 48 0 220 net/ipv6/xfrm6_mode_ro.o 429 48 0 477 net/ipv6/xfrm6_mode_transport.o 1164 48 0 1212 net/ipv6/xfrm6_mode_tunnel.o 15730428 6937008 4046908 26714344 vmlinux After: 3461 12 0 3473 net/ipv4/esp4_offload.o 7000 492 0 7492 net/ipv4/ip_vti.o 3574 12 0 3586 net/ipv6/esp6_offload.o 11295 460 0 11755 net/ipv6/ip6_vti.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o 15730424 6937008 4046908 26714340 vmlinux Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove xmit indirection from xfrm_modeFlorian Westphal1-5/+0
There are only two versions (tunnel and transport). The ip/ipv6 versions are only differ in sizeof(iphdr) vs ipv6hdr. Place this in the core and use x->outer_mode->encap type to call the correct adjustment helper. Before: text data bss dec filename 15730311 6937008 4046908 26714227 vmlinux After: 15730428 6937008 4046908 26714344 vmlinux (about 117 byte increase) v2: use family from x->outer_mode, not inner Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove output indirection from xfrm_modeFlorian Westphal1-14/+5
Same is input indirection. Only exception: we need to export xfrm_outer_mode_output for pktgen. Increases size of vmlinux by about 163 byte: Before: text data bss dec filename 15730208 6936948 4046908 26714064 vmlinux After: 15730311 6937008 4046908 26714227 vmlinux xfrm_inner_extract_output has no more external callers, make it static. v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca) add WARN_ON_ONCE for 'call AF_INET6 related output function, but CONFIG_IPV6=n' case. make xfrm_inner_extract_output static Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove input indirection from xfrm_modeFlorian Westphal1-11/+0
No need for any indirection or abstraction here, both functions are pretty much the same and quite small, they also have no external dependencies. xfrm_prepare_input can then be made static. With allmodconfig build, size increase of vmlinux is 25 byte: Before: text data bss dec filename 15730207 6936924 4046908 26714039 vmlinux After: 15730208 6936948 4046908 26714064 vmlinux v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca) change copied comment to refer to transport and network header, not skb->{h,nh}, which don't exist anymore. (Sabrina) make xfrm_prepare_input static (Eyal Birger) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: place af number into xfrm_mode structFlorian Westphal1-4/+5
This will be useful to know if we're supposed to decode ipv4 or ipv6. While at it, make the unregister function return void, all module_exit functions did just BUG(); there is never a point in doing error checks if there is no way to handle such error. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-22genetlink: make policy common to familyJohannes Berg2-4/+4
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22rhashtable: rename rht_for_each*continue as *from.NeilBrown1-20/+20
The pattern set by list.h is that for_each..continue() iterators start at the next entry after the given one, while for_each..from() iterators start at the given entry. The rht_for_each*continue() iterators are documented as though the start at the 'next' entry, but actually start at the given entry, and they are used expecting that behaviour. So fix the documentation and change the names to *from for consistency with list.h Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22rhashtable: don't hold lock on first table throughout insertion.NeilBrown1-13/+0
rhashtable_try_insert() currently holds a lock on the bucket in the first table, while also locking buckets in subsequent tables. This is unnecessary and looks like a hold-over from some earlier version of the implementation. As insert and remove always lock a bucket in each table in turn, and as insert only inserts in the final table, there cannot be any races that are not covered by simply locking a bucket in each table in turn. When an insert call reaches that last table it can be sure that there is no matchinf entry in any other table as it has searched them all, and insertion never happens anywhere but in the last table. The fact that code tests for the existence of future_tbl while holding a lock on the relevant bucket ensures that two threads inserting the same key will make compatible decisions about which is the "last" table. This simplifies the code and allows the ->rehash field to be discarded. We still need a way to ensure that a dead bucket_table is never re-linked by rhashtable_walk_stop(). This can be achieved by calling call_rcu() inside the locked region, and checking with rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the bucket table is empty and dead. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: dst: remove gc leftoversJulian Wiedmann1-11/+0
Get rid of some obsolete gc-related documentation and macros that were missed in commit 5b7c9a8ff828 ("net: remove dst gc related code"). CC: Wei Wang <weiwan@google.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv4: Allow amount of dirty memory from fib resizing to be controllableDavid Ahern1-0/+4
fib_trie implementation calls synchronize_rcu when a certain amount of pages are dirty from freed entries. The number of pages was determined experimentally in 2009 (commit c3059477fce2d). At the current setting, synchronize_rcu is called often -- 51 times in a second in one test with an average of an 8 msec delay adding a fib entry. The total impact is a lot of slow down modifying the fib. This is seen in the output of 'time' - the difference between real time and sys+user. For example, using 720,022 single path routes and 'ip -batch'[1]: $ time ./ip -batch ipv4/routes-1-hops real 0m14.214s user 0m2.513s sys 0m6.783s So roughly 35% of the actual time to install the routes is from the ip command getting scheduled out, most notably due to synchronize_rcu (this is observed using 'perf sched timehist'). This patch makes the amount of dirty memory configurable between 64k where the synchronize_rcu is called often (small, low end systems that are memory sensitive) to 64M where synchronize_rcu is called rarely during a large FIB change (for high end systems with lots of memory). The default is 512kB which corresponds to the current setting of 128 pages with a 4kB page size. As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu in a second blocking for up to 30 msec in a single instance, and a total of almost 100 msec across the 4 calls in the second. The trade off is allowing FIB entries to consume more memory in a given time window but but with much better fib insertion rates (~30% increase in prefixes/sec). With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch file runs in: $ time ./ip -batch ipv4/routes-1-hops real 0m9.692s user 0m2.491s sys 0m6.769s So the dead time is reduced to about 1/2 second or <5% of the real time. [1] 'ip' modified to not request ACK messages which improves route insertion times by about 20% Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun deviceKirill Tkhai1-0/+1
In commit f2780d6d7475 "tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device" it was missed that tun may change its net ns, while net ns of socket remains the same as it was created initially. SIOCGSKNS returns net ns of socket, so it is not suitable for obtaining net ns of device. We may have two tun devices with the same names in two net ns, and in this case it's not possible to determ, which of them fd refers to (TUNGETIFF will return the same name). This patch adds new ioctl() cmd for obtaining net ns of a device. Reported-by: Harald Albrecht <harald.albrecht@gmx.net> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv6: Change addrconf_f6i_alloc to use ip6_route_info_createDavid Ahern1-1/+2
Change addrconf_f6i_alloc to generate a fib6_config and call ip6_route_info_create. addrconf_f6i_alloc is the last caller to fib6_info_alloc besides ip6_route_info_create, and there is no reason for it to do its own initialization on a fib6_info. Host routes need to be created even if the device is down, so add a new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init to not error out if device is not up. Notes on the conversion: - ip_fib_metrics_init is the same as fib6_config has fc_mx set to NULL and fc_mx_len set to 0 - dst_nocount is handled by the RTF_ADDRCONF flag - dst_host is handled by fc_dst_len = 128 nh_gw does not get set after the conversion to ip6_route_info_create but it should not be set in addrconf_f6i_alloc since this is a host route not a gateway route. Everything else is a straight forward map between fib6_info and fib6_config. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv6: Add icmp_echo_ignore_anycast for ICMPv6Stephen Suryaputra1-0/+1
In addition to icmp_echo_ignore_multicast, there is a need to also prevent responding to pings to anycast addresses for security. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net: remove 'fallback' argument from dev->ndo_select_queue()Paolo Abeni1-8/+4
After the previous patch, all the callers of ndo_select_queue() provide as a 'fallback' argument netdev_pick_tx. The only exceptions are nested calls to ndo_select_queue(), which pass down the 'fallback' available in the current scope - still netdev_pick_tx. We can drop such argument and replace fallback() invocation with netdev_pick_tx(). This avoids an indirect call per xmit packet in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device drivers implementing such ndo. It also clean the code a bit. Tested with ixgbe and CONFIG_FCOE=m With pktgen using queue xmit: threads vanilla patched (kpps) (kpps) 1 2334 2428 2 4166 4278 4 7895 8100 v1 -> v2: - rebased after helper's name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20packet: rework packet_pick_tx_queue() to use common code selectionPaolo Abeni1-0/+2
Currently packet_pick_tx_queue() is the only caller of ndo_select_queue() using a fallback argument other than netdev_pick_tx. Leveraging rx queue, we can obtain a similar queue selection behavior using core helpers. After this change, ndo_select_queue() is always invoked with netdev_pick_tx() as fallback. We can change ndo_select_queue() signature in a followup patch, dropping an indirect call per transmitted packet in some scenarios (e.g. TCP syn and XDP generic xmit) This changes slightly how af packet queue selection happens when PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit() tacking in account both XPS and TC mapping. v1 -> v2: - rebased after helper name change RFC -> v1: - initialize sender_cpu to the expected value Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net: dev: rename queue selection helpers.Paolo Abeni1-3/+3
With the following patches, we are going to use __netdev_pick_tx() in many modules. Rename it to netdev_pick_tx(), to make it clear is a public API. Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(), to avoid name clashes. Suggested-by: Eric Dumazet <edumazet@google.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net/tls: Add support of AES128-CCM based ciphersVakul Garg2-2/+28
Added support for AES128-CCM based record encryption. AES128-CCM is similar to AES128-GCM. Both of them have same salt/iv/mac size. The notable difference between the two is that while invoking AES128-CCM operation, the salt||nonce (which is passed as IV) has to be prefixed with a hardcoded value '2'. Further, CCM implementation in kernel requires IV passed in crypto_aead_request() to be full '16' bytes. Therefore, the record structure 'struct tls_rec' has been modified to reserve '16' bytes for IV. This works for both GCM and CCM based cipher. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20ipv6: Add icmp_echo_ignore_multicast support for ICMPv6Stephen Suryaputra1-0/+1
IPv4 has icmp_echo_ignore_broadcast to prevent responding to broadcast pings. IPv6 needs a similar mechanism. v1->v2: - Remove NET_IPV6_ICMP_ECHO_IGNORE_MULTICAST. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20tcp: free request sock directly upon TFO or syncookies errorGuillaume Nault1-3/+7
Since the request socket is created locally, it'd make more sense to use reqsk_free() instead of reqsk_put() in TFO and syncookies' error path. However, tcp_get_cookie_sock() may set ->rsk_refcnt before freeing the socket; tcp_conn_request() may also have non-null ->rsk_refcnt because of tcp_try_fastopen(). In both cases 'req' hasn't been exposed to the outside world and is safe to free immediately, but that'd trigger the WARN_ON_ONCE in reqsk_free(). Define __reqsk_free() for these situations where we know nobody's referencing the socket, even though ->rsk_refcnt might be non-null. Now we can consolidate the error path of tcp_get_cookie_sock() and tcp_conn_request(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-19tipc: support broadcast/replicast configurable for bc-linkHoang Le1-0/+2
Currently, a multicast stream uses either broadcast or replicast as transmission method, based on the ratio between number of actual destinations nodes and cluster size. However, when an L2 interface (e.g., VXLAN) provides pseudo broadcast support, this becomes very inefficient, as it blindly replicates multicast packets to all cluster/subnet nodes, irrespective of whether they host actual target sockets or not. The TIPC multicast algorithm is able to distinguish real destination nodes from other nodes, and hence provides a smarter and more efficient method for transferring multicast messages than pseudo broadcast can do. Because of this, we now make it possible for users to force the broadcast link to permanently switch to using replicast, irrespective of which capabilities the bearer provides, or pretend to provide. Conversely, we also make it possible to force the broadcast link to always use true broadcast. While maybe less useful in deployed systems, this may at least be useful for testing the broadcast algorithm in small clusters. We retain the current AUTOSELECT ability, i.e., to let the broadcast link automatically select which algorithm to use, and to switch back and forth between broadcast and replicast as the ratio between destination node number and cluster size changes. This remains the default method. Furthermore, we make it possible to configure the threshold ratio for such switches. The default ratio is now set to 10%, down from 25% in the earlier implementation. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-19sctp: get sctphdr by offset in sctp_compute_cksumXin Long1-1/+1
sctp_hdr(skb) only works when skb->transport_header is set properly. But in Netfilter, skb->transport_header for ipv6 is not guaranteed to be right value for sctphdr. It would cause to fail to check the checksum for sctp packets. So fix it by using offset, which is always right in all places. v1->v2: - Fix the changelog. Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-19packets: Always register packet sk in the same orderMaxime Chevallier1-0/+6
When using fanouts with AF_PACKET, the demux functions such as fanout_demux_cpu will return an index in the fanout socket array, which corresponds to the selected socket. The ordering of this array depends on the order the sockets were added to a given fanout group, so for FANOUT_CPU this means sockets are bound to cpus in the order they are configured, which is OK. However, when stopping then restarting the interface these sockets are bound to, the sockets are reassigned to the fanout group in the reverse order, due to the fact that they were inserted at the head of the interface's AF_PACKET socket list. This means that traffic that was directed to the first socket in the fanout group is now directed to the last one after an interface restart. In the case of FANOUT_CPU, traffic from CPU0 will be directed to the socket that used to receive traffic from the last CPU after an interface restart. This commit introduces a helper to add a socket at the tail of a list, then uses it to register AF_PACKET sockets. Note that this changes the order in which sockets are listed in /proc and with sock_diag. Fixes: dc99f600698d ("packet: Add fanout support") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller4-63/+167
Daniel Borkmann says: ==================== pull-request: bpf 2019-03-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a umem memory leak on cleanup in AF_XDP, from Björn. 2) Fix BTF to properly resolve forward-declared enums into their corresponding full enum definition types during deduplication, from Andrii. 3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus. 4) Fix accessing invalid pointer returned from bpf_tcp_sock() and bpf_sk_fullsock() after bpf_sk_release() was called, from Martin. 5) Fix generation of load/store DW instructions in PPC JIT, from Naveen. 6) Various fixes in BPF helper function documentation in bpf.h UAPI header used to bpf-helpers(7) man page, from Quentin. 7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-16xsk: fix umem memory leak on cleanupBjörn Töpel1-1/+0
When the umem is cleaned up, the task that created it might already be gone. If the task was gone, the xdp_umem_release function did not free the pages member of struct xdp_umem. It turned out that the task lookup was not needed at all; The code was a left-over when we moved from task accounting to user accounting [1]. This patch fixes the memory leak by removing the task lookup logic completely. [1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/ Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/ Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-16net: add documentation to socket.cPedro Tammela2-6/+12
Adds missing sphinx documentation to the socket.c's functions. Also fixes some whitespaces. I also changed the style of older documentation as an effort to have an uniform documentation style. Signed-off-by: Pedro Tammela <pctammela@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-15appletalk: Fix potential NULL pointer dereference in unregister_snap_clientYueHaibing1-1/+1
register_snap_client may return NULL, all the callers check it, but only print a warning. This will result in NULL pointer dereference in unregister_snap_client and other places. It has always been used like this since v2.6 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge misc patches from Andrew Morton: - a little bit more MM - a few fixups [ The "little bit more MM" is actually just one of the three patches Andrew sent for mm/filemap.c, I'm still mulling over two more of them from Josef Bacik - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: include/linux/swap.h: use offsetof() instead of custom __swapoffset macro tools/testing/selftests/proc/proc-pid-vm.c: test with vsyscall in mind zram: default to lzo-rle instead of lzo filemap: pass vm_fault to the mmap ra helpers
2019-03-15include/linux/swap.h: use offsetof() instead of custom __swapoffset macroPi-Hsun Shih1-2/+2
Use offsetof() to calculate offset of a field to take advantage of compiler built-in version when possible, and avoid UBSAN warning when compiling with Clang: UBSAN: Undefined behaviour in mm/swapfile.c:3010:38 member access within null pointer of type 'union swap_header' CPU: 6 PID: 1833 Comm: swapon Tainted: G S 4.19.23 #43 Call trace: dump_backtrace+0x0/0x194 show_stack+0x20/0x2c __dump_stack+0x20/0x28 dump_stack+0x70/0x94 ubsan_epilogue+0x14/0x44 ubsan_type_mismatch_common+0xf4/0xfc __ubsan_handle_type_mismatch_v1+0x34/0x54 __se_sys_swapon+0x654/0x1084 __arm64_sys_swapon+0x1c/0x24 el0_svc_common+0xa8/0x150 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x18 Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-15bpf: add documentation for helpers bpf_spin_lock(), bpf_spin_unlock()Quentin Monnet1-0/+55
Add documentation for the BPF spinlock-related helpers to the doc in bpf.h. I added the constraints and restrictions coming with the use of spinlocks for BPF: not all of it is directly related to the use of the helper, but I thought it would be nice for users to find them in the man page. This list of restrictions is nearly a verbatim copy of the list in Alexei's commit log for those helpers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-15bpf: fix documentation for eBPF helpersQuentin Monnet1-63/+65
Another round of minor fixes for the documentation of the BPF helpers located in the UAPI bpf.h header file. Changes include: - Moving around description of some helpers, to keep the descriptions in the same order as helpers are declared (bpf_map_push_elem(), leftover from commit 90b1023f68c7 ("bpf: fix documentation for eBPF helpers"), bpf_rc_keydown(), and bpf_skb_ancestor_cgroup_id()). - Fixing typos ("contex" -> "context"). - Harmonising return types ("void* " -> "void *", "uint64_t" -> "u64"). - Addition of the "bpf_" prefix to bpf_get_storage(). - Light additions of RST markup on some keywords. - Empty line deletion between description and return value for bpf_tcp_sock(). - Edit for the description for bpf_skb_ecn_set_ce() (capital letters, acronym expansion, no effect if ECT not set, more details on return value). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-14Merge tag 'pm-5.1-rc1-2' of ↵Linus Torvalds2-10/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are mostly fixes and cleanups on top of the previously merged power management material for 5.1-rc1 with one cpupower utility update that wasn't pushed earlier due to unfortunate timing. Specifics: - Fix registration of new cpuidle governors partially broken during the 5.0 development cycle by mistake (Rafael Wysocki). - Avoid integer overflows in the menu cpuidle governor by making it discard the overflowing data points upfront (Rafael Wysocki). - Fix minor mistake in the recent update of the iowait boost computation in the intel_pstate driver (Rafael Wysocki). - Drop incorrect __init annotation from one function in the pxa2xx cpufreq driver (Arnd Bergmann). - Fix the operating performance points (OPP) framework initialization for devices in multiple power domains if only one of them is scalable (Rajendra Nayak). - Fix mistake in dev_pm_opp_set_rate() which causes it to skip updating the performance state if the new frequency is the same as the old one (Viresh Kumar). - Rework the cancellation of wakeup source timers to avoid potential issues with it and do some cleanups unlocked by that change (Viresh Kumar, Rafael Wysocki). - Clean up the code computing the active/suspended time of devices in the PM-runtime framework after recent changes (Ulf Hansson). - Make the power management infrastructure code use pr_fmt() consistently (Joe Perches). - Clean up the generic power domains (genpd) framework somewhat (Aisheng Dong). - Improve kerneldoc comments for two functions in the cpufreq core (Rafael Wysocki). - Fix typo in a PM QoS file description comment (Aisheng Dong). - Update the handling of CPU boost frequencies in the cpupower utility (Abhishek Goel)" * tag 'pm-5.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: governor: Add new governors to cpuidle_governors again cpufreq: intel_pstate: Fix up iowait_boost computation PM / OPP: Update performance state when freq == old_freq PM / wakeup: Drop wakeup_source_drop() PM / wakeup: Rework wakeup source timer cancellation PM / domains: Remove one unnecessary blank line PM / Domains: Return early for all errors in _genpd_power_off() PM / Domains: Improve warn for multiple states but no governor OPP: Fix handling of multiple power domains PM / QoS: Fix typo in file description cpufreq: pxa2xx: remove incorrect __init annotation PM-runtime: Call pm_runtime_active|suspended_time() from sysfs PM-runtime: Consolidate code to get active/suspended time PM: Add and use pr_fmt() cpufreq: Improve kerneldoc comments for cpufreq_cpu_get/put() cpuidle: menu: Avoid overflows when computing variance tools/power/cpupower: Display boost frequency separately
2019-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-5/+9
Pull networking fixes from David Miller: "More fixes in the queue: 1) Netfilter nat can erroneously register the device notifier twice, fix from Florian Westphal. 2) Use after free in nf_tables, from Pablo Neira Ayuso. 3) Parallel update of steering rule fix in mlx5 river, from Eli Britstein. 4) RX processing panic in lan743x, fix from Bryan Whitehead. 5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch. 6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein. 7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing modes, from Bryan Whitehead. 8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) pptp: dst_release sk_dst_cache in pptp_sock_destruct MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list l2tp: fix infoleak in l2tp_ip6_recvmsg() net/tls: Inform user space about send buffer availability net_sched: return correct value for *notify* functions lan743x: Fix TX Stall Issue net/mlx4_core: Fix qp mtt size calculation net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling net/mlx4_core: Fix reset flow when in command polling mode mlxsw: minimal: Initialize base_mac mlxsw: core: Prevent duplication during QSFP module initialization net: dwmac-sun8i: fix a missing check of of_get_phy_mode net: sh_eth: fix a missing check of of_get_phy_mode net: 8390: fix potential NULL pointer dereferences net: fujitsu: fix a potential NULL pointer dereference net: qlogic: fix a potential NULL pointer dereference isdn: hfcpci: fix potential NULL pointer dereference Documentation: devicetree: add a new optional property for port mac address net: rocker: fix a potential NULL pointer dereference net: qlge: fix a potential NULL pointer dereference ...
2019-03-14Merge tag 'dmaengine-5.1-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds4-15/+68
Pull dmaengine updates from Vinod Koul: - dmatest updates for modularizing common struct and code - remove SG support for VDMA xilinx IP and updates to driver - Update to dw driver to support Intel iDMA controllers multi-block support - tegra updates for proper reporting of residue - Add Snow Ridge ioatdma device id and support for IOATDMA v3.4 - struct_size() usage and useless LIST_HEAD cleanups in subsystem. - qDMA controller driver for Layerscape SoCs - stm32-dma PM Runtime support - And usual updates to imx-sdma, sprd, Documentation, fsl-edma, bcm2835, qcom_hidma etc * tag 'dmaengine-5.1-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (81 commits) dmaengine: imx-sdma: fix consistent dma test failures dmaengine: imx-sdma: add a test for imx8mq multi sdma devices dmaengine: imx-sdma: add clock ratio 1:1 check dmaengine: dmatest: move test data alloc & free into functions dmaengine: dmatest: add short-hand `buf_size` var in dmatest_func() dmaengine: dmatest: wrap src & dst data into a struct dmaengine: ioatdma: support latency tolerance report (LTR) for v3.4 dmaengine: ioatdma: add descriptor pre-fetch support for v3.4 dmaengine: ioatdma: disable DCA enabling on IOATDMA v3.4 dmaengine: ioatdma: Add Snow Ridge ioatdma device id dmaengine: sprd: Change channel id to slave id for DMA cell specifier dt-bindings: dmaengine: sprd: Change channel id to slave id for DMA cell specifier dmaengine: mv_xor: Use correct device for DMA API Documentation :dmaengine: clarify DMA desc. pointer after submission Documentation: dmaengine: fix dmatest.rst warning dmaengine: k3dma: Add support for dma-channel-mask dmaengine: k3dma: Delete axi_config dmaengine: k3dma: Upgrade k3dma driver to support hisi_asp_dma hardware Documentation: bindings: dma: Add binding for dma-channel-mask Documentation: bindings: k3dma: Extend the k3dma driver binding to support hisi-asp ...
2019-03-14Merge tag 'rproc-v5.1' of git://github.com/andersson/remoteprocLinus Torvalds1-4/+4
Pull remoteproc updates from Bjorn Andersson: "This contains the last patches in Loic's remoteproc resource table handling changes, a number of updates to documentation, support for invoking the crash handler (for testing purposes), a fix for the handling of virtio devices during recovery, performance state votes in Qualcomm modem driver, support for specifying board specific firmware path for Qualcomm modem driver and improved support for graceful shutdown of Qualcomm remoteprocs" * tag 'rproc-v5.1' of git://github.com/andersson/remoteproc: (33 commits) remoteproc: fix for "dma-mapping: remove the DMA_MEMORY_EXCLUSIVE flag" remoteproc: fix rproc_check_carveout_da() returned error and comments remoteproc: fix trace buffer va initialization remoteproc: fix rproc_alloc_carveout() for rproc with iommu domain remoteproc: add warning on resource table cast remoteproc: fix rproc_alloc_carveout() bad variable cast remoteproc: fix rproc_da_to_va in case of unallocated carveout remoteproc: correct rproc_mem_entry_init() comments remoteproc: fix recovery procedure rpmsg: virtio: change header file sort style rpmsg: virtio: allocate buffer from parent remoteproc: st: add reserved memory support remoteproc: create vdev subdevice with specific dma memory pool remoteproc: q6v5_adsp: Remove voting for lpass_aon clock dt-binding: remoteproc: Remove lpass_aon clock from adsp pil clock list remoteproc: q6v5-mss: Active powerdomain for SDM845 remoteproc: q6v5-mss: Vote for rpmh power domains remoteproc: qcom: Add support for parsing fw dt bindings remoteproc: qcom_q6v5: don't auto boot remote processor remoteproc: qcom: Wait for shutdown-ack/ind on sysmon shutdown ...
2019-03-14Merge tag 'clk-for-linus' of ↵Linus Torvalds24-111/+706
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk subsystem updates from Stephen Boyd: "We have a fairly balanced mix of clk driver updates and clk framework updates this time around. It's the usual pile of new drivers for new hardware out there and the normal small fixes and updates, but then we have some core framework changes too. In the core framework, we introduce support for a clk_get_optional() API to get clks that may not always be populated and a way to devm manage clkdev lookups registered by provider drivers. We also do some refactoring to simplify the interface between clkdev and the common clk framework so we can reuse the DT parsing and clk_get() path in provider drivers in the future. This work will continue in the next few cycles while we convert how providers specify clk parents. On the driver side, the biggest part of the dirstat is the Amlogic clk driver that got support for the G12A SoC. It dominates with almost half the overall diff, while the second largest part of the diff is in the i.MX clk driver that gained support for imx8mm SoCs. After that, we have the Actions Semiconductor and Qualcomm drivers rounding out the big part of the dirstat because they both got new hardware support for SoCs. The rest is just various updates and non-critical fixes for existing drivers. Core: - Convert a few clk bindings to JSON schema format - Add a {devm_}clk_get_optional() API - Add devm_clk_hw_register_clkdev() API to manage clkdev lookups - Start rewriting clk parent registration and supporting device links by moving around code that supports clk_get() and DT parsing of the 'clocks' property New Drivers: - Add Qualcomm MSM8998 RPM managed clks - IPA clk support on Qualcomm RPMh clk controllers - Actions Semi S500 SoC clk support - Support for fixed rate clks populated from an MMIO register - Add RPC (QSPI/HyperFLASH) clocks on Renesas R-Car V3H - Add TMU (timer) clocks on Renesas RZ/G2E - Add Amlogic G12A Always-On Clock Controller - Add 32k clock generation for Amlogic AXG - Add support for the Mali GPU clocks on Amlogic Meson8 - Add Amlogic G12A EE clock controller driver - Add missing CANFD clocks on Renesas RZ/G2M and RZ/G2E - Add i.MX8MM SoC clk driver support Removed Drivers: - Remove clps711x driver as the board support is gone Updates: - 3rd ECO fix for Mediatek MT2712 SoCs - Updates for Qualcomm MSM8998 GCC clks - Random static analysis fixes for clk drivers - Support for sleeping gpios in the clk-gpio type - Minor fixes for STM32MP1 clk driver (parents, critical flag, etc.) - Split LCDC into two clks on the Marvell MMP2 SoC - Various DT of_node refcount fixes - Get rid of CLK_IS_BASIC from TI code (yay!) - TI Autoidle clk support - Fix Amlogic Meson8 APB clock ID name - Claim input clocks through DT for Amlogic AXG and GXBB - Correct the DU (display unit) parent clock on Renesas RZ/G2E - Exynos5433 IMEM CMU crypto clk support (SlimSS) - Fix for the PLL-MIPI on the Allwinner A23 - Fix Rockchip rk3328 PLL rate calculation - Add SET_RATE_PARENT flag on display clk of Rockhip rk3066 - i.MX SCU clk driver clk_set_parent() and cpufreq support" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (150 commits) dt-bindings: clock: imx8mq: Fix numbering overlaps and gaps clk: ti: clkctrl: Fix clkdm_name regression for TI_CLK_CLKCTRL_COMPAT clk: fixup default index for of_clk_get_by_name() clk: Move of_clk_*() APIs into clk.c from clkdev.c clk: Inform the core about consumer devices clk: Introduce of_clk_get_hw_from_clkspec() clk: core: clarify the check for runtime PM clk: Combine __clk_get() and __clk_create_clk() clk: imx8mq: add GPIO clocks to clock tree clk: mediatek: correct cpu clock name for MT8173 SoC clk: imx: Refactor entire sccg pll clk clk: imx: scu: add cpu frequency scaling support clk: mediatek: Mark bus and DRAM related clocks as critical clk: mediatek: Add flags to mtk_gate clk: mediatek: Add MUX_FLAGS macro clk: qcom: gcc-sdm845: Define parent of PCIe PIPE clocks clk: ingenic: Remove set but not used variable 'enable' clk: at91: programmable: remove unneeded register read clk: mediatek: using CLK_MUX_ROUND_CLOSEST for the clock of dpi1_sel clk: mediatek: add MUX_GATE_FLAGS_2 ...
2019-03-14Merge branches 'pm-core', 'pm-sleep' and 'pm-qos'Rafael J. Wysocki2-10/+0
* pm-core: PM-runtime: Call pm_runtime_active|suspended_time() from sysfs PM-runtime: Consolidate code to get active/suspended time * pm-sleep: PM / wakeup: Drop wakeup_source_drop() PM / wakeup: Rework wakeup source timer cancellation * pm-qos: PM / QoS: Fix typo in file description
2019-03-13bpf: Add bpf_get_listener_sock(struct bpf_sock *sk) helperMartin KaFai Lau1-1/+10
Add a new helper "struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)" which returns a bpf_sock in TCP_LISTEN state. It will trace back to the listener sk from a request_sock if possible. It returns NULL for all other cases. No reference is taken because the helper ensures the sk is in SOCK_RCU_FREE (where the TCP_LISTEN sock should be in). Hence, bpf_sk_release() is unnecessary and the verifier does not allow bpf_sk_release(listen_sk) to be called either. The following is also allowed because the bpf_prog is run under rcu_read_lock(): sk = bpf_sk_lookup_tcp(); /* if (!sk) { ... } */ listen_sk = bpf_get_listener_sock(sk); /* if (!listen_sk) { ... } */ bpf_sk_release(sk); src_port = listen_sk->src_port; /* Allowed */ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-13bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_releaseMartin KaFai Lau2-1/+40
Lorenz Bauer [thanks!] reported that a ptr returned by bpf_tcp_sock(sk) can still be accessed after bpf_sk_release(sk). Both bpf_tcp_sock() and bpf_sk_fullsock() have the same issue. This patch addresses them together. A simple reproducer looks like this: sk = bpf_sk_lookup_tcp(); /* if (!sk) ... */ tp = bpf_tcp_sock(sk); /* if (!tp) ... */ bpf_sk_release(sk); snd_cwnd = tp->snd_cwnd; /* oops! The verifier does not complain. */ The problem is the verifier did not scrub the register's states of the tcp_sock ptr (tp) after bpf_sk_release(sk). [ Note that when calling bpf_tcp_sock(sk), the sk is not always refcount-acquired. e.g. bpf_tcp_sock(skb->sk). The verifier works fine for this case. ] Currently, the verifier does not track if a helper's return ptr (in REG_0) is "carry"-ing one of its argument's refcount status. To carry this info, the reg1->id needs to be stored in reg0. One approach was tried, like "reg0->id = reg1->id", when calling "bpf_tcp_sock()". The main idea was to avoid adding another "ref_obj_id" for the same reg. However, overlapping the NULL marking and ref tracking purpose in one "id" does not work well: ref_sk = bpf_sk_lookup_tcp(); fullsock = bpf_sk_fullsock(ref_sk); tp = bpf_tcp_sock(ref_sk); if (!fullsock) { bpf_sk_release(ref_sk); return 0; } /* fullsock_reg->id is marked for NOT-NULL. * Same for tp_reg->id because they have the same id. */ /* oops. verifier did not complain about the missing !tp check */ snd_cwnd = tp->snd_cwnd; Hence, a new "ref_obj_id" is needed in "struct bpf_reg_state". With a new ref_obj_id, when bpf_sk_release(sk) is called, the verifier can scrub all reg states which has a ref_obj_id match. It is done with the changes in release_reg_references() in this patch. While fixing it, sk_to_full_sk() is removed from bpf_tcp_sock() and bpf_sk_fullsock() to avoid these helpers from returning another ptr. It will make bpf_sk_release(tp) possible: sk = bpf_sk_lookup_tcp(); /* if (!sk) ... */ tp = bpf_tcp_sock(sk); /* if (!tp) ... */ bpf_sk_release(tp); A separate helper "bpf_get_listener_sock()" will be added in a later patch to do sk_to_full_sk(). Misc change notes: - To allow bpf_sk_release(tp), the arg of bpf_sk_release() is changed from ARG_PTR_TO_SOCKET to ARG_PTR_TO_SOCK_COMMON. ARG_PTR_TO_SOCKET is removed from bpf.h since no helper is using it. - arg_type_is_refcounted() is renamed to arg_type_may_be_refcounted() because ARG_PTR_TO_SOCK_COMMON is the only one and skb->sk is not refcounted. All bpf_sk_release(), bpf_sk_fullsock() and bpf_tcp_sock() take ARG_PTR_TO_SOCK_COMMON. - check_refcount_ok() ensures is_acquire_function() cannot take arg_type_may_be_refcounted() as its argument. - The check_func_arg() can only allow one refcount-ed arg. It is guaranteed by check_refcount_ok() which ensures at most one arg can be refcounted. Hence, it is a verifier internal error if >1 refcount arg found in check_func_arg(). - In release_reference(), release_reference_state() is called first to ensure a match on "reg->ref_obj_id" can be found before scrubbing the reg states with release_reg_references(). - reg_is_refcounted() is no longer needed. 1. In mark_ptr_or_null_regs(), its usage is replaced by "ref_obj_id && ref_obj_id == id" because, when is_null == true, release_reference_state() should only be called on the ref_obj_id obtained by a acquire helper (i.e. is_acquire_function() == true). Otherwise, the following would happen: sk = bpf_sk_lookup_tcp(); /* if (!sk) { ... } */ fullsock = bpf_sk_fullsock(sk); if (!fullsock) { /* * release_reference_state(fullsock_reg->ref_obj_id) * where fullsock_reg->ref_obj_id == sk_reg->ref_obj_id. * * Hence, the following bpf_sk_release(sk) will fail * because the ref state has already been released in the * earlier release_reference_state(fullsock_reg->ref_obj_id). */ bpf_sk_release(sk); } 2. In release_reg_references(), the current reg_is_refcounted() call is unnecessary because the id check is enough. - The type_is_refcounted() and type_is_refcounted_or_null() are no longer needed also because reg_is_refcounted() is removed. Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-13Merge tag 'pwm/for-5.1-rc1' of ↵Linus Torvalds1-19/+18
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The changes for this cycle are across the board. The bulk of it is cleanups, but there's also new device support in some drivers as well as more conversions to the atomic API" * tag 'pwm/for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (24 commits) pwm: atmel: Remove useless symbolic definitions pwm: bcm-kona: Update macros to remove braces around numbers pwm: imx27: Only enable the clocks once in .get_state() pwm: rcar: Improve calculation of divider pwm: rcar: Remove legacy APIs pwm: rcar: Use "atomic" API on rcar_pwm_resume() pwm: rcar: Add support "atomic" API pwm: atmel: Add support for SAM9X60's PWM controller pwm: atmel: Add PWM binding for SAM9X60 pwm: atmel: Rename objects of type atmel_pwm_data pwm: atmel: Add support for controllers with 32 bit counters pwm: atmel: Add struct atmel_pwm_data pwm: Add MediaTek MT8183 display PWM driver support pwm: hibvt: Add hi3559v100 support dt-bindings: pwm: hibvt: Add hi3559v100 support pwm: hibvt: Use individual struct per of-data pwm: imx: Signedness bug in imx_pwm_get_state() pwm: imx: Split into two drivers pwm: imx: Don't print an error on -EPROBE_DEFER pwm: imx: Set driver data earlier simplifying the end of ->probe() ...
2019-03-13Merge tag 'mailbox-v5.1' of ↵Linus Torvalds1-0/+20
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - mailbox-test: support multiple controller instances - misc cleanup: IMX, STM32 and Tegra - new driver: ZynqMP IPI * tag 'mailbox-v5.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: imx: keep MU irq working during suspend/resume dt-bindings: mailbox: Add Xilinx IPI Mailbox mailbox: ZynqMP IPI mailbox controller mailbox: stm32-ipcc: remove useless device_init_wakeup call mailbox: stm32-ipcc: do not enable wakeup source by default mailbox: mailbox-test: fix null pointer if no mmio mailbox: mailbox-test: fix debugfs in multi-instances mailbox: tegra-hsp: mark suspend function as __maybe_unused
2019-03-13Merge tag 'libnvdimm-for-5.1' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has been in -next since before the merge window opened, with no known collisions / issues reported. The only detail worth noting, outside the summary below, is that the "libnvdimm-start-pad" topic has been truncated to just cleanups and small fixes. The full topic branch would have doubled down on hacks around the "section alignment" limitation of the core-mm, instead effort is now being spent to address that root issue in the memory hotplug implementation for v5.2. - Fix nfit-bus command submission regression - Support retrieval of short-ARS results if the ARS state is "requires continuation", and even if the "no_init_ars" module parameter is specified - Allow busy-polling of the kernel ARS state by allowing root to reset the exponential back-off timer - Filter potentially stale ARS results by tracking query-ARS relative to the previous start-ARS - Enhance dax_device alignment checks - Add support for the Hyper-V family of device-specific-methods (DSMs) - Add several fixes and workarounds for Hyper-V compatibility - Fix support to cache the dirty-shutdown-count at init" * tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (25 commits) libnvdimm/namespace: Clean up holder_class_store() libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warnings acpi/nfit: Update NFIT flags error message libnvdimm/btt: Fix LBA masking during 'free list' population libnvdimm/btt: Remove unnecessary code in btt_freelist_init libnvdimm/pfn: Remove dax_label_reserve dax: Check the end of the block-device capacity with dax_direct_access() nfit/ars: Avoid stale ARS results nfit/ars: Allow root to busy-poll the ARS state machine nfit/ars: Introduce scrub_flags nfit/ars: Remove ars_start_flags nfit/ars: Attempt short-ARS even in the no_init_ars case nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot acpi/nfit: Require opt-in for read-only label configurations libnvdimm/pmem: Honor force_raw for legacy pmem regions libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init() libnvdimm: Fix altmap reservation size calculation libnvdimm, pfn: Fix over-trim in trim_pfn_device() acpi/nfit: Fix bus command validation libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family ...
2019-03-13Merge tag 'upstream-5.1-rc1' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: - A new interface for UBI to deal better with read disturb - Reject unsupported ioctl flags in UBIFS (xfstests found it) * tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: wl: Silence uninitialized variable warning ubifs: Reject unsupported ioctl flags explicitly ubi: Expose the bitrot interface ubi: Introduce in_pq()
2019-03-13Merge tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-0/+1
Pull ceph updates from Ilya Dryomov: "The highlights are: - rbd will now ignore discards that aren't aligned and big enough to actually free up some space (myself). This is controlled by the new alloc_size map option and can be disabled if needed. - support for rbd deep-flatten feature (myself). Deep-flatten allows "rbd flatten" to fully disconnect the clone image and its snapshots from the parent and make the parent snapshot removable. - a new round of cap handling improvements (Zheng Yan). The kernel client should now be much more prompt about releasing its caps and it is possible to put a limit on the number of caps held. - support for getting ceph.dir.pin extended attribute (Zheng Yan)" * tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits) Documentation: modern versions of ceph are not backed by btrfs rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN rbd: whole-object write and zeroout should copyup when snapshots exist rbd: copyup with an empty snapshot context (aka deep-copyup) rbd: introduce rbd_obj_issue_copyup_ops() rbd: stop copying num_osd_ops in rbd_obj_issue_copyup() rbd: factor out __rbd_osd_req_create() rbd: clear ->xferred on error from rbd_obj_issue_copyup() rbd: remove experimental designation from kernel layering ceph: add mount option to limit caps count ceph: periodically trim stale dentries ceph: delete stale dentry when last reference is dropped ceph: remove dentry_lru file from debugfs ceph: touch existing cap when handling reply ceph: pass inclusive lend parameter to filemap_write_and_wait_range() rbd: round off and ignore discards that are too small rbd: handle DISCARD and WRITE_ZEROES separately rbd: get rid of obj_req->obj_request_count libceph: use struct_size() for kmalloc() in crush_decode() ceph: send cap releases more aggressively ...
2019-03-13Merge tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds14-40/+872
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fixes for NFS I/O request leakages - Fix error handling paths in the NFS I/O recoalescing code - Reinitialise NFSv4.1 sequence results before retransmitting a request - Fix a soft lockup in the delegation recovery code - Bulk destroy of layouts needs to be safe w.r.t. umount - Prevent thundering herd issues when the SUNRPC socket is not connected - Respect RPC call timeouts when retrying transmission Features: - Convert rpc auth layer to use xdr_streams - Config option to disable insecure RPCSEC_GSS crypto types - Reduce size of RPC receive buffers - Readdirplus optimization by cache mechanism - Convert SUNRPC socket send code to use iov_iter() - SUNRPC micro-optimisations to avoid indirect calls - Add support for the pNFS LAYOUTERROR operation and use it with the pNFS/flexfiles driver - Add trace events to report non-zero NFS status codes - Various removals of unnecessary dprintks Bugfixes and cleanups: - Fix a number of sparse warnings and documentation format warnings - Fix nfs_parse_devname to not modify it's argument - Fix potential corruption of page being written through pNFS/blocks - fix xfstest generic/099 failures on nfsv3 - Avoid NFSv4.1 "false retries" when RPC calls are interrupted - Abort I/O early if the pNFS/flexfiles layout segment was invalidated - Avoid unnecessary pNFS/flexfiles layout invalidations" * tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) SUNRPC: Take the transport send lock before binding+connecting SUNRPC: Micro-optimise when the task is known not to be sleeping SUNRPC: Check whether the task was transmitted before rebind/reconnect SUNRPC: Remove redundant calls to RPC_IS_QUEUED() SUNRPC: Clean up SUNRPC: Respect RPC call timeouts when retrying transmission SUNRPC: Fix up RPC back channel transmission SUNRPC: Prevent thundering herd when the socket is not connected SUNRPC: Allow dynamic allocation of back channel slots NFSv4.1: Bump the default callback session slot count to 16 SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc NFS/flexfiles: Clean up mirror DS initialisation NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() NFS/flexfile: Simplify nfs4_ff_layout_ds_version() NFS/flexfiles: Simplify ff_layout_get_ds_cred() NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() NFS/flexfiles: Speed up read failover when DSes are down NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive ...
2019-03-13Merge tag 'fuse-update-5.1' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "Scalability and performance improvements, as well as minor bug fixes and cleanups" * tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits) fuse: cache readdir calls if filesystem opts out of opendir fuse: support clients that don't implement 'opendir' fuse: lift bad inode checks into callers fuse: multiplex cached/direct_io file operations fuse add copy_file_range to direct io fops fuse: use iov_iter based generic splice helpers fuse: Switch to using async direct IO for FOPEN_DIRECT_IO fuse: use atomic64_t for khctr fuse: clean up aborted fuse: Protect ff->reserved_req via corresponding fi->lock fuse: Protect fi->nlookup with fi->lock fuse: Introduce fi->lock to protect write related fields fuse: Convert fc->attr_version into atomic64_t fuse: Add fuse_inode argument to fuse_prepare_release() fuse: Verify userspace asks to requeue interrupt that we really sent fuse: Do some refactoring in fuse_dev_do_write() fuse: Wake up req->waitq of only if not background fuse: Optimize request_end() by not taking fiq->waitq.lock fuse: Kill fasync only if interrupt is queued in queue_interrupt() fuse: Remove stale comment in end_requests() ...
2019-03-13Merge branch 'work.mount' of ↵Linus Torvalds8-23/+412
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount infrastructure updates from Al Viro: "The rest of core infrastructure; no new syscalls in that pile, but the old parts are switched to new infrastructure. At that point conversions of individual filesystems can happen independently; some are done here (afs, cgroup, procfs, etc.), there's also a large series outside of that pile dealing with NFS (quite a bit of option-parsing stuff is getting used there - it's one of the most convoluted filesystems in terms of mount-related logics), but NFS bits are the next cycle fodder. It got seriously simplified since the last cycle; documentation is probably the weakest bit at the moment - I considered dropping the commit introducing Documentation/filesystems/mount_api.txt (cutting the size increase by quarter ;-), but decided that it would be better to fix it up after -rc1 instead. That pile allows to do followup work in independent branches, which should make life much easier for the next cycle. fs/super.c size increase is unpleasant; there's a followup series that allows to shrink it considerably, but I decided to leave that until the next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) afs: Use fs_context to pass parameters over automount afs: Add fs_context support vfs: Add some logging to the core users of the fs_context log vfs: Implement logging through fs_context vfs: Provide documentation for new mount API vfs: Remove kern_mount_data() hugetlbfs: Convert to fs_context cpuset: Use fs_context kernfs, sysfs, cgroup, intel_rdt: Support fs_context cgroup: store a reference to cgroup_ns into cgroup_fs_context cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper cgroup_do_mount(): massage calling conventions cgroup: stash cgroup_root reference into cgroup_fs_context cgroup2: switch to option-by-option parsing cgroup1: switch to option-by-option parsing cgroup: take options parsing into ->parse_monolithic() cgroup: fold cgroup1_mount() into cgroup1_get_tree() cgroup: start switching to fs_context ipc: Convert mqueue fs to fs_context proc: Add fs_context support to procfs ...