starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-05-22	rxrpc, afs: Fix selection of abort codes	David Howells	3	-6/+8
	The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	rxrpc: Return an error to sendmsg if call failed	David Howells	1	-0/+6
	If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that was being given data was aborted remotely or otherwise failed, return an error rather than returning the amount of data buffered for transmission. The call (presumably) did not complete, so there's not much point continuing with it. AF_RXRPC considers it "complete" and so will be unwilling to do anything else with it - and won't send a notification for it, deeming the return from sendmsg sufficient. Not returning an error causes afs to incorrectly handle a StoreData operation that gets interrupted by a change of address due to NAT reconfiguration. This doesn't normally affect most operations since their request parameters tend to fit into a single UDP packet and afs_make_call() returns before the server responds; StoreData is different as it involves transmission of a lot of data. This can be triggered on a client by doing something like: dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512 at one prompt, and then changing the network address at another prompt, e.g.: ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0 Tracing packets on an Auristor fileserver looks something like: 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 <ARP exchange for 192.168.6.2> 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001 The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the condition and generates an abort first and the unmarshal error is a consequence of that at the application layer. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	rxrpc: Automatically generate trace tag enums	David Howells	1	-219/+42
	Automatically generate trace tag enums from the symbol -> string mapping tables rather than having the enums as well, thereby reducing duplicated data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	rxrpc: Fix locking issue	David Howells	8	-22/+62
	There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); * DEADLOCK * 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	rxrpc: Use refcount_t rather than atomic_t	David Howells	13	-122/+119
	Move to using refcount_t rather than atomic_t for refcounts in rxrpc. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc	David Howells	4	-22/+94
	Allow the list of in-use local UDP endpoints in the current network namespace to be viewed in /proc. To aid with this, the endpoint list is converted to an hlist and RCU-safe manipulation is used so that the list can be read with only the RCU read lock held. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	Merge branch 'ipa-next'	David S. Miller	14	-162/+156
	Alex Elder says: ==================== net: ipa: a few more small items This series consists of three small sets of changes. Version 2 adds a patch that avoids a warning that occurs when handling a modem crash (I unfortunately didn't notice it earlier). All other patches are the same--just rebased. The first three patches allow a few endpoint features to be specified. At this time, currently-defined endpoints retain the same configuration, but when the monitor functionality is added in the next cycle these options will be required. The fourth patch simply removes an unused function, explaining also why it would likely never be used. The fifth patch is new. It counts the number of modem TX endpoints and uses it to determine how many TREs a transaction needs when when handling a modem crash. It is needed to avoid exceeding the limited number of commands imposed by the last four patches. And the last four patches refactor code related to IPA immediate commands, eliminating an unused field and then simplifying and removing some unneeded code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: use data space for command opcodes	Alex Elder	1	-2/+4
	The 64-bit data field in a transaction is not used for commands. And the opcode array is only used for commands. They're (currently) the same size; save a little space in the transaction structure by enclosing the two fields in a union. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: remove command info pool	Alex Elder	5	-50/+18
	The ipa_cmd_info structure now contains only one field, and it's an enumerated type whose values all fit in 8 bits. Currently we'll never use more than 8 TREs in a command transaction, and we can represent that number of command opcodes in the same space as a 64 bit pointer to an ipa_cmd_info structure. Define IPA_COMMAND_TRANS_TRE_MAX as the maximum number of TREs that can be in a command transaction. Replace the info pointer in a transaction with a fixed-size array named cmd_opcode[] of that many bytes. Store the opcode in this array when adding a command TRE to a transaction, as was done previously for the info array. This makes the ipa_cmd_info unused, so get rid of it. When committing an immediate command transaction, use the channel's Boolean command flag to determine whether to fill in the opcode, which will be taken (as before) from the array in the transaction. This makes the command info pool unnecessary, so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: remove command direction argument	Alex Elder	3	-20/+9
	We no longer use the direction argument for gsi_trans_cmd_add(), so get rid of it in its definition, and in its seven callers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: get rid of ipa_cmd_info->direction	Alex Elder	2	-6/+1
	The direction field of the ipa_cmd_info structure is set, but never used. It seems it might have been used for the DMA_SHARED_MEM immediate command, but the DIRECTION flag is set based on the value of the passed-in direction flag there. Anyway, remove this unused field from the ipa_cmd_info structure. This is done as a separate patch to make it very obvious that it's not required. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: count the number of modem TX endpoints	Alex Elder	2	-5/+7
	In ipa_endpoint_modem_exception_reset_all(), a high estimate was made of the number of endpoints that need their status register updated. We only used what was needed, so the high estimate didn't matter much. However the next few patches are going to limit the number of commands in a single transaction, and the overestimate would exceed that. So count the number of modem TX endpoints at initialization time, and use it in ipa_endpoint_modem_exception_reset_all(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: kill gsi_trans_commit_wait_timeout()	Alex Elder	4	-40/+7
	Since the beginning gsi_trans_commit_wait_timeout() has existed to provide a way to allow waiting a limited time for a transaction to complete. But that function has never been used. In fact, there is no use for this function, because a transaction committed to hardware should always complete. The only reason it might not complete is if there were a hardware failure, or perhaps a system configuration error. Furthermore, if a timeout ever did occur, the IPA hardware would be in an indeterminate state, from which there is no recovery. It would require some sort of complete IPA reset, and would require the participation of the modem, and at this time there is no such sequence defined. So get rid of the definition of gsi_trans_commit_wait_timeout(), and update a few comments accordingly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: specify RX aggregation time limit in config data	Alex Elder	8	-4/+35
	Don't assume that a 500 microsecond time limit should be used for all receive endpoints that support aggregation. Instead, specify the time limit to use in the configuration data. Set a 500 microsecond limit for all existing RX endpoints, as before. Checking for overflow for the time limit field is a bit complicated. Rather than duplicate a lot of code in ipa_endpoint_data_valid_one(), call WARN() if any value is found to be too large when encoding it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: support hard aggregation limits	Alex Elder	2	-35/+69
	Add a new flag for AP receive endpoints that indicates whether a "hard limit" is used as a criterion for closing aggregation. Add comments explaining the difference between "hard" and "soft" aggregation limits. Pass a flag to ipa_aggr_size_kb() so it computes the proper aggregation size value whether using hard or soft limits. Move that function earlier in "ipa_endpoint.c" so it can be used without a forward-reference. Update ipa_endpoint_data_valid_one() so it validates endpoints whose data indicate a hard aggregation limit is used, and so it reports set aggregation flags for endpoints without aggregation enabled. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: ipa: make endpoint HOLB drop configurable	Alex Elder	2	-2/+8
	Add a new Boolean flag for RX endpoints defining whether HOLB drop is initially enabled or disabled for the endpoint. All existing AP endpoints should have HOLB drop disabled. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	qed: fix typos in comments	Julia Lawall	2	-2/+2
	Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	nfp: flower: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: marvell: prestera: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	cirrus: cs89x0: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: qed: fix typos in comments	Julia Lawall	3	-5/+5
	Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net/mlx5: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: mvpp2: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22	net: sparx5: switchdev: fix typo in comment	Julia Lawall	1	-1/+1
	Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-21	Merge branch 'bpf: refine kernel.unprivileged_bpf_disabled behaviour'	Alexei Starovoitov	3	-1/+408
	Alan Maguire says: ==================== Unprivileged BPF disabled (kernel.unprivileged_bpf_disabled >= 1) is the default in most cases now; when set, the BPF system call is blocked for users without CAP_BPF/CAP_SYS_ADMIN. In some cases however, it makes sense to split activities between capability-requiring ones - such as program load/attach - and those that might not require capabilities such as reading perf/ringbuf events, reading or updating BPF map configuration etc. One example of this sort of approach is a service that loads a BPF program, and a user-space program that interacts with it. Here - rather than blocking all BPF syscall commands - unprivileged BPF disabled blocks the key object-creating commands (prog load, map load). Discussion has alluded to this idea in the past [1], and Alexei mentioned it was also discussed at LSF/MM/BPF this year. Changes since v3 [2]: - added acks to patch 1 - CI was failing on Ubuntu; I suspect the issue was an old capability.h file which specified CAP_LAST_CAP as < CAP_BPF, leading to the logic disabling all caps not disabling CAP_BPF. Use CAP_BPF as basis for "all caps" bitmap instead as we explicitly define it in cap_helpers.h if not already found in capabilities.h - made global variables arguments to subtests instead (Andrii, patch 2) Changes since v2 [3]: - added acks from Yonghong - clang compilation issue in selftest with bpf_prog_query() (Alexei, patch 2) - disable all capabilities for test (Yonghong, patch 2) - add assertions that size of perf/ringbuf data matches expectations (Yonghong, patch 2) - add map array size definition, remove unneeded whitespace (Yonghong, patch 2) Changes since RFC [4]: - widened scope of commands unprivileged BPF disabled allows (Alexei, patch 1) - removed restrictions on map types for lookup, update, delete (Alexei, patch 1) - removed kernel CONFIG parameter controlling unprivileged bpf disabled change (Alexei, patch 1) - widened test scope to cover most BPF syscall commands, with positive and negative subtests [1] https://lore.kernel.org/bpf/CAADnVQLTBhCTAx1a_nev7CgMZxv1Bb7ecz1AFRin8tHmjPREJA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/1652880861-27373-1-git-send-email-alan.maguire@oracle.com/T/ [3] https://lore.kernel.org/bpf/1652788780-25520-1-git-send-email-alan.maguire@oracle.com/T/#t [4] https://lore.kernel.org/bpf/20220511163604.5kuczj6jx3ec5qv6@MBP-98dd607d3435.dhcp.thefacebook.com/T/#mae65f35a193279e718f37686da636094d69b96ee ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-21	selftests/bpf: add tests verifying unprivileged bpf behaviour	Alan Maguire	2	-0/+395
	tests load/attach bpf prog with maps, perfbuf and ringbuf, pinning them. Then effective caps are dropped and we verify we can - pick up the pin - create ringbuf/perfbuf - get ringbuf/perfbuf events, carry out map update, lookup and delete - create a link Negative testing also ensures - BPF prog load fails - BPF map create fails - get fd by id fails - get next id fails - query fails - BTF load fails Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/1652970334-30510-3-git-send-email-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-21	bpf: refine kernel.unprivileged_bpf_disabled behaviour	Alan Maguire	1	-1/+13
	With unprivileged BPF disabled, all cmds associated with the BPF syscall are blocked to users without CAP_BPF/CAP_SYS_ADMIN. However there are use cases where we may wish to allow interactions with BPF programs without being able to load and attach them. So for example, a process with required capabilities loads/attaches a BPF program, and a process with less capabilities interacts with it; retrieving perf/ring buffer events, modifying map-specified config etc. With all BPF syscall commands blocked as a result of unprivileged BPF being disabled, this mode of interaction becomes impossible for processes without CAP_BPF. As Alexei notes "The bpf ACL model is the same as traditional file's ACL. The creds and ACLs are checked at open(). Then during file's write/read additional checks might be performed. BPF has such functionality already. Different map_creates have capability checks while map_lookup has: map_get_sys_perms(map, f) & FMODE_CAN_READ. In other words it's enough to gate FD-receiving parts of bpf with unprivileged_bpf_disabled sysctl. The rest is handled by availability of FD and access to files in bpffs." So key fd creation syscall commands BPF_PROG_LOAD and BPF_MAP_CREATE are blocked with unprivileged BPF disabled and no CAP_BPF. And as Alexei notes, map creation with unprivileged BPF disabled off blocks creation of maps aside from array, hash and ringbuf maps. Programs responsible for loading and attaching the BPF program can still control access to its pinned representation by restricting permissions on the pin path, as with normal files. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/1652970334-30510-2-git-send-email-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-21	bpf: Allow kfunc in tracing and syscall programs.	Benjamin Tissoires	1	-0/+6
	Tracing and syscall BPF program types are very convenient to add BPF capabilities to subsystem otherwise not BPF capable. When we add kfuncs capabilities to those program types, we can add BPF features to subsystems without having to touch BPF core. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220518205924.399291-2-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-21	Merge branch 'add-a-bhash2-table-hashed-by-port-address'	Jakub Kicinski	10	-83/+611
	Joanne Koong says: ==================== Add a bhash2 table hashed by port + address This patchset proposes adding a bhash2 table that hashes by port and address. The motivation behind bhash2 is to expedite bind requests in situations where the port has many sockets in its bhash table entry, which makes checking bind conflicts costly especially given that we acquire the table entry spinlock while doing so, which can cause softirq cpu lockups and can prevent new tcp connections. We ran into this problem at Meta where the traffic team binds a large number of IPs to port 443 and the bind() call took a significant amount of time which led to cpu softirq lockups, which caused packet drops and other failures on the machine The patches are as follows: 1/2 - Adds a second bhash table (bhash2) hashed by port and address 2/2 - Adds a test for timing how long an additional bind request takes when the bhash entry is populated When experimentally testing this on a local server for ~24k sockets bound to the port, the results seen were: ipv4: before - 0.002317 seconds with bhash2 - 0.000018 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds ==================== Link: https://lore.kernel.org/r/20220520001834.2247810-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	selftests: Add test for timing a bind request to a port with a populated ↵	Joanne Koong	3	-0/+122
	bhash entry This test populates the bhash table for a given port with MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long a bind request on the port takes. When populating the bhash table, we create the sockets and then bind the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT are set). When timing how long a bind on the port takes, we bind on a different address without SO_REUSEPORT set. We do not set SO_REUSEPORT because we are interested in the case where the bind request does not go through the tb->fastreuseport path, which is fragile (eg tb->fastreuseport path does not work if binding with a different uid). To run the test locally, I did: * ulimit -n 65535000 * ip addr add 2001:0db8:0:f101::1 dev eth0 * ./bind_bhash_test 443 Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: Add a second bind table hashed by port and address	Joanne Koong	7	-83/+489
	We currently have one tcp bind table (bhash) which hashes by port number only. In the socket bind path, we check for bind conflicts by traversing the specified port's inet_bind2_bucket while holding the bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()). In instances where there are tons of sockets hashed to the same port at different addresses, checking for a bind conflict is time-intensive and can cause softirq cpu lockups, as well as stops new tcp connections since __inet_inherit_port() also contests for the spinlock. This patch proposes adding a second bind table, bhash2, that hashes by port and ip address. Searching the bhash2 table leads to significantly faster conflict resolution and less time holding the spinlock. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	wwan: iosm: use a flexible array rather than allocate short objects	Jakub Kicinski	1	-4/+1
	GCC array-bounds warns that ipc_coredump_get_list() under-allocates the size of struct iosm_cd_table *cd_table. This is avoidable - we just need a flexible array. Nothing calls sizeof() on struct iosm_cd_list or anything that contains it. Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com> Link: https://lore.kernel.org/r/20220520060013.2309497-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	hv_netvsc: Fix potential dereference of NULL pointer	Yongzhi Liu	1	-1/+4
	The return value of netvsc_devinfo_get() needs to be checked to avoid use of NULL pointer in case of an allocation failure. Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation") Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1652962188-129281-1-git-send-email-lyz_cs@pku.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	stcp: Use memset_after() to zero sctp_stream_out_ext	Xiu Jianfeng	1	-6/+3
	Use memset_after() helper to simplify the code, there is no functional change in this patch. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220519062932.249926-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: mscc: fix the alignment in ocelot_port_fdb_del()	Alaa Mohamed	1	-1/+1
	align the extack argument of the ocelot_port_fdb_del() function. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Link: https://lore.kernel.org/r/20220520002040.4442-1-eng.alaamohamedsoliman.am@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: vxlan: Fix kernel coding style	Alaa Mohamed	1	-7/+6
	The continuation line does not align with the opening bracket and this patch fix it. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	eth: bnxt: make ulp_id unsigned to make GCC 12 happy	Jakub Kicinski	2	-12/+12
	GCC array bounds checking complains that ulp_id is validated only against upper bound. Make it unsigned. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220520061955.2312968-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	selftests: fib_nexthops: Make ping timeout configurable	Amit Cohen	1	-25/+28
	Commit 49bb39bddad2 ("selftests: fib_nexthops: Make the test more robust") increased the timeout of ping commands to 5 seconds, to make the test more robust. Make the timeout configurable using '-w' argument to allow user to change it depending on the system that runs the test. Some systems suffer from slow forwarding performance, so they may need to change the timeout. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220519070921.3559701-1-amcohen@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: wwan: t7xx: use GFP_ATOMIC under spin lock in t7xx_cldma_gpd_set_next_ptr()	Yang Yingliang	1	-5/+5
	Sometimes t7xx_cldma_gpd_set_next_ptr() is called under spin lock, so add 'gfp_mask' parameter in t7xx_cldma_gpd_set_next_ptr() to pass the flag. Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/20220519032108.2996400-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	Merge branch 'amt-fix-several-bugs-in-gateway-mode'	Jakub Kicinski	1	-5/+6
	Taehee Yoo says: ==================== amt: fix several bugs in gateway mode This patchset fixes bugs in amt module. First patch fixes amt gateway mode's status stuck. amt gateway and relay established so these two mode manage status. But gateway stuck to change its own status if a relay doesn't send responses. Second patch fixes a memory leak. amt gateway skips some handling of advertisement message. So, a memory leak would occur. ==================== Link: https://lore.kernel.org/r/20220519031555.3192-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: tulip: fix build with CONFIG_GSC	Rolf Eike Beer	1	-1/+1
	Fix typo which breaks build for parisc. Fixes: 3daebfbeb455 ("net: tulip: convert to devres") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/all/CA+G9fYuCzU5VZ_nc+6NEdBXJdVCH=J2SB1Na1G_NS_0BNdGYtg@mail.gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Link: https://lore.kernel.org/r/4719560.GXAFRqVoOG@eto.sf-tec.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	amt: fix memory leak for advertisement message	Taehee Yoo	1	-3/+2
	When a gateway receives an advertisement message, it extracts relay information and then it should be freed. But the advertisement handler doesn't free it. So, memory leak would occur. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	amt: fix gateway mode stuck	Taehee Yoo	1	-2/+4
	If a gateway can not receive any response to requests from a relay, gateway resets status from SENT_REQUEST to INIT and variable about a relay as well. And then it should start the full establish step from sending a discovery message and receiving advertisement message. But, after failure in amt_req_work() it continues sending a request message step with flushed(invalid) relay information and sets SENT_REQUEST. So, a gateway can't be established with a relay. In order to avoid this situation, it stops sending the request message step if it fails. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: avoid strange behavior with skb_defer_max == 1	Jakub Kicinski	1	-8/+5
	When user sets skb_defer_max to 1 the kick threshold is 0 (half of 1). If we increment queue length before the check the kick will never happen, and the skb may get stranded. This is likely harmless but can be avoided by moving the increment after the check. This way skb_defer_max == 1 will always kick. Still a silly config to have, but somehow that feels more correct. While at it drop a comment which seems to be outdated or confusing, and wrap the defer_count write with a WRITE_ONCE() since it's read on the fast path that avoids taking the lock. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220518185522.2038683-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	net: stmmac: fix out-of-bounds access in a selftest	Jakub Kicinski	1	-7/+6
	GCC 12 points out that struct tc_action is smaller than struct tcf_action: drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c: In function ‘stmmac_test_rxp’: drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c:1132:21: warning: array subscript ‘struct tcf_gact[0]’ is partly outside array bounds of ‘unsigned char[272]’ [-Warray-bounds] 1132 \| gact->tcf_action = TC_ACT_SHOT; \| ^~ Fixes: ccfc639a94f2 ("net: stmmac: selftests: Add a selftest for Flexible RX Parser") Link: https://lore.kernel.org/r/20220519004305.2109708-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	sfc/siena: Remove duplicate check on segments	Martin Habets	1	-8/+1
	Siena only supports software TSO. This means more code can be deleted, as pointed out by the Smatch static checker warning: drivers/net/ethernet/sfc/siena/tx.c:184 __efx_siena_enqueue_skb() warn: duplicate check 'segments' (previous on line 158) Fixes: 956f2d86cb37 ("sfc/siena: Remove build references to missing functionality") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/kernel-janitors/YoH5tJMnwuGTrn1Z@kili/ Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/165294463549.23865.4557617334650441347.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-21	selftests/bpf: Remove filtered subtests from output	Mykola Lysenko	2	-2/+8
	Currently filtered subtests show up in the output as skipped. Before: $ sudo ./test_progs -t log_fixup/missing_map #94 /1 log_fixup/bad_core_relo_trunc_none:SKIP #94 /2 log_fixup/bad_core_relo_trunc_partial:SKIP #94 /3 log_fixup/bad_core_relo_trunc_full:SKIP #94 /4 log_fixup/bad_core_relo_subprog:SKIP #94 /5 log_fixup/missing_map:OK #94 log_fixup:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED After: $ sudo ./test_progs -t log_fixup/missing_map #94 /5 log_fixup/missing_map:OK #94 log_fixup:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220520061303.4004808-1-mykolal@fb.com
2022-05-21	selftests/bpf: Fix subtest number formatting in test_progs	Mykola Lysenko	1	-3/+9
	Remove weird spaces around / while preserving proper indentation Signed-off-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20220520070144.10312-1-mykolal@fb.com
2022-05-21	selftests/bpf: Add missing trampoline program type to trampoline_count test	Yuntao Wang	3	-91/+61
	Currently the trampoline_count test doesn't include any fmod_ret bpf programs, fix it to make the test cover all possible trampoline program types. Since fmod_ret bpf programs can't be attached to __set_task_comm function, as it's neither whitelisted for error injection nor a security hook, change it to bpf_modify_return_test. This patch also does some other cleanups such as removing duplicate code, dropping inconsistent comments, etc. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220519150610.601313-1-ytcoode@gmail.com
2022-05-21	Merge branch 'bpf: mptcp: Support for mptcp_sock'	Andrii Nakryiko	18	-10/+381
	Mat Martineau says: ==================== This patch set adds BPF access to mptcp_sock structures, along with associated self tests. You may recognize some of the code from earlier (https://lore.kernel.org/bpf/20200918121046.190240-6-nicolas.rybowski@tessares.net/) but it has been reworked quite a bit. v1 -> v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c, remove build check for CONFIG_BPF_JIT, add selftest check for CONFIG_MPTCP, and add a patch to include CONFIG_IKCONFIG/CONFIG_IKCONFIG_PROC for the BPF self tests. v2 -> v3: Access sysctl through the filesystem to work around CI use of the more limited busybox sysctl command. v3 -> v4: Dropped special case kernel code for tcp_sock is_mptcp, use existing bpf_tcp_helpers.h, and add check for 'ip mptcp monitor' support. v4 -> v5: Use BPF test skeleton, more consistent use of ASSERT macros, drop some unnecessary parameters / checks, and use tracing to acquire MPTCP token. Geliang Tang (6): bpf: add bpf_skc_to_mptcp_sock_proto selftests/bpf: Enable CONFIG_IKCONFIG_PROC in config selftests/bpf: test bpf_skc_to_mptcp_sock selftests/bpf: verify token of struct mptcp_sock selftests/bpf: verify ca_name of struct mptcp_sock selftests/bpf: verify first of struct mptcp_sock ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>