path: root/include/uapi
Age | Commit message | Author | Files | Lines
2024-08-29 | bpf: Replace bpf_lpm_trie_key 0-length array with flexible array | Kees Cook | 1 | -1/+18
[ Upstream commit 896880ff30866f386ebed14ab81ce1ad3710cfc4 ]

Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13:

    ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
      207 |                 *(__be16 *)&key->data[i]);
          |                            ^~~~~~~~~~~~~
    ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
      102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
          |                                                      ^
    ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
       97 | #define be16_to_cpu __be16_to_cpu
          |                     ^~~~~~~~~~~~~
    ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
      206 |         u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^
          |                    ^~~~~~~~~~~
    In file included from ../include/linux/bpf.h:7:
    ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
       82 |         __u8 data[0]; /* Arbitrary size */
          |              ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

    UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
    index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since it has been used by userspace. For example, in Cilium:

    struct egress_gw_policy_key {
            struct bpf_lpm_trie_key lpm_key;
            __u32 saddr;
            __u32 daddr;
    };

While direct references to the "data" member haven't been found, there are static initializers that include the final member. For example, the "{}" here:

    struct egress_gw_policy_key in_key = {
            .lpm_key = { 32 + 24, {} },
            .saddr = CLIENT_IP,
            .daddr = EXTERNAL_SVC_IP & 0Xffffff,
    };

To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpm_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 throughout, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
Stable-dep-of: 59f2f841179a ("bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
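The header/flexible-array split described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the demo_* types are hypothetical stand-ins modeled on the commit text (a "hdr" holding just prefixlen, and a _u8 variant whose open-coded union is followed by a flexible array); the real definitions live in the UAPI header and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "header" portion: just the prefix length. */
struct demo_lpm_trie_key_hdr {
    uint32_t prefixlen;
};

/* Hypothetical stand-in for the u8 variant: an open-coded union exposing
 * the header both as "hdr" and as a direct "prefixlen" member, followed
 * by a C99 flexible array for the trailing key bytes. */
struct demo_lpm_trie_key_u8 {
    union {
        struct demo_lpm_trie_key_hdr hdr;
        uint32_t prefixlen;
    };
    uint8_t data[];
};

/* A fixed-size key whose trailing members are not a u8 flexible array:
 * it embeds only the header, so no flexible array ends up mid-struct. */
struct demo_ipv4_key {
    struct demo_lpm_trie_key_hdr hdr;
    uint8_t addr[4];
};

/* Build a fixed-size IPv4 key from a prefix length and four address bytes. */
static struct demo_ipv4_key demo_make_ipv4_key(uint32_t prefixlen,
                                               const uint8_t addr[4])
{
    struct demo_ipv4_key key;

    key.hdr.prefixlen = prefixlen;
    memcpy(key.addr, addr, sizeof(key.addr));
    return key;
}
```

Embedding the header rather than the whole flexible-array struct keeps the trailing bytes at the same offset as data[] while staying valid for compilers that reject a flexible array in the middle of a container.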
2024-08-03 | m68k: amiga: Turn off Warp1260 interrupts during boot | Paolo Pisati | 1 | -0/+3
commit 1d8491d3e726984343dd8c3cdbe2f2b47cfdd928 upstream. On an Amiga 1200 equipped with a Warp1260 accelerator, an interrupt storm coming from the accelerator board causes the machine to crash in local_irq_enable() or auto_irq_enable(). Disabling interrupts for the Warp1260 in amiga_parse_bootinfo() fixes the problem. Link: https://lore.kernel.org/r/ZkjwzVwYeQtyAPrL@amaterasu.local Cc: stable <stable@kernel.org> Signed-off-by: Paolo Pisati <p.pisati@gmail.com> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20240601153254.186225-1-p.pisati@gmail.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03 | netfilter: nf_tables: rise cap on SELinux secmark context | Pablo Neira Ayuso | 1 | -1/+1
[ Upstream commit e29630247be24c3987e2b048f8e152771b32d38b ] The secmark context is artificially limited to 256 bytes; raise it to 4 Kbytes. Fixes: fb961945457f ("netfilter: nf_tables: add SECMARK support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05 | syscalls: fix compat_sys_io_pgetevents_time64 usage | Arnd Bergmann | 1 | -1/+1
commit d3882564a77c21eb746ba5364f3fa89b88de3d61 upstream. Using sys_io_pgetevents() as the entry point for compat mode tasks works almost correctly, but misses the sign extension for the min_nr and nr arguments. This was addressed on parisc by switching to compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode"), as well as by using more sophisticated system call wrappers on x86 and s390. However, arm64, mips, powerpc, sparc and riscv still have the same bug. Change all of them over to use compat_sys_io_pgetevents_time64() like parisc already does. This was clearly the intention when the function was originally added, but it got hooked up incorrectly in the tables. Cc: stable@vger.kernel.org Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures") Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
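The missing sign extension can be illustrated with a small userspace sketch. It is not the kernel's syscall wrapper code; the demo_* helpers are hypothetical and only show what happens to a negative 32-bit argument when it is widened to 64 bits with and without sign extension. The signed conversion relies on the two's-complement behavior that GCC and Clang document.

```c
#include <assert.h>
#include <stdint.h>

/* What a native 64-bit entry point sees if the 32-bit register value is
 * simply zero-extended: -1 turns into a large positive number. */
static int64_t demo_zero_extend(uint32_t reg)
{
    return (int64_t)(uint64_t)reg;
}

/* What a compat entry point is expected to do: treat the 32-bit value as
 * signed first, so that -1 stays -1 after widening. */
static int64_t demo_sign_extend(uint32_t reg)
{
    return (int64_t)(int32_t)reg;
}
```

For non-negative values both helpers agree, which fits the commit's remark that the wrong entry point "works almost correctly".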
2024-06-12 | bpf: Pack struct bpf_fib_lookup | Anton Protopopov | 1 | -1/+1
[ Upstream commit f91717007217d975aa975ddabd91ae1a107b9bff ]

The struct bpf_fib_lookup is supposed to be of size 64. A recent commit 59b418c7063d ("bpf: Add a check for struct bpf_fib_lookup size") added a static assertion to check this property so that future changes to the structure will not accidentally break this assumption.

As it immediately turned out, on some 32-bit arm systems, when AEABI=n, the total size of the structure was equal to 68, see [1]. This happened because the bpf_fib_lookup structure contains a union of two 16-bit fields:

    union {
            __u16 tot_len;
            __u16 mtu_result;
    };

which was supposed to compile to a 16-bit-aligned 16-bit field. On the aforementioned setups it was instead both aligned and padded to 32-bits.

Declare this inner union as __attribute__((packed, aligned(2))) such that it always is of size 2 and is aligned to 16 bits.

[1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: e1850ea9bd9e ("bpf: bpf_fib_lookup return MTU value as output when looked up")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
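A minimal sketch of the same idea, assuming GCC or Clang attribute syntax: struct demo_fib_params is a hypothetical, much smaller stand-in for struct bpf_fib_lookup, and the static assertion plays the role of the size check mentioned above. On common ABIs the attributes change nothing; they only pin the layout for ABIs that would otherwise pad the union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down parameter block with an inner union of two
 * 16-bit fields, pinned to size 2 and 16-bit alignment. */
struct demo_fib_params {
    uint8_t  family;
    uint8_t  l4_protocol;
    uint16_t sport;
    uint16_t dport;
    union {
        uint16_t tot_len;
        uint16_t mtu_result;
    } __attribute__((packed, aligned(2)));
    uint32_t ifindex;
};

/* Compile-time guard: fails the build if an ABI pads the union. */
_Static_assert(sizeof(struct demo_fib_params) == 12,
               "demo_fib_params must stay 12 bytes");
```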
2024-05-17 | scsi: mpi3mr: Avoid memcpy field-spanning write WARNING | Shin'ichiro Kawasaki | 1 | -1/+1
[ Upstream commit 429846b4b6ce9853e0d803a2357bb2e55083adf0 ]

When the "storcli2 show" command is executed for eHBA-9600, the mpi3mr driver prints this WARNING message:

    memcpy: detected field-spanning write (size 128) of single field "bsg_reply_buf->reply_buf" at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 (size 1)
    WARNING: CPU: 0 PID: 12760 at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 mpi3mr_bsg_request+0x6b12/0x7f10 [mpi3mr]

The cause of the WARN is a 128-byte memcpy to the 1-byte array "__u8 reply_buf[1]" in struct mpi3mr_bsg_in_reply_buf. The array is intended to be a flexible length array, so the WARN is a false positive.

To suppress the WARN, remove the constant number '1' from the array declaration and clarify that it has flexible length. Also, adjust the memory allocation size to match the change.

Suggested-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240323084155.166835-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
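The allocation-size adjustment follows from how the two declarations are counted by sizeof. The sketch below is a userspace illustration with hypothetical demo_* names and fields, not the driver's code: with a one-element array the struct size already includes one payload byte, whereas with a flexible array it includes none, so the allocation becomes sizeof(struct) + len.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply buffer: fixed header followed by a flexible payload. */
struct demo_reply_buf {
    uint8_t reply_type;
    uint8_t rsvd[3];
    uint8_t reply_buf[];   /* flexible length; sizeof() excludes it */
};

/* Bytes needed for a reply carrying len payload bytes. */
static size_t demo_reply_size(size_t len)
{
    return sizeof(struct demo_reply_buf) + len;
}

/* Allocate a reply and copy the payload into the flexible array. */
static struct demo_reply_buf *demo_reply_alloc(const void *payload, size_t len)
{
    struct demo_reply_buf *r = calloc(1, demo_reply_size(len));

    if (!r)
        return NULL;
    memcpy(r->reply_buf, payload, len);
    return r;
}
```

Because the destination is declared without a fixed bound, a bounds-checking memcpy has no "size 1" field to compare the 128-byte write against.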
2024-04-27 | PCI/DPC: Use FIELD_GET() | Bjorn Helgaas | 1 | -0/+1
[ Upstream commit 9a9eec4765737b9b2a8d6ae03de6480a5f12dd5c ] Use FIELD_GET() to remove dependencies on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/20231018113254.17616-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
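To show what "removing the dependency on the shift value" means, here is a simplified userspace stand-in. DEMO_FIELD_GET is not the kernel's FIELD_GET() (which lives in linux/bitfield.h and adds compile-time checks); it assumes a non-zero contiguous mask and the GCC/Clang __builtin_ctz builtin. The register mask is a hypothetical example.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in: derive the shift from the mask itself. */
#define DEMO_FIELD_GET(mask, reg) \
    (((reg) & (mask)) >> __builtin_ctz(mask))

/* Hypothetical 2-bit field occupying bits 2:1 of a status register. */
#define DEMO_STATUS_REASON 0x0006u

/* Open-coded variant: the shift (1) must be kept in sync with the mask. */
static unsigned int demo_reason_open_coded(uint16_t status)
{
    return (status & DEMO_STATUS_REASON) >> 1;
}

/* Mask-driven variant: only the mask needs to be known. */
static unsigned int demo_reason_field_get(uint16_t status)
{
    return DEMO_FIELD_GET(DEMO_STATUS_REASON, status);
}
```

Both helpers return the same value; the second simply cannot drift out of sync if the field's position changes.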
2024-04-13 | Input: allocate keycode for Display refresh rate toggle | Gergo Koteles | 1 | -0/+1
[ Upstream commit cfeb98b95fff25c442f78a6f616c627bc48a26b7 ] Newer Lenovo Yogas and Legions with 60Hz/90Hz displays send a wmi event when Fn + R is pressed. This is intended for use to switch between the two refresh rates. Allocate a new KEY_REFRESH_RATE_TOGGLE keycode for it. Signed-off-by: Gergo Koteles <soyer@irl.hu> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/15a5d08c84cf4d7b820de34ebbcf8ae2502fb3ca.1710065750.git.soyer@irl.hu Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27 | RDMA/irdma: Allow accurate reporting on QP max send/recv WR | Sindhu Devale | 1 | -0/+6
[ Upstream commit 3a8498720450174b8db450d3375a04dca81b3534 ] Currently the attributes cap.max_send_wr and cap.max_recv_wr sent from user-space during create QP are the provider-computed SQ/RQ depth as opposed to the raw values passed from the application. This inhibits computation of an accurate value for max_send_wr and max_recv_wr for this QP in the kernel which matches the value returned in user create QP. Also, these capabilities need to be reported from the driver in query QP. Add support by extending the ABI to allow the raw cap.max_send_wr and cap.max_recv_wr to be passed from user-space, while keeping compatibility for the older scheme. The internal HW depth and shift needed for the WQs need to be computed now for both kernel and user-mode QPs. Add new helpers to assist with this: irdma_uk_calc_depth_shift_sq, irdma_uk_calc_depth_shift_rq and irdma_uk_calc_depth_shift_wq. Consolidate all the user mode QP setup into a new function irdma_setup_umode_qp which keeps it with its counterpart irdma_setup_kmode_qp. Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155525.1081-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 926e8ea4b8da ("RDMA/irdma: Remove duplicate assignment") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06 | bpf: Derive source IP addr via bpf_*_fib_lookup() | Martynas Pumputis | 1 | -0/+10
commit dab4e1f06cabb6834de14264394ccab197007302 upstream.

Extend the bpf_fib_lookup() helper by making it return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.

For example, the following snippet can be used to derive the desired source IP address:

    struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };

    ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
                             BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
            return TC_ACT_SHOT;

    /* the p.ipv4_src now contains the source address */

The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts.

For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when public and private addresses are attached to the same egress interface.

The change was tested with Cilium [1].

Nikolay Aleksandrov helped to figure out the IPv6 addr selection.

[1]: https://github.com/cilium/cilium/pull/28283

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 | bpf: Add table ID to bpf_fib_lookup BPF helper | Louis DeLosSantos | 1 | -3/+18
commit 8ad77e72caae22a1ddcfd0c03f2884929e93b7a4 upstream. Add ability to specify routing table ID to the `bpf_fib_lookup` BPF helper. A new field `tbid` is added to `struct bpf_fib_lookup` used as parameters to the `bpf_fib_lookup` BPF helper. When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and `BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup` will be used as the table ID for the fib lookup. If the `tbid` does not exist the fib lookup will fail with `BPF_FIB_LKUP_RET_NOT_FWDED`. The `tbid` field becomes a union over the vlan related output fields in `struct bpf_fib_lookup` and will be zeroed immediately after usage. This functionality is useful in containerized environments. For instance, if a CNI wants to dictate the next-hop for traffic leaving a container it can create a container-specific routing table and perform a fib lookup against this table in a "host-net-namespace-side" TC program. This functionality also allows `ip rule` like functionality at the TC layer, allowing an eBPF program to pick a routing table based on some aspect of the sk_buff. As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN datapath. When egress traffic leaves a Pod an eBPF program attached by Cilium will determine which VRF the egress traffic should target, and then perform a FIB lookup in a specific table representing this VRF's FIB. Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 | uapi: in6: replace temporary label with rfc9486 | Justin Iurman | 1 | -1/+1
[ Upstream commit 6a2008641920a9c6fe1abbeb9acbec463215d505 ] Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-16 | netfilter: nft_compat: reject unused compat flag | Pablo Neira Ayuso | 1 | -0/+2
[ Upstream commit 292781c3c5485ce33bd22b2ef1b2bed709b4d672 ] Flag (1 << 0) is ignored if set and is never used; reject it with EINVAL instead. Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01 | btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args | Qu Wenruo | 1 | -0/+3
commit 173431b274a9a54fc10b273b46e67f46bcf62d2e upstream. Add extra sanity check for btrfs_ioctl_defrag_range_args::flags. This is not really to enhance fuzzing tests, but as a preparation for future expansion on btrfs_ioctl_defrag_range_args. In the future we're going to add new members, allowing more fine tuning for btrfs defrag. Without the -EOPNOTSUPP error, there would be no way to detect if the kernel supports those new defrag features. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
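The general pattern behind this kind of sanity check is to mask the incoming flags against the set the kernel understands and fail on anything left over. The sketch below uses hypothetical DEMO_* flag names and a hypothetical helper; it is not the btrfs ioctl code, only the shape of the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flags currently understood by the interface. */
#define DEMO_RANGE_COMPRESS   (1ULL << 0)
#define DEMO_RANGE_START_IO   (1ULL << 1)
#define DEMO_RANGE_FLAGS_SUPP (DEMO_RANGE_COMPRESS | DEMO_RANGE_START_IO)

/* Return 0 if every bit in flags is known, -EOPNOTSUPP otherwise. */
static int demo_check_flags(uint64_t flags)
{
    if (flags & ~DEMO_RANGE_FLAGS_SUPP)
        return -EOPNOTSUPP;
    return 0;
}
```

A caller that sets a newer flag on an older kernel then gets a distinct error instead of having the flag silently ignored, which is what makes feature detection possible.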
2024-01-26 | bpf: Add crosstask check to __bpf_get_stack | Jordan Rome | 1 | -0/+3
[ Upstream commit b8e3a87a627b575896e448021e5c2f8a3bc19931 ] Currently get_perf_callchain only supports user stack walking for the current task. Passing the correct *crosstask* param will return 0 frames if the task passed to __bpf_get_stack isn't the current one instead of a single incorrect frame/address. This change passes the correct *crosstask* param but also does a preemptive check in __bpf_get_stack if the task is current and returns -EOPNOTSUPP if it is not. This issue was found using bpf_get_task_stack inside a BPF iterator ("iter/task"), which iterates over all tasks. bpf_get_task_stack works fine for fetching kernel stacks but because get_perf_callchain relies on the caller to know if the requested *task* is the current one (via *crosstask*) it was failing in a confusing way. It might be possible to get user stacks for all tasks utilizing something like access_process_vm but that requires the bpf program calling bpf_get_task_stack to be sleepable and would therefore be a breaking change. Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()") Signed-off-by: Jordan Rome <jordalgo@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13 | netfilter: nft_exthdr: add boolean DCCP option matching | Jeremy Sowden | 1 | -0/+2
[ Upstream commit b9f9a485fb0eb80b0e2b90410b28cbb9b0e85687 ] The xt_dccp iptables module supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend nft_exthdr to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de> Stable-dep-of: 63331e37fb22 ("netfilter: nf_tables: fix 'exist' matching on bigendian arches") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 | uapi: propagate __struct_group() attributes to the container union | Dmitry Antipov | 1 | -1/+1
[ Upstream commit 4e86f32a13af1970d21be94f659cae56bbe487ee ]

Recently the kernel test robot has reported an ARM-specific BUILD_BUG_ON() in an old and unmaintained wil6210 wireless driver. The problem comes from the structure packing rules of old ARM ABI ('-mabi=apcs-gnu'). For example, the following structure is packed to 18 bytes instead of 16:

    struct poorly_packed {
            unsigned int a;
            unsigned int b;
            unsigned short c;
            union {
                    struct {
                            unsigned short d;
                            unsigned int e;
                    } __attribute__((packed));
                    struct {
                            unsigned short d;
                            unsigned int e;
                    } __attribute__((packed)) inner;
            };
    } __attribute__((packed));

To fit it into 16 bytes, it's required to add packed attribute to the container union as well:

    struct poorly_packed {
            unsigned int a;
            unsigned int b;
            unsigned short c;
            union {
                    struct {
                            unsigned short d;
                            unsigned int e;
                    } __attribute__((packed));
                    struct {
                            unsigned short d;
                            unsigned int e;
                    } __attribute__((packed)) inner;
            } __attribute__((packed));
    } __attribute__((packed));

Thanks to Andrew Pinski of GCC team for sorting the things out at https://gcc.gnu.org/pipermail/gcc/2023-November/242888.html.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311150821.cI4yciFE-lkp@intel.com
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://lore.kernel.org/r/20231120110607.98956-1-dmantipov@yandex.ru
Fixes: 50d7bd38c3aa ("stddef: Introduce struct_group() helper macro")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
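What "propagating the attributes to the container union" looks like in macro form can be sketched as follows. DEMO_STRUCT_GROUP is a simplified, hypothetical stand-in for the UAPI helper (it drops the TAG argument), and the size check assumes a 4-byte int and a 2-byte short with GCC or Clang attribute syntax. On most ABIs the assertion already holds without the trailing attribute; the point is that the attribute list is applied to the union as well as to both inner structs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: ATTRS is applied to both inner structs and,
 * per the fix described above, to the enclosing union as well. */
#define DEMO_STRUCT_GROUP(NAME, ATTRS, ...) \
    union { \
        struct { __VA_ARGS__ } ATTRS; \
        struct { __VA_ARGS__ } ATTRS NAME; \
    } ATTRS

struct demo_packed {
    unsigned int a;
    unsigned int b;
    unsigned short c;
    DEMO_STRUCT_GROUP(inner, __attribute__((packed)),
        unsigned short d;
        unsigned int e;
    );
} __attribute__((packed));

/* Compile-time guard in the spirit of the BUILD_BUG_ON() that tripped. */
_Static_assert(sizeof(struct demo_packed) == 16,
               "demo_packed must be exactly 16 bytes");
```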
2023-11-28 | vsock: read from socket's error queue | Arseniy Krasnov | 1 | -0/+17
[ Upstream commit 49dbe25adac42d3e06f65d1420946bec65896222 ] This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 | x86/sev: Change snp_guest_issue_request()'s fw_err argument | Dionna Glaze | 1 | -2/+16
[ Upstream commit 0144e3b85d7b42e8a4cda991c0e81f131897457a ] The GHCB specification declares that the firmware error value for a guest request will be stored in the lower 32 bits of EXIT_INFO_2. The upper 32 bits are for the VMM's own error code. The fw_err argument to snp_guest_issue_request() is thus a misnomer, and callers will need access to all 64 bits. The type of unsigned long also causes problems, since sw_exit_info2 is u64 (unsigned long long) vs the argument's unsigned long*. Change this type for issuing the guest request. Pass the ioctl command struct's error field directly instead of in a local variable, since an incomplete guest request may not set the error code, and uninitialized stack memory would be written back to user space. The firmware might not even be called, so bookend the call with the no firmware call error and clear the error. Since the "fw_err" field is really exitinfo2 split into the upper bits' vmm error code and lower bits' firmware error code, convert the 64 bit value to a union. [ bp: - Massage commit message - adjust code - Fix a build issue as Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com - print exitinfo2 in hex Tom: - Correct -EIO exit case. ] Signed-off-by: Dionna Glaze <dionnaglaze@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de Stable-dep-of: db10cb9b5746 ("virt: sevguest: Fix passing a stack buffer as a scatterlist target") Signed-off-by: Sasha Levin <sashal@kernel.org>
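The 64-bit value split into two 32-bit halves can be pictured with a small union. The type and member names below are assumptions chosen to mirror the commit text (a firmware error in the lower 32 bits, a VMM error in the upper 32 bits); the union overlay matches that split only on little-endian machines such as x86, while the shift-based helpers are endian-independent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical overlay of the 64-bit exit info with its two halves. */
union demo_exitinfo2 {
    uint64_t exitinfo2;
    struct {
        uint32_t fw_error;   /* lower 32 bits on little-endian */
        uint32_t vmm_error;  /* upper 32 bits on little-endian */
    };
};

/* Wrap a raw 64-bit value in the union. */
static union demo_exitinfo2 demo_from_raw(uint64_t raw)
{
    union demo_exitinfo2 e;

    e.exitinfo2 = raw;
    return e;
}

/* Endian-independent extraction of the two halves. */
static uint32_t demo_fw_error(uint64_t raw)
{
    return (uint32_t)(raw & 0xffffffffu);
}

static uint32_t demo_vmm_error(uint64_t raw)
{
    return (uint32_t)(raw >> 32);
}
```

Passing the full 64-bit value around, rather than an unsigned long, avoids losing the upper half on configurations where unsigned long and u64 differ.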
2023-11-20 | crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL | Peter Gonda | 1 | -0/+7
[ Upstream commit efb339a83368ab25de1a18c0fdff85e01c13a1ea ] The PSP can return a "firmware error" code of -1 in circumstances where the PSP has not actually been called. To make this protocol unambiguous, name the value SEV_RET_NO_FW_CALL. [ bp: Massage a bit. ] Signed-off-by: Peter Gonda <pgonda@google.com> Signed-off-by: Dionna Glaze <dionnaglaze@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20221207010210.2563293-2-dionnaglaze@google.com Stable-dep-of: db10cb9b5746 ("virt: sevguest: Fix passing a stack buffer as a scatterlist target") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-02 | gtp: uapi: fix GTPA_MAX | Pablo Neira Ayuso | 1 | -1/+1
[ Upstream commit adc8df12d91a2b8350b0cd4c7fec3e8546c9d1f8 ] Subtract one from __GTPA_MAX, otherwise GTPA_MAX is off by 2. Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
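The off-by-two arithmetic is easy to see with the usual netlink attribute enum idiom, where a trailing __..._MAX sentinel is one past the last real attribute. The enum below is hypothetical; the "+ 1" form is shown only as one definition that would be off by two, since the commit text does not quote the old line.

```c
#include <assert.h>

/* Hypothetical attribute enum with a trailing sentinel. */
enum demo_attrs {
    DEMO_A_UNSPEC,
    DEMO_A_LINK,
    DEMO_A_VERSION,
    DEMO_A_TID,
    __DEMO_A_MAX,          /* one past the last real attribute */
};

/* Correct: the highest valid attribute number. */
#define DEMO_A_MAX       (__DEMO_A_MAX - 1)

/* A definition that adds instead of subtracting ends up two too high. */
#define DEMO_A_MAX_WRONG (__DEMO_A_MAX + 1)
```

Arrays sized as DEMO_A_MAX + 1 then hold exactly one slot per attribute, including the unspecified one.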
2023-10-10 | netlink: remove the flex array from struct nlmsghdr | Jakub Kicinski | 1 | -2/+0
commit c73a72f4cbb47672c8cc7f7d7aba52f1cb15baca upstream. I've added a flex array to struct nlmsghdr in commit 738136a0e375 ("netlink: split up copies in the ack construction") to allow accessing the data easily. It leads to warnings with clang, if user space wraps this structure into another struct and the flex array is not at the end of the container. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/all/20221114023927.GA685@u2004-local/ Link: https://lore.kernel.org/r/20221118033903.1651026-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10 | netlink: split up copies in the ack construction | Jakub Kicinski | 1 | -0/+2
[ Upstream commit 738136a0e3757a8534df3ad97d6ff6d7f429f6c1 ] Clean up the use of unsafe_memcpy() by adding a flexible array at the end of netlink message header and splitting up the header and data copies. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: d0f95894fda7 ("netlink: annotate data-races around sk->sk_err") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10 | bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup | Martin KaFai Lau | 1 | -0/+6
[ Upstream commit 31de4105f00d64570139bc5494a201b0bd57349f ] The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev Stable-dep-of: 5baa0433a15e ("neighbour: fix data-races around n->output") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10 | net: change accept_ra_min_rtr_lft to affect all RA lifetimes | Patrick Rohr | 1 | -1/+1
commit 5027d54a9c30bc7ec808360378e2b4753f053f25 upstream. accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10 | net: add sysctl accept_ra_min_rtr_lft | Patrick Rohr | 1 | -0/+1
commit 1671bcfd76fdc0b9e65153cf759153083755fe4c upstream. This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06 | bpf: Clarify error expectations from bpf_clone_redirect | Stanislav Fomichev | 1 | -1/+3
[ Upstream commit 7cb779a6867fea00b4209bcf6de2f178a743247d ] Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped packets") exposed the fact that bpf_clone_redirect is capable of returning raw NET_XMIT_XXX return codes. This conflicts with its UAPI doc, which says the following: "0 on success, or a negative error in case of failure." Update the UAPI to reflect the fact that bpf_clone_redirect can return positive error numbers, but don't explicitly define their meaning. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23 | netfilter: ebtables: fix fortify warnings in size_entry_mwt() | GONG, Ruiqi | 1 | -6/+8
[ Upstream commit a7ed3465daa240bdf01a5420f64336fee879c09d ]

When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following warning appears:

    In function ‘fortify_memcpy_chk’,
        inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2:
    ./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
      592 |                         __read_overflow2_field(q_size_field, size);
          |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The compiler is complaining:

    memcpy(&offsets[1], &entry->watchers_offset,
           sizeof(offsets) - sizeof(offsets[0]));

where memcpy reads beyond &entry->watchers_offset to copy {watchers,target,next}_offset altogether into offsets[]. Silence the warning by wrapping these three up via struct_group().

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
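The effect of grouping the three offsets can be sketched by open-coding the union that a struct_group()-style helper expands to. struct demo_entry and its helper are hypothetical and much smaller than the real ebtables entry; the point is that the copy source becomes a single named member whose size covers all three fields, so a bounds-checking memcpy sees a matching field size.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical entry: three adjacent offsets reachable both directly
 * and through the named group "offsets". */
struct demo_entry {
    unsigned int bitmask;
    union {
        struct {
            unsigned int watchers_offset;
            unsigned int target_offset;
            unsigned int next_offset;
        };
        struct {
            unsigned int watchers_offset;
            unsigned int target_offset;
            unsigned int next_offset;
        } offsets;
    };
};

_Static_assert(sizeof(((struct demo_entry *)0)->offsets) ==
               3 * sizeof(unsigned int),
               "group must cover exactly the three offsets");

/* Copy all three offsets at once, sized by the group itself. */
static void demo_copy_offsets(unsigned int dst[3], const struct demo_entry *e)
{
    memcpy(dst, &e->offsets, sizeof(e->offsets));
}
```

Existing code that accesses the individual fields by name keeps working, because the anonymous struct exposes them unchanged.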
2023-09-13 | dma-buf/sync_file: Fix docs syntax | Rob Clark | 1 | -1/+1
[ Upstream commit 05d56d8079d510a2994039470f65bea85f0075ee ] Fixes the warning: include/uapi/linux/sync_file.h:77: warning: Function parameter or member 'num_fences' not described in 'sync_file_info' Fixes: 2d75c88fefb2 ("staging/android: refactor SYNC IOCTLs") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230724145000.125880-1-robdclark@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | media: uapi: HEVC: Add num_delta_pocs_of_ref_rps_idx field | Benjamin Gaignard | 1 | -1/+5
[ Upstream commit ae440c5da33cdb90a109f2df2a0360c67b3fab7e ] Some drivers' firmware parses the slice header by itself and needs the num_delta_pocs_of_ref_rps_idx value to parse the slice header's short_term_ref_pic_set(). Use one of the 4 reserved bytes to store this value without changing the v4l2_ctrl_hevc_decode_params structure size and padding. This value also exists in the DXVA API. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> [hverkuil: fix typo in num_delta_pocs_of_ref_rps_idx doc] Stable-dep-of: 297160d411e3 ("media: mediatek: vcodec: move core context from device to each instance") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03  block: Fix a source code comment in include/uapi/linux/blkzoned.h  Bart Van Assche  1  -5/+5
[ Upstream commit e0933b526fbfd937c4a8f4e35fcdd49f0e22d411 ] Fix the symbolic names for zone conditions in the blkzoned.h header file. Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <dlemoal@kernel.org> Fixes: 6a0cb1bc106f ("block: Implement support for zoned block devices") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230706201422.3987341-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19  autofs: use flexible array in ioctl structure  Arnd Bergmann  1  -1/+1
commit e910c8e3aa02dc456e2f4c32cb479523c326b534 upstream.

Commit df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3") introduced a warning for the autofs_dev_ioctl structure:

In function 'check_name',
    inlined from 'validate_dev_ioctl' at fs/autofs/dev-ioctl.c:131:9,
    inlined from '_autofs_dev_ioctl' at fs/autofs/dev-ioctl.c:624:8:
fs/autofs/dev-ioctl.c:33:14: error: 'strchr' reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
   33 |         if (!strchr(name, '/'))
      |              ^~~~~~~~~~~~~~~~~
In file included from include/linux/auto_dev-ioctl.h:10,
                 from fs/autofs/autofs_i.h:10,
                 from fs/autofs/dev-ioctl.c:14:
include/uapi/linux/auto_dev-ioctl.h: In function '_autofs_dev_ioctl':
include/uapi/linux/auto_dev-ioctl.h:112:14: note: source object 'path' of size 0
  112 |         char path[0];
      |              ^~~~

This is easily fixed by changing the gnu 0-length array into a c99 flexible array. Since this is a uapi structure, we have to be careful about possible regressions but this one should be fine as they are equivalent here. While it would break building with ancient gcc versions that predate c99, it helps building with --std=c99 and -Wpedantic builds in user space, as well as non-gnu compilers. This means we probably also want it fixed in stable kernels.

Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230523081944.581710-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19  media: videodev2.h: Fix struct v4l2_input tuner index comment  Marek Vasut  1  -1/+1
[ Upstream commit 26ae58f65e64fa7ba61d64bae752e59e08380c6a ]

VIDIOC_ENUMINPUT documentation describes the tuner field of struct v4l2_input as an index:

Documentation/userspace-api/media/v4l/vidioc-enuminput.rst
"
* - __u32
  - ``tuner``
  - Capture devices can have zero or more tuners (RF demodulators).
    When the ``type`` is set to ``V4L2_INPUT_TYPE_TUNER`` this is an
    RF connector and this field identifies the tuner. It corresponds
    to struct :c:type:`v4l2_tuner` field ``index``. For details on
    tuners see :ref:`tuner`.
"

Drivers I could find also use the 'tuner' field as an index, e.g.:

drivers/media/pci/bt8xx/bttv-driver.c bttv_enum_input()
drivers/media/usb/go7007/go7007-v4l2.c vidioc_enum_input()

However, the UAPI comment claims this field is 'enum v4l2_tuner_type':

include/uapi/linux/videodev2.h

This field being 'enum v4l2_tuner_type' is unlikely, as it never seems to be used that way in drivers, and the documentation confirms it. It seems this comment got in accidentally in the commit which this patch fixes. Fix the UAPI comment to stop confusion.

This was pointed out by Dmitry while reviewing VIDIOC_ENUMINPUT support for strace.

Fixes: 6016af82eafc ("[media] v4l2: use __u32 rather than enums in ioctl() structs")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19  block: change all __u32 annotations to __be32 in affs_hardblocks.h  Michael Schmitz  1  -34/+34
commit 95a55437dc49fb3342c82e61f5472a71c63d9ed0 upstream.

The Amiga partition parser module uses signed int for partition sector address and count, which will overflow for disks larger than 1 TB.

Use u64 as type for sector address and size to allow using disks up to 2 TB without LBD support, and disks larger than 2 TB with LBD. The RDB format allows specifying disk sizes up to 2^128 bytes (though native OS limitations reduce this somewhat, to max 2^68 bytes), so check for u64 overflow carefully to protect against overflowing sector_t.

This bug was reported originally in 2012, and the fix was created by the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been discussed and reviewed on linux-m68k at that time but never officially submitted (now resubmitted as patch 1 of this series).

Patch 3 (this series) adds additional error checking and warning messages. One of the error checks now makes use of the previously unused rdb_CylBlocks field, which causes a 'sparse' warning (cast to restricted __be32).

Annotate all 32 bit fields in affs_hardblocks.h as __be32, as the on-disk format of RDB and partition blocks is always big endian.

Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin@lichtvoll.de>
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230620201725.7020-3-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
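The 1 TB and 2 TB figures in the message above follow from simple arithmetic, and the __be32 annotation reflects how the fields must be read. The Python sketch below illustrates both, assuming 512-byte sectors; the helper names (`capacity_limit_bytes`, `parse_be32`) are hypothetical and are not part of the kernel code.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def capacity_limit_bytes(sector_bits):
    """Bytes addressable when the sector number has `sector_bits` usable bits."""
    return (1 << sector_bits) * SECTOR_SIZE


def parse_be32(raw, offset=0):
    """Read one 32-bit big-endian field, the byte order RDB blocks use on disk."""
    (value,) = struct.unpack_from(">I", raw, offset)
    return value


# A signed int leaves 31 bits for the sector number: 2^31 * 512 bytes = 1 TiB.
signed_limit = capacity_limit_bytes(31)
# An unsigned 32-bit sector number doubles that to 2 TiB.
unsigned_limit = capacity_limit_bytes(32)

# The same four bytes decode differently depending on the assumed byte order,
# which is why the header fields are annotated as __be32 rather than __u32.
sample = bytes([0x00, 0x00, 0x01, 0x00])
as_big_endian = parse_be32(sample)
(as_little_endian,) = struct.unpack("<I", sample)
```

On a little-endian host, reading such a field without a byte swap would yield the second value instead of the first, which is the class of mistake the sparse annotation is meant to catch.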
2023-06-21  net/sched: act_api: add specific EXT_WARN_MSG for tc action  Hangbin Liu  1  -0/+1
commit 2f59823fe696caa844249a90bb3f9aeda69cfe5c upstream.

In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice that tc action uses a different enum from filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this param before going to the TCA_ACT_TAB nest.

Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-21  sched: add new attr TCA_EXT_WARN_MSG to report tc extact message  Hangbin Liu  1  -0/+1
[ Upstream commit 0349b8779cc949ad9e6aced32672ee48cf79b497 ]

We will report extack message if there is an error via netlink_ack(). But if the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged.

In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo made cls able to log verbose info for offloading failures, which helps improve Open vSwitch debuggability when using flower offloading.

It would also be helpful if userspace monitor tools, like "tc monitor", could log this kind of message, as it doesn't require vswitchd log level adjustment. Let's add a new tc attribute to report the extack message so the monitor program can receive the failures. e.g.

  # tc monitor
  added chain dev enp3s0f1np1 parent ffff: chain 0
  added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
    ct_state +trk+new
    not_in_hw
          action order 1: gact action drop
           random type none pass val 0
           index 1 ref 1 bind 1

  Warning: mlx5_core: matching on ct_state +new isn't supported.

In this patch I only report the extack message on add/del operations. It doesn't look like we need to report the extack message on get/dump operations.

Note that this message is not only reported to multicast groups; it can also be reported via unicast, which may affect the behavior of current userspace tools.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 84ad0af0bccd ("net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21  net: ethtool: correct MAX attribute value for stats  Jakub Kicinski  1  -1/+1
[ Upstream commit 52f79609c0c5b25fddb88e85f25ce08aa7e3fb42 ] When compiling YNL generated code compiler complains about array-initializer-out-of-bounds. Turns out the MAX value for STATS_GRP uses the value for STATS. This may lead to random corruptions in user space (kernel itself doesn't use this value as it never parses stats). Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05  ipv{4,6}/raw: fix output xfrm lookup wrt protocol  Nicolas Dichtel  1  -0/+1
[ Upstream commit 3632679d9e4f879f49949bb5b050e0de553e4739 ]

With a raw socket bound to IPPROTO_RAW (i.e. with hdrincl enabled), the protocol field of the flow structure, built by raw_sendmsg() / rawv6_sendmsg(), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector.

For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket.

For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05  inet: Add IP_LOCAL_PORT_RANGE socket option  Jakub Sitnicki  1  -0/+1
[ Upstream commit 91d0b78c5177f3e42a4d8738af8ac19c3a90d002 ]

Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT.

A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup:

1. Each host gets assigned a disjoint range of ephemeral ports.
2. Applications open connections from the host-assigned port range.
3. Return traffic gets routed to the host based on both, the destination IP and the destination port.

An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions:

1. Manually pick the source port by bind()'ing to it before connect()'ing the socket.

   This approach has a couple of downsides:

   a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number.

      Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets.

        # Assume desired local port range is 60_000-60_511
        s = socket(AF_INET, SOCK_STREAM)
        s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        s.bind(("192.0.2.1", 60_000))
        s.connect(("1.1.1.1", 53))
        # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy
        # Application must retry with another local port

      In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1].
   b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use this port.

      IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time.

2. Isolate the app in a dedicated netns and use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds.

   The per-netns setting affects all sockets, so this approach can be used only if:

   - there is just one egress IP address, or
   - the desired egress port range is the same for all egress IP addresses used by the application.

   For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack:

     system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'")

     s = socket(AF_INET, SOCK_STREAM)
     s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
     s.bind(("192.0.2.1", 0))
     s.connect(("1.1.1.1", 53))
     # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy

   For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets.

   Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports.

To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome.

To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually.

The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence.
UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing.

  PORT_LO = 40_000
  PORT_HI = 40_511

  s = socket(AF_INET, SOCK_STREAM)
  v = struct.pack("I", PORT_HI << 16 | PORT_LO)
  s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)
  s.bind(("127.0.0.1", 0))
  s.getsockname()
  # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511),
  # if there is a free port. EADDRINUSE otherwise.

[1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116

Reviewed-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
Signed-off-by: Sasha Levin <sashal@kernel.org>
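The u32 encoding described above can be exercised without opening a socket. The following is a minimal Python sketch of the packing and its inverse; `pack_port_range` and `unpack_port_range` are hypothetical helper names, and the 16-bit range check is an assumption of this sketch rather than a statement about what the kernel validates.

```python
import struct


def pack_port_range(lo, hi):
    """Pack two u16 bounds into the u32 option value: high bound in the
    upper 16 bits, low bound in the lower 16 bits, host byte order."""
    for port in (lo, hi):
        if not 0 <= port <= 0xFFFF:
            raise ValueError("port bounds must fit in 16 bits")
    return struct.pack("I", hi << 16 | lo)


def unpack_port_range(raw):
    """Inverse of pack_port_range(): return the (lo, hi) pair."""
    (value,) = struct.unpack("I", raw)
    return value & 0xFFFF, value >> 16


option_value = pack_port_range(40_000, 40_511)
bounds = unpack_port_range(option_value)
```

The resulting `option_value` is what the commit's own snippet passes to `setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)`. Using the native `"I"` format keeps the value in host byte order, as the commit message specifies.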
2023-05-30  ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg  Cezary Rojewski  1  -1/+2
commit 95109657471311601b98e71f03d0244f48dc61bb upstream. Constant 'C4_CHANNEL' does not exist on the firmware side. Value 0xC is reserved for 'C7_1' instead. Fixes: 04afbbbb1cba ("ASoC: Intel: Skylake: Update the topology interface structure") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Link: https://lore.kernel.org/r/20230519201711.4073845-4-amadeuszx.slawinski@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-24  open: return EINVAL for O_DIRECTORY | O_CREAT  Christian Brauner  1  -1/+0
[ Upstream commit 43b450632676fb60e9faeddff285d9fac94a4f58 ]

After a couple of years and multiple LTS releases we received a report that the behavior of O_DIRECTORY | O_CREAT changed starting with v5.7.

On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL had the following semantics:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
    * d doesn't exist:                create regular file
    * d exists and is a regular file: ENOTDIR
    * d exists and is a directory:    EISDIR

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
    * d doesn't exist:                create regular file
    * d exists and is a regular file: EEXIST
    * d exists and is a directory:    EEXIST

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
    * d doesn't exist:                ENOENT
    * d exists and is a regular file: ENOTDIR
    * d exists and is a directory:    open directory

On kernels since v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL have the following semantics:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
    * d doesn't exist:                ENOTDIR (create regular file)
    * d exists and is a regular file: ENOTDIR
    * d exists and is a directory:    EISDIR

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
    * d doesn't exist:                ENOTDIR (create regular file)
    * d exists and is a regular file: EEXIST
    * d exists and is a directory:    EEXIST

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
    * d doesn't exist:                ENOENT
    * d exists and is a regular file: ENOTDIR
    * d exists and is a directory:    open directory

This is a fairly substantial semantic change that userspace didn't notice until Pedro took the time to deliberately figure out corner cases. Since no one noticed this breakage we can somewhat safely assume that O_DIRECTORY | O_CREAT combinations are likely unused.

The v5.7 breakage is especially weird because while ENOTDIR is returned indicating failure a regular file is actually created. This doesn't make a lot of sense.

Time was spent finding potential users of this combination.
Searching on codesearch.debian.net showed that codebases often express semantical expectations about O_DIRECTORY | O_CREAT which are completely contrary to what our code has done and currently does.

The expectation often is that this particular combination would create and open a directory. This suggests users who tried to use that combination would stumble upon the counterintuitive behavior no matter if pre-v5.7 or post v5.7 and quickly realize neither semantics give them what they want. For some examples see the code examples in [1] to [3] and the discussion in [4].

There are various ways to address this issue. The lazy/simple option would be to restore the pre-v5.7 behavior and to just live with that bug forever. But since there's a real chance that the O_DIRECTORY | O_CREAT quirk isn't relied upon we should try to get away with murder(ing bad semantics) first. If we need to Frankenstein pre-v5.7 behavior later so be it.

So let's simply return EINVAL categorically for O_DIRECTORY | O_CREAT combinations. In addition to cleaning up the old bug this also opens up the possibility to make that flag combination do something more intuitive in the future.

Starting with this commit the following semantics apply:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
    * d doesn't exist:                EINVAL
    * d exists and is a regular file: EINVAL
    * d exists and is a directory:    EINVAL

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
    * d doesn't exist:                EINVAL
    * d exists and is a regular file: EINVAL
    * d exists and is a directory:    EINVAL

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
    * d doesn't exist:                ENOENT
    * d exists and is a regular file: ENOTDIR
    * d exists and is a directory:    open directory

One additional note, O_TMPFILE is implemented as:

    #define __O_TMPFILE    020000000
    #define O_TMPFILE      (__O_TMPFILE | O_DIRECTORY)
    #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)

For older kernels it was important to return an explicit error when O_TMPFILE wasn't supported.
So O_TMPFILE requires that O_DIRECTORY is raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't specified. Since O_DIRECTORY | O_CREAT could be used to create a regular file, allowing that combination together with __O_TMPFILE would've meant that false positives were possible, i.e., that a regular file was created instead of an O_TMPFILE. This could've been used to trick userspace into thinking it operated on an O_TMPFILE when it wasn't.

Now that we block O_DIRECTORY | O_CREAT completely the check for O_CREAT in the __O_TMPFILE branch via if ((flags & O_TMPFILE_MASK) != O_TMPFILE) can be dropped. Instead we can simply verify that O_DIRECTORY is raised via if (!(flags & O_DIRECTORY)) and explain this in two comments.

As Aleksa pointed out O_PATH is unaffected by this change since it always returned EINVAL if O_CREAT was specified - with or without O_DIRECTORY.

Link: https://lore.kernel.org/lkml/20230320071442.172228-1-pedro.falcato@gmail.com
Link: https://sources.debian.org/src/flatpak/1.14.4-1/subprojects/libglnx/glnx-dirfd.c/?hl=324#L324 [1]
Link: https://sources.debian.org/src/flatpak-builder/1.2.3-1/subprojects/libglnx/glnx-shutil.c/?hl=251#L251 [2]
Link: https://sources.debian.org/src/ostree/2022.7-2/libglnx/glnx-dirfd.c/?hl=324#L324 [3]
Link: https://www.openwall.com/lists/oss-security/2014/11/26/14 [4]
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
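The flag rules this commit message describes can be modelled in a few lines. The Python sketch below is illustrative only: `check_open_flags` is a hypothetical helper, not the kernel's function, and the numeric values for O_CREAT, O_EXCL and O_DIRECTORY are assumed to be the generic ones (some architectures use different values). It covers only the flag-combination checks discussed in the message and ignores everything else open() validates.

```python
import errno

# Assumed generic flag values; only RAW_O_TMPFILE (the commit's __O_TMPFILE,
# 020000000) is quoted from the commit message itself.
O_CREAT = 0o100
O_EXCL = 0o200
O_DIRECTORY = 0o200000
RAW_O_TMPFILE = 0o20000000
O_TMPFILE = RAW_O_TMPFILE | O_DIRECTORY


def check_open_flags(flags):
    """Return 0 if the combination is accepted, else a negative errno."""
    # New rule: O_DIRECTORY | O_CREAT is rejected categorically,
    # regardless of O_EXCL or of what exists at the path.
    if (flags & (O_DIRECTORY | O_CREAT)) == (O_DIRECTORY | O_CREAT):
        return -errno.EINVAL
    # With the rule above in place, the tmpfile branch only needs to
    # verify that O_DIRECTORY accompanies the raw tmpfile bit.
    if flags & RAW_O_TMPFILE and not flags & O_DIRECTORY:
        return -errno.EINVAL
    return 0


results = {
    "dir|creat": check_open_flags(O_DIRECTORY | O_CREAT),
    "dir|creat|excl": check_open_flags(O_DIRECTORY | O_CREAT | O_EXCL),
    "dir|excl": check_open_flags(O_DIRECTORY | O_EXCL),
    "tmpfile": check_open_flags(O_TMPFILE),
    "tmpfile|creat": check_open_flags(O_TMPFILE | O_CREAT),
    "raw tmpfile bit only": check_open_flags(RAW_O_TMPFILE),
}
```

Note how `O_TMPFILE | O_CREAT` is already caught by the first check, which is why the separate O_CREAT test in the tmpfile branch becomes redundant.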
2023-05-11  btrfs: scrub: reject unsupported scrub flags  Qu Wenruo  1  -0/+1
commit 604e6681e114d05a2e384c4d1e8ef81918037ef5 upstream.

Since the introduction of scrub interface, the only flag that we support is BTRFS_SCRUB_READONLY. Thus there are no sanity checks: if undefined flags are passed in, we just ignore them.

This is problematic if we want to introduce new scrub flags, as we have no way to determine if such flags are supported.

Address the problem by introducing a check for the flags, and if unsupported flags are set, return -EOPNOTSUPP to inform user space.

This check should be backported for all supported kernels before any new scrub flags are introduced.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
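The check described above is the usual "reject unknown bits" pattern. Here is a minimal Python sketch of it; `check_scrub_flags` is a hypothetical helper, and the constant names and the value 1 for BTRFS_SCRUB_READONLY are assumptions modelled on the uapi header rather than quoted from the patch.

```python
import errno

# Assumed values for illustration; the only supported flag per the commit
# message is BTRFS_SCRUB_READONLY.
BTRFS_SCRUB_READONLY = 1
BTRFS_SCRUB_SUPPORTED_FLAGS = BTRFS_SCRUB_READONLY


def check_scrub_flags(flags):
    """Return 0 when every set bit is supported, else -EOPNOTSUPP."""
    if flags & ~BTRFS_SCRUB_SUPPORTED_FLAGS:
        return -errno.EOPNOTSUPP
    return 0


no_flags = check_scrub_flags(0)
readonly = check_scrub_flags(BTRFS_SCRUB_READONLY)
unknown_bit = check_scrub_flags(BTRFS_SCRUB_READONLY | 0x2)
```

Because unknown bits now fail loudly, user space can probe for a future flag by setting it and checking for EOPNOTSUPP, which is the capability-detection path the commit message is asking for.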
2023-05-11  uapi/linux/const.h: prefer ISO-friendly __typeof__  Kevin Brodsky  1  -1/+1
[ Upstream commit 31088f6f7906253ef4577f6a9b84e2d42447dba0 ]

typeof is (still) a GNU extension, which means that it cannot be used when building ISO C (e.g. -std=c99). It should therefore be avoided in uapi headers in favour of the ISO-friendly __typeof__.

Unfortunately this issue could not be detected by CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in any uapi header.

This matters from a userspace perspective, not a kernel one. uapi headers and their contents are expected to be usable in a variety of situations, and in particular when building ISO C applications (with -std=c99 or similar).

This particular problem can be reproduced by trying to use the __ALIGN_KERNEL macro directly in application code, say:

    #include <linux/const.h>

    int align(int x, int a)
    {
            return __ALIGN_KERNEL(x, a);
    }

and trying to build that with -std=c99.

Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com
Fixes: a79ff731a1b2 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
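For readers unfamiliar with what the alignment macro computes, the arithmetic is the standard round-up-to-a-power-of-two idiom. The Python sketch below models that arithmetic only; `align_up` is a hypothetical name, and the sketch says nothing about the `__typeof__` cast that the patch is actually about, which exists in C so the mask has the same type as `x`.

```python
def align_up(x, a):
    """Round x up to the next multiple of a, where a is a power of two.

    Mirrors the usual mask form: (x + (a - 1)) & ~(a - 1).
    """
    if a <= 0 or a & (a - 1):
        raise ValueError("alignment must be a positive power of two")
    mask = a - 1
    return (x + mask) & ~mask


aligned = [align_up(x, 8) for x in (0, 1, 8, 13)]
```

Values that are already aligned are returned unchanged, and everything else is bumped to the next boundary.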
2023-03-11  usb: uvc: Enumerate valid values for color matching  Daniel Scally  1  -0/+30
[ Upstream commit e16cab9c1596e251761d2bfb5e1467950d616963 ] The color matching descriptors defined in the UVC Specification contain 3 fields with discrete numeric values representing particular settings. Enumerate those values so that later code setting them can be more readable. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Link: https://lore.kernel.org/r/20230202114142.300858-2-dan.scally@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11  media: uvcvideo: Silence memcpy() run-time false positive warnings  Kees Cook  1  -1/+1
[ Upstream commit b839212988575c701aab4d3d9ca15e44c87e383c ] The memcpy() in uvc_video_decode_meta() intentionally copies across the length and flags members and into the trailing buf flexible array. Split the copy so that the compiler can better reason about (the lack of) buffer overflows here. Avoid the run-time false positive warning: memcpy: detected field-spanning write (size 12) of single field "&meta->length" at drivers/media/usb/uvc/uvc_video.c:1355 (size 1) Additionally fix a typo in the documentation for struct uvc_meta_buf. Reported-by: ionut_n2001@yahoo.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216810 Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10  vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR  Steve Sistare  1  -6/+9
commit ef3a3f6a294ba65fd906a291553935881796f8a5 upstream. Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. Their kernel threads could be blocked indefinitely by a misbehaving userland while trying to pin/unpin pages while vaddrs are being updated. Do not allow groups to be added to the container while vaddr's are invalid, so we never need to block user threads from pinning, and can delete the vaddr-waiting code in a subsequent patch. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10  io_uring: Replace 0-length array with flexible array  Kees Cook  1  -1/+1
commit 36632d062975a9ff4410c90dd6d37922b68d0920 upstream.

Zero-length arrays are deprecated[1]. Replace struct io_uring_buf_ring's "bufs" with a flexible array member. (How is the size of this array verified?) Detected with GCC 13, using -fstrict-flex-arrays=3:

In function 'io_ring_buffer_select',
    inlined from 'io_buffer_select' at io_uring/kbuf.c:183:10:
io_uring/kbuf.c:141:23: warning: array subscript 255 is outside the bounds of an interior zero-length array 'struct io_uring_buf[0]' [-Wzero-length-bounds]
  141 |                 buf = &br->bufs[head];
      |                       ^~~~~~~~~~~~~~~
In file included from include/linux/io_uring.h:7,
                 from io_uring/kbuf.c:10:
include/uapi/linux/io_uring.h: In function 'io_buffer_select':
include/uapi/linux/io_uring.h:628:41: note: while referencing 'bufs'
  628 |                 struct io_uring_buf     bufs[0];
      |                                         ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: stable@vger.kernel.org
Cc: io-uring@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230105190507.gonna.131-kees@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14  drm/virtio: exbuf->fence_fd unmodified on interrupted wait  Ryan Neph  1  -0/+1
[ Upstream commit 8f20660f053cefd4693e69cfff9cf58f4f7c4929 ] An interrupted dma_fence_wait() becomes an -ERESTARTSYS returned to userspace ioctl(DRM_IOCTL_VIRTGPU_EXECBUFFER) calls, prompting to retry the ioctl(), but the passed exbuf->fence_fd has been reset to -1, making the retry attempt fail at sync_file_get_fence(). The uapi for DRM_IOCTL_VIRTGPU_EXECBUFFER is changed to retain the passed value for exbuf->fence_fd when returning anything besides a successful result from the ioctl. Fixes: 2cd7b6f08bc4 ("drm/virtio: add in/out fence support for explicit synchronization") Signed-off-by: Ryan Neph <ryanneph@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230203233345.2477767-1-ryanneph@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-14  uapi: add missing ip/ipv6 header dependencies for linux/stddef.h  Herton R. Krzesinski  2  -0/+2
[ Upstream commit 03702d4d29be4e2510ec80b248dbbde4e57030d9 ]

Since commit 58e0be1ef6118 ("net: use struct_group to copy ip/ipv6 header addresses"), ip and ipv6 headers started to use the __struct_group definition, which is defined at include/uapi/linux/stddef.h. However, linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h, which breaks build of xskxceiver bpf selftest if you install the uapi headers in the system:

$ make V=1 xskxceiver -C tools/testing/selftests/bpf
...
make: Entering directory '(...)/tools/testing/selftests/bpf'
gcc -g -O0 -rdynamic -Wall -Werror (...)
In file included from xskxceiver.c:79:
/usr/include/linux/ip.h:103:9: error: expected specifier-qualifier-list before ‘__struct_group’
  103 |         __struct_group(/* no tag */, addrs, /* no attrs */,
      |         ^~~~~~~~~~~~~~
...

Include the missing <linux/stddef.h> dependency in ip.h and do the same for the ipv6.h header.

Fixes: 58e0be1ef611 ("net: use struct_group to copy ip/ipv6 header addresses")
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01  netfilter: conntrack: unify established states for SCTP paths  Sriram Yagnaraman  2  -2/+2
commit a44b7651489f26271ac784b70895e8a85d0cebf4 upstream. An SCTP endpoint can start an association through a path and tear it down over another one. That means the initial path will not see the shutdown sequence, and the conntrack entry will remain in ESTABLISHED state for 5 days. By merging the HEARTBEAT_ACKED and ESTABLISHED states into one ESTABLISHED state, there remains no difference between a primary or secondary path. The timeout for the merged ESTABLISHED state is set to 210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a path doesn't see the shutdown sequence, it will expire in a reasonable amount of time. With this change in place, there is now more than one state from which we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so handle the setting of ASSURED bit whenever a state change has happened and the new state is ESTABLISHED. Removed the check for dir==REPLY since the transition to ESTABLISHED can happen only in the reply direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
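The 210-second figure quoted above can be reproduced from the formula in the message. The Python sketch below plugs in values that are assumed here to be the common SCTP defaults (30 s heartbeat interval, 5 path retransmissions, 60 s maximum RTO); the commit message gives only the formula and the result, so treat the individual inputs as assumptions.

```python
# Assumed SCTP defaults, in seconds / counts; not quoted from the commit.
HB_INTERVAL_S = 30
MAX_PATH_RETRANS = 5
RTO_MAX_S = 60


def established_timeout_s(hb_interval, max_path_retrans, rto_max):
    """hb_interval * max_path_retrans + rto_max, as given in the message."""
    return hb_interval * max_path_retrans + rto_max


timeout_s = established_timeout_s(HB_INTERVAL_S, MAX_PATH_RETRANS, RTO_MAX_S)
five_days_s = 5 * 24 * 60 * 60  # the previous ESTABLISHED lifetime
```

Compared with the five-day lifetime mentioned in the message, a stale entry on a path that never saw the shutdown sequence now expires in minutes rather than days.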