summaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)AuthorFilesLines
2022-10-19selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()GONG, Ruiqi3-5/+6
The following warning was triggered on a hardware environment: SELinux: Converting 162 SID table entries... BUG: sleeping function called from invalid context at __might_sleep+0x60/0x74 0x0 in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 5943, name: tar CPU: 7 PID: 5943 Comm: tar Tainted: P O 5.10.0 #1 Call trace: dump_backtrace+0x0/0x1c8 show_stack+0x18/0x28 dump_stack+0xe8/0x15c ___might_sleep+0x168/0x17c __might_sleep+0x60/0x74 __kmalloc_track_caller+0xa0/0x7dc kstrdup+0x54/0xac convert_context+0x48/0x2e4 sidtab_context_to_sid+0x1c4/0x36c security_context_to_sid_core+0x168/0x238 security_context_to_sid_default+0x14/0x24 inode_doinit_use_xattr+0x164/0x1e4 inode_doinit_with_dentry+0x1c0/0x488 selinux_d_instantiate+0x20/0x34 security_d_instantiate+0x70/0xbc d_splice_alias+0x4c/0x3c0 ext4_lookup+0x1d8/0x200 [ext4] __lookup_slow+0x12c/0x1e4 walk_component+0x100/0x200 path_lookupat+0x88/0x118 filename_lookup+0x98/0x130 user_path_at_empty+0x48/0x60 vfs_statx+0x84/0x140 vfs_fstatat+0x20/0x30 __se_sys_newfstatat+0x30/0x74 __arm64_sys_newfstatat+0x1c/0x2c el0_svc_common.constprop.0+0x100/0x184 do_el0_svc+0x1c/0x2c el0_svc+0x20/0x34 el0_sync_handler+0x80/0x17c el0_sync+0x13c/0x140 SELinux: Context system_u:object_r:pssp_rsyslog_log_t:s0:c0 is not valid (left unmapped). It was found that within a critical section of spin_lock_irqsave in sidtab_context_to_sid(), convert_context() (hooked by sidtab_convert_params.func) might cause the process to sleep via allocating memory with GFP_KERNEL, which is problematic. As Ondrej pointed out [1], convert_context()/sidtab_convert_params.func has another caller sidtab_convert_tree(), which is okay with GFP_KERNEL. Therefore, fix this problem by adding a gfp_t argument for convert_context()/sidtab_convert_params.func and pass GFP_KERNEL/_ATOMIC properly in individual callers. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221018120111.1474581-1-gongruiqi1@huawei.com/ [1] Reported-by: Tan Ninghao <tanninghao1@huawei.com> Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance") Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> [PM: wrap long BUG() output lines, tweak subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-10-11Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-10Merge tag 'tpmdd-next-v6.1-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Just a few bug fixes this time" * tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle security/keys: Remove inconsistent __user annotation char: move from strlcpy with unused retval to strscpy
2022-10-10Merge tag 'powerpc-6.1-1' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. - Add support for syscall wrappers. - Add support for KFENCE on 64-bit. - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. - Support execute-only memory when using the Radix MMU (P9 or later). - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. - Updates to our linker script to move more data into read-only sections. - Allow the VDSO to be randomised on 32-bit. - Many other small features and fixes. Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng Yongjun. * tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) KVM: PPC: Book3S HV: Fix stack frame regs marker powerpc: Don't add __powerpc_ prefix to syscall entry points powerpc/64s/interrupt: Fix stack frame regs marker powerpc/64: Fix msr_check_and_set/clear MSR[EE] race powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN powerpc/pseries: Add firmware details to the hardware description powerpc/powernv: Add opal details to the hardware description powerpc: Add device-tree model to the hardware description powerpc/64: Add logical PVR to the hardware description powerpc: Add PVR & CPU name to hardware description powerpc: Add hardware description string powerpc/configs: Enable PPC_UV in powernv_defconfig powerpc/configs: Update config files for removed/renamed symbols powerpc/mm: Fix UBSAN warning reported on hugetlb powerpc/mm: Always update max/min_low_pfn in mem_topology_setup() powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range powerpc: Drops STABS_DEBUG from linker scripts powerpc/64s: Remove lost/old comment powerpc/64s: Remove old STAB comment powerpc: remove orphan systbl_chk.sh ...
2022-10-07Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-5/+5
Pull vfs constification updates from Al Viro: "whack-a-mole: constifying struct path *" * tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: constify path spufs: constify path nd_jump_link(): constify path audit_init_parent(): constify path __io_setxattr(): constify path do_proc_readlink(): constify path overlayfs: constify path fs/notify: constify path may_linkat(): constify path do_sys_name_to_handle(): constify path ->getprocattr(): attribute name is const char *, TYVM...
2022-10-07Merge tag 'pull-tomoyo' of ↵Linus Torvalds4-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc tomoyo changes from Al Viro: "A couple of assorted tomoyo patches" * tag 'pull-tomoyo' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tomoyo: struct path it might get from LSM callers won't have NULL dentry or mnt tomoyo: use vsnprintf() properly
2022-10-05security/keys: Remove inconsistent __user annotationVincenzo Frascino1-1/+1
The declaration of keyring_read does not match the definition (security/keys/keyring.c). In this case the definition is correct because it matches what defined in "struct key_type::read" (linux/key-type.h). Fix the declaration removing the inconsistent __user annotation. Cc: David Howells <dhowells@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2022-10-04Merge tag 'net-next-6.1' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...
2022-10-04Merge tag 'landlock-6.1-rc1' of ↵Linus Torvalds2-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "Improve user help for Landlock (documentation and sample)" * tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Fix documentation style landlock: Slightly improve documentation and fix spelling samples/landlock: Print hints about ABI versions
2022-10-04Merge tag 'fs.acl.rework.prep.v6.1' of ↵Linus Torvalds1-3/+14
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs acl updates from Christian Brauner: "These are general fixes and preparatory changes related to the ongoing posix acl rework. The actual rework where we build a type safe posix acl api wasn't ready for this merge window but we're hopeful for the next merge window. General fixes: - Some filesystems like 9p and cifs have to implement custom posix acl handlers because they require access to the dentry in order to set and get posix acls while the set and get inode operations currently don't. But the ntfs3 filesystem has no such requirement and thus implemented custom posix acl xattr handlers when it really didn't have to. So this pr contains patch that just implements set and get inode operations for ntfs3 and switches it to rely on the generic posix acl xattr handlers. (We would've appreciated reviews from the ntfs3 maintainers but we didn't get any. But hey, if we really broke it we'll fix it. But fstests for ntfs3 said it's fine.) - The posix_acl_fix_xattr_common() helper has been adapted so it can be used by a few more callers and avoiding open-coding the same checks over and over. Other than the two general fixes this series introduces a new helper vfs_set_acl_prepare(). The reason for this helper is so that we can mitigate one of the source that change {g,u}id values directly in the uapi struct. With the vfs_set_acl_prepare() helper we can move the idmapped mount fixup into the generic posix acl set handler. The advantage of this is that it allows us to remove the posix_acl_setxattr_idmapped_mnt() helper which so far we had to call in vfs_setxattr() to account for idmapped mounts. While semantically correct the problem with this approach was that we had to keep the value parameter of the generic vfs_setxattr() call as non-const. This is rectified in this series. Ultimately, we will get rid of all the extreme kludges and type unsafety once we have merged the posix api - hopefully during the next merge window - built solely around get and set inode operations. Which incidentally will also improve handling of posix acls in security and especially in integrity modesl. While this will come with temporarily having two inode operation for posix acls that is nothing compared to the problems we have right now and so well worth it. We'll end up with something that we can actually reason about instead of needing to write novels to explain what's going on" * tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: xattr: always us is_posix_acl_xattr() helper acl: fix the comments of posix_acl_xattr_set xattr: constify value argument in vfs_setxattr() ovl: use vfs_set_acl_prepare() acl: move idmapping handling into posix_acl_xattr_set() acl: add vfs_set_acl_prepare() acl: return EOPNOTSUPP in posix_acl_fix_xattr_common() ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
2022-10-04Merge tag 'lsm-pr-20221003' of ↵Linus Torvalds5-14/+18
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: "Seven patches for the LSM layer and we've got a mix of trivial and significant patches. Highlights below, starting with the smaller bits first so they don't get lost in the discussion of the larger items: - Remove some redundant NULL pointer checks in the common LSM audit code. - Ratelimit the lockdown LSM's access denial messages. With this change there is a chance that the last visible lockdown message on the console is outdated/old, but it does help preserve the initial series of lockdown denials that started the denial message flood and my gut feeling is that these might be the more valuable messages. - Open userfaultfds as readonly instead of read/write. While this code obviously lives outside the LSM, it does have a noticeable impact on the LSMs with Ondrej explaining the situation in the commit description. It is worth noting that this patch languished on the VFS list for over a year without any comments (objections or otherwise) so I took the liberty of pulling it into the LSM tree after giving fair notice. It has been in linux-next since the end of August without any noticeable problems. - Add a LSM hook for user namespace creation, with implementations for both the BPF LSM and SELinux. Even though the changes are fairly small, this is the bulk of the diffstat as we are also including BPF LSM selftests for the new hook. It's also the most contentious of the changes in this pull request with Eric Biederman NACK'ing the LSM hook multiple times during its development and discussion upstream. While I've never taken NACK's lightly, I'm sending these patches to you because it is my belief that they are of good quality, satisfy a long-standing need of users and distros, and are in keeping with the existing nature of the LSM layer and the Linux Kernel as a whole. The patches in implement a LSM hook for user namespace creation that allows for a granular approach, configurable at runtime, which enables both monitoring and control of user namespaces. The general consensus has been that this is far preferable to the other solutions that have been adopted downstream including outright removal from the kernel, disabling via system wide sysctls, or various other out-of-tree mechanisms that users have been forced to adopt since we haven't been able to provide them an upstream solution for their requests. Eric has been steadfast in his objections to this LSM hook, explaining that any restrictions on the user namespace could have significant impact on userspace. While there is the possibility of impacting userspace, it is important to note that this solution only impacts userspace when it is requested based on the runtime configuration supplied by the distro/admin/user. Frederick (the pathset author), the LSM/security community, and myself have tried to work with Eric during development of this patchset to find a mutually acceptable solution, but Eric's approach and unwillingness to engage in a meaningful way have made this impossible. I have CC'd Eric directly on this pull request so he has a chance to provide his side of the story; there have been no objections outside of Eric's" * tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: ratelimit denial messages userfaultfd: open userfaultfds with O_RDONLY selinux: Implement userns_create hook selftests/bpf: Add tests verifying bpf lsm userns_create hook bpf-lsm: Make bpf_lsm_userns_create() sleepable security, lsm: Introduce security_create_user_ns() lsm: clean up redundant NULL pointer check
2022-10-04Merge tag 'selinux-pr-20221003' of ↵Linus Torvalds6-53/+46
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: "Six SELinux patches, all are simple and easily understood, but a list of the highlights is below: - Use 'grep -E' instead of 'egrep' in the SELinux policy install script. Fun fact, this seems to be GregKH's *second* dedicated SELinux patch since we transitioned to git (ignoring merges, the SPDX stuff, and a trivial fs reference removal when lustre was yanked); the first was back in 2011 when selinuxfs was placed in /sys/fs/selinux. Oh, the memories ... - Convert the SELinux policy boolean values to use signed integer types throughout the SELinux kernel code. Prior to this we were using a mix of signed and unsigned integers which was probably okay in this particular case, but it is definitely not a good idea in general. - Remove a reference to the SELinux runtime disable functionality in /etc/selinux/config as we are in the process of deprecating that. See [1] for more background on this if you missed the previous notes on the deprecation. - Minor cleanups: remove unneeded variables and function parameter constification" Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable [1] * tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: remove runtime disable message in the install_policy.sh script selinux: use "grep -E" instead of "egrep" selinux: remove the unneeded result variable selinux: declare read-only parameters const selinux: use int arrays for boolean values selinux: remove an unneeded variable in sel_make_class_dir_entries()
2022-10-04Merge tag 'integrity-v6.1' of ↵Linus Torvalds2-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Just two bug fixes" * tag 'integrity-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: efi: Correct Macmini DMI match in uefi cert quirk ima: fix blocking of security.ima xattrs of unsupported algorithms
2022-10-04Merge tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-nextLinus Torvalds2-12/+17
Pull smack updates from Casey Schaufler: "Two minor code clean-ups: one removes constants left over from the old mount API, while the other gets rid of an unneeded variable. The other change fixes a flaw in handling IPv6 labeling" * tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next: smack: cleanup obsolete mount option flags smack: lsm: remove the unneeded result variable SMACK: Add sk_clone_security LSM hook
2022-10-04Merge tag 'hardening-v6.1-rc1' of ↵Linus Torvalds3-6/+31
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: "Most of the collected changes here are fixes across the tree for various hardening features (details noted below). The most notable new feature here is the addition of the memcpy() overflow warning (under CONFIG_FORTIFY_SOURCE), which is the next step on the path to killing the common class of "trivially detectable" buffer overflow conditions (i.e. on arrays with sizes known at compile time) that have resulted in many exploitable vulnerabilities over the years (e.g. BleedingTooth). This feature is expected to still have some undiscovered false positives. It's been in -next for a full development cycle and all the reported false positives have been fixed in their respective trees. All the known-bad code patterns we could find with Coccinelle are also either fixed in their respective trees or in flight. The commit message in commit 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()") for the feature has extensive details, but I'll repeat here that this is a warning _only_, and is not intended to actually block overflows (yet). The many patches fixing array sizes and struct members have been landing for several years now, and we're finally able to turn this on to find any remaining stragglers. Summary: Various fixes across several hardening areas: - loadpin: Fix verity target enforcement (Matthias Kaehlcke). - zero-call-used-regs: Add missing clobbers in paravirt (Bill Wendling). - CFI: clean up sparc function pointer type mismatches (Bart Van Assche). - Clang: Adjust compiler flag detection for various Clang changes (Sami Tolvanen, Kees Cook). - fortify: Fix warnings in arch-specific code in sh, ARM, and xen. Improvements to existing features: - testing: improve overflow KUnit test, introduce fortify KUnit test, add more coverage to LKDTM tests (Bart Van Assche, Kees Cook). - overflow: Relax overflow type checking for wider utility. New features: - string: Introduce strtomem() and strtomem_pad() to fill a gap in strncpy() replacement needs. - um: Enable FORTIFY_SOURCE support. - fortify: Enable run-time struct member memcpy() overflow warning" * tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (27 commits) Makefile.extrawarn: Move -Wcast-function-type-strict to W=1 hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero sparc: Unbreak the build x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled x86/paravirt: clean up typos and grammaros fortify: Convert to struct vs member helpers fortify: Explicitly check bounds are compile-time constants x86/entry: Work around Clang __bdos() bug ARM: decompressor: Include .data.rel.ro.local fortify: Adjust KUnit test for modular build sh: machvec: Use char[] for section boundaries kunit/memcpy: Avoid pathological compile-time string size lib: Improve the is_signed_type() kunit test LoadPin: Require file with verity root digests to have a header dm: verity-loadpin: Only trust verity targets with enforcement LoadPin: Fix Kconfig doc about format of file with verity digests um: Enable FORTIFY_SOURCE lkdtm: Update tests for memcpy() run-time warnings fortify: Add run-time WARN for cross-field memcpy() fortify: Use SIZE_MAX instead of (size_t)-1 ...
2022-10-04security: kmsan: fix interoperability with auto-initializationAlexander Potapenko1-0/+4
Heap and stack initialization is great, but not when we are trying uses of uninitialized memory. When the kernel is built with KMSAN, having kernel memory initialization enabled may introduce false negatives. We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO under CONFIG_KMSAN, making it impossible to auto-initialize stack variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap auto-initialization. We however still let the users enable heap auto-initialization at boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case a warning is printed. Link: https://lkml.kernel.org/r/20220915150417.722975-31-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-2/+0
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30efi: Correct Macmini DMI match in uefi cert quirkOrlando Chamberlain1-1/+1
It turns out Apple doesn't capitalise the "mini" in "Macmini" in DMI, which is inconsistent with other model line names. Correct the capitalisation of Macmini in the quirk for skipping loading platform certs on T2 Macs. Currently users get: ------------[ cut here ]------------ [Firmware Bug]: Page fault caused by firmware at PA: 0xffffa30640054000 WARNING: CPU: 1 PID: 8 at arch/x86/platform/efi/quirks.c:735 efi_crash_gracefully_on_page_fault+0x55/0xe0 Modules linked in: CPU: 1 PID: 8 Comm: kworker/u12:0 Not tainted 5.18.14-arch1-2-t2 #1 4535eb3fc40fd08edab32a509fbf4c9bc52d111e Hardware name: Apple Inc. Macmini8,1/Mac-7BA5B2DFE22DDD8C, BIOS 1731.120.10.0.0 (iBridge: 19.16.15071.0.0,0) 04/24/2022 Workqueue: efi_rts_wq efi_call_rts ... ---[ end trace 0000000000000000 ]--- efi: Froze efi_rts_wq and disabled EFI Runtime Services integrity: Couldn't get size: 0x8000000000000015 integrity: MODSIGN: Couldn't get UEFI db list efi: EFI Runtime Services are disabled! integrity: Couldn't get size: 0x8000000000000015 integrity: Couldn't get UEFI dbx list Fixes: 155ca952c7ca ("efi: Do not import certificates from UEFI Secure Boot for T2 Macs") Cc: stable@vger.kernel.org Cc: Aditya Garg <gargaditya08@live.com> Tested-by: Samuel Jiang <chyishian.jiang@gmail.com> Signed-off-by: Orlando Chamberlain <redecorating@protonmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-09-30hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zeroKees Cook1-4/+10
Now that Clang's -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang option is no longer required, remove it from the command line. Clang 16 and later will warn when it is used, which will cause Kconfig to think it can't use -ftrivial-auto-var-init=zero at all. Check for whether it is required and only use it when so. Cc: Nathan Chancellor <nathan@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: linux-kbuild@vger.kernel.org Cc: llvm@lists.linux.dev Cc: stable@vger.kernel.org Fixes: f02003c860d9 ("hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO") Signed-off-by: Kees Cook <keescook@chromium.org>
2022-09-29landlock: Fix documentation styleMickaël Salaün1-20/+20
It seems that all code should use double backquotes, which is also used to convert "%" defines. Let's use an homogeneous style and remove all use of simple backquotes (which should only be used for emphasis). Cc: Günther Noack <gnoack3000@gmail.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20220923154207.3311629-4-mic@digikod.net
2022-09-29landlock: Slightly improve documentation and fix spellingMickaël Salaün1-1/+1
Now that we have more than one ABI version, make limitation explanation more consistent by replacing "ABI 1" with "ABI < 2". This also indicates which ABIs support such past limitation. Improve documentation consistency by not using contractions. Fix spelling in fs.c . Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20220923154207.3311629-3-mic@digikod.net
2022-09-28powerpc/rtas: block error injection when locked downNathan Lynch1-0/+1
The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall. Block the PAPR error injection facility from being opened or called when locked down. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com
2022-09-28powerpc/pseries: block untrusted device tree changes when locked downNathan Lynch1-0/+1
The /proc/powerpc/ofdt interface allows the root user to freely alter the in-kernel device tree, enabling arbitrary physical address writes via drivers that could bind to malicious device nodes, thus making it possible to disable lockdown. Historically this interface has been used on the pseries platform to facilitate the runtime addition and removal of processor, memory, and device resources (aka Dynamic Logical Partitioning or DLPAR). Years ago, the processor and memory use cases were migrated to designs that happen to be lockdown-friendly: device tree updates are communicated directly to the kernel from firmware without passing through untrusted user space. I/O device DLPAR via the "drmgr" command in powerpc-utils remains the sole legitimate user of /proc/powerpc/ofdt, but it is already broken in lockdown since it uses /dev/mem to allocate argument buffers for the rtas syscall. So only illegitimate uses of the interface should see a behavior change when running on a locked down kernel. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-2-nathanl@linux.ibm.com
2022-09-27smack: cleanup obsolete mount option flagsXiu Jianfeng1-9/+0
These mount option flags are obsolete since commit 12085b14a444 ("smack: switch to private smack_mnt_opts"), remove them. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2022-09-27smack: lsm: remove the unneeded result variableXu Panda1-3/+1
Return the value smk_ptrace_rule_check() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2022-09-27SMACK: Add sk_clone_security LSM hookLontke Michael1-0/+16
Using smk_of_current() during sk_alloc_security hook leads in rare cases to a faulty initialization of the security context of the created socket. By adding the LSM hook sk_clone_security to SMACK this initialization fault is corrected by copying the security context of the old socket pointer to the newly cloned one. Co-authored-by: Martin Ostertag: <martin.ostertag@elektrobit.com> Signed-off-by: Lontke Michael <michael.lontke@elektrobit.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2022-09-22KEYS: Move KEY_LOOKUP_ to include/linux/key.h and define KEY_LOOKUP_ALLRoberto Sassu1-2/+0
In preparation for the patch that introduces the bpf_lookup_user_key() eBPF kfunc, move KEY_LOOKUP_ definitions to include/linux/key.h, to be able to validate the kfunc parameters. Add them to enum key_lookup_flag, so that all the current ones and the ones defined in the future are automatically exported through BTF and available to eBPF programs. Also, add KEY_LOOKUP_ALL to the enum, with the logical OR of currently defined flags as value, to facilitate checking whether a variable contains only those flags. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-7-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-14selinux: remove the unneeded result variableXu Panda1-15/+9
Return the value avc_has_perm() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Xu Panda <xu.panda@zte.com.cn> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-09-14lockdown: ratelimit denial messagesNathan Lynch1-1/+1
User space can flood the log with lockdown denial messages: [ 662.555584] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 [ 662.563237] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 [ 662.571134] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 [ 662.578668] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 [ 662.586021] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 [ 662.593398] Lockdown: bash: debugfs access is restricted; see man kernel_lockdown.7 Ratelimiting these shouldn't meaningfully degrade the quality of the information logged. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-09-08LoadPin: Require file with verity root digests to have a headerMatthias Kaehlcke2-2/+21
LoadPin expects the file with trusted verity root digests to be an ASCII file with one digest (hex value) per line. A pinned root could contain files that meet these format requirements, even though the hex values don't represent trusted root digests. Add a new requirement to the file format which consists in the first line containing a fixed string. This prevents attackers from feeding files with an otherwise valid format to LoadPin. Suggested-by: Sarthak Kukreti <sarthakkukreti@chromium.org> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220906181725.1.I3f51d1bb0014e5a5951be4ad3c5ad7c7ca1dfc32@changeid
2022-09-08LoadPin: Fix Kconfig doc about format of file with verity digestsMatthias Kaehlcke1-1/+1
The doc for CONFIG_SECURITY_LOADPIN_VERITY says that the file with verity digests must contain a comma separated list of digests. That was the case at some stage of the development, but was changed during the review process to one digest per line. Update the Kconfig doc accordingly. Reported-by: Jae Hoon Kim <kimjae@chromium.org> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Fixes: 3f805f8cc23b ("LoadPin: Enable loading from trusted dm-verity devices") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220829174557.1.I5d202d1344212a3800d9828f936df6511eb2d0d1@changeid
2022-09-03Merge tag 'landlock-6.0-rc4' of ↵Linus Torvalds1-23/+25
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fix from Mickaël Salaün: "This fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right" * tag 'landlock-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFER
2022-09-02landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFERMickaël Salaün1-23/+25
This change fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right. With the introduction of LANDLOCK_ACCESS_FS_REFER, we added the first globally denied-by-default access right. Indeed, this lifted an initial Landlock limitation to rename and link files, which was initially always denied when the source or the destination were different directories. This led to an inconsistent backward compatibility behavior which was only taken into account if no domain layer were using the new LANDLOCK_ACCESS_FS_REFER right. However, when restricting a thread with a new ruleset handling LANDLOCK_ACCESS_FS_REFER, all inherited parent rulesets/layers not explicitly handling LANDLOCK_ACCESS_FS_REFER would behave as if they were handling this access right and with all their rules allowing it. This means that renaming and linking files could became allowed by these parent layers, but all the other required accesses must also be granted: all layers must allow file removal or creation, and renaming and linking operations cannot lead to privilege escalation according to the Landlock policy. See detailed explanation in commit b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER"). To say it another way, this bug may lift the renaming and linking limitations of the initial Landlock version, and a same ruleset can enforce different restrictions depending on previous or next enforced ruleset (i.e. inconsistent behavior). The LANDLOCK_ACCESS_FS_REFER right cannot give access to data not already allowed, but this doesn't follow the contract of the first Landlock ABI. This fix puts back the limitation for sandboxes that didn't opt-in for this additional right. For instance, if a first ruleset allows LANDLOCK_ACCESS_FS_MAKE_REG on /dst and LANDLOCK_ACCESS_FS_REMOVE_FILE on /src, renaming /src/file to /dst/file is denied. However, without this fix, stacking a new ruleset which allows LANDLOCK_ACCESS_FS_REFER on / would now permit the sandboxed thread to rename /src/file to /dst/file . This change fixes the (absolute) rule access rights, which now always forbid LANDLOCK_ACCESS_FS_REFER except when it is explicitly allowed when creating a rule. Making all domain handle LANDLOCK_ACCESS_FS_REFER was an initial approach but there is two downsides: * it makes the code more complex because we still want to check that a rule allowing LANDLOCK_ACCESS_FS_REFER is legitimate according to the ruleset's handled access rights (i.e. ABI v1 != ABI v2); * it would not allow to identify if the user created a ruleset explicitly handling LANDLOCK_ACCESS_FS_REFER or not, which will be an issue to audit Landlock. Instead, this change adds an ACCESS_INITIALLY_DENIED list of denied-by-default rights, which (only) contains LANDLOCK_ACCESS_FS_REFER. All domains are treated as if they are also handling this list, but without modifying their fs_access_masks field. A side effect is that the errno code returned by rename(2) or link(2) *may* be changed from EXDEV to EACCES according to the enforced restrictions. Indeed, we now have the mechanic to identify if an access is denied because of a required right (e.g. LANDLOCK_ACCESS_FS_MAKE_REG, LANDLOCK_ACCESS_FS_REMOVE_FILE) or if it is denied because of missing LANDLOCK_ACCESS_FS_REFER rights. This may result in different errno codes than for the initial Landlock version, but this approach is more consistent and better for rename/link compatibility reasons, and it wasn't possible before (hence no backport to ABI v1). The layout1.rename_file test reflects this change. Add 4 layout1.refer_denied_by_default* test suites to check that the behavior of a ruleset not handling LANDLOCK_ACCESS_FS_REFER (ABI v1) is unchanged even if another layer handles LANDLOCK_ACCESS_FS_REFER (i.e. ABI v1 precedence). Make sure rule's absolute access rights are correct by testing with and without a matching path. Add test_rename() and test_exchange() helpers. Extend layout1.inval tests to check that a denied-by-default access right is not necessarily part of a domain's handled access rights. Test coverage for security/landlock is 95.3% of 599 lines according to gcc/gcov-11. Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Reviewed-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20220831203840.1370732-1-mic@digikod.net Cc: stable@vger.kernel.org [mic: Constify and slightly simplify test helpers] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2022-09-02->getprocattr(): attribute name is const char *, TYVM...Al Viro4-5/+5
cast of ->d_name.name to char * is completely wrong - nothing is allowed to modify its contents. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-31Merge tag 'lsm-pr-20220829' of ↵Linus Torvalds4-1/+61
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM support for IORING_OP_URING_CMD from Paul Moore: "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD. These are necessary as without them the IORING_OP_URING_CMD remains outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch, and my SELinux patch). They have been discussed at length with the io_uring folks, and Jens has given his thumbs-up on the relevant patches (see the commit descriptions). There is one patch that is not strictly necessary, but it makes testing much easier and is very trivial: the /dev/null IORING_OP_URING_CMD patch." * tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: Smack: Provide read control for io_uring_cmd /dev/null: add IORING_OP_URING_CMD support selinux: implement the security_uring_cmd() LSM hook lsm,io_uring: add LSM hooks for the new uring_cmd file op
2022-08-31acl: move idmapping handling into posix_acl_xattr_set()Christian Brauner1-3/+14
The uapi POSIX ACL struct passed through the value argument during setxattr() contains {g,u}id values encoded via ACL_{GROUP,USER} entries that should actually be stored in the form of k{g,u}id_t (See [1] for a long explanation of the issue.). In 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") we took the mount's idmapping into account in order to let overlayfs handle POSIX ACLs on idmapped layers correctly. The fixup is currently performed directly in vfs_setxattr() which piles on top of the earlier hackiness by handling the mount's idmapping and stuff the vfs{g,u}id_t values into the uapi struct as well. While that is all correct and works fine it's just ugly. Now that we have introduced vfs_make_posix_acl() earlier move handling idmapped mounts out of vfs_setxattr() and into the POSIX ACL handler where it belongs. Note that we also need to call vfs_make_posix_acl() for EVM which interpretes POSIX ACLs during security_inode_setxattr(). Leave them a longer comment for future reference. All filesystems that support idmapped mounts via FS_ALLOW_IDMAP use the standard POSIX ACL xattr handlers and are covered by this change. This includes overlayfs which simply calls vfs_{g,s}etxattr(). The following filesystems use custom POSIX ACL xattr handlers: 9p, cifs, ecryptfs, and ntfs3 (and overlayfs but we've covered that in the paragraph above) and none of them support idmapped mounts yet. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ [1] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-31selinux: declare read-only parameters constChristian Göttsche4-29/+31
Declare ebitmap, mls_level and mls_context parameters const where they are only read from. This allows callers to supply pointers to const as arguments and increases readability. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-31selinux: use int arrays for boolean valuesChristian Göttsche1-5/+5
Do not cast pointers of signed integers to pointers of unsigned integers and vice versa. It should currently not be an issue since they hold SELinux boolean values which should only contain either 0's or 1's, which should have the same representation. Reported by sparse: .../selinuxfs.c:1485:30: warning: incorrect type in assignment (different signedness) .../selinuxfs.c:1485:30: expected unsigned int * .../selinuxfs.c:1485:30: got int *[addressable] values .../selinuxfs.c:1402:48: warning: incorrect type in argument 3 (different signedness) .../selinuxfs.c:1402:48: expected int *values .../selinuxfs.c:1402:48: got unsigned int *bool_pending_values Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: minor whitespace fixes, sparse output cleanup] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-30selinux: remove an unneeded variable in sel_make_class_dir_entries()ye xingchen1-4/+1
Return the value sel_make_perm_files() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-26Smack: Provide read control for io_uring_cmdCasey Schaufler1-0/+32
Limit io_uring "cmd" options to files for which the caller has Smack read access. There may be cases where the cmd option may be closer to a write access than a read, but there is no way to make that determination. Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-26selinux: implement the security_uring_cmd() LSM hookPaul Moore2-1/+25
Add a SELinux access control for the iouring IORING_OP_URING_CMD command. This includes the addition of a new permission in the existing "io_uring" object class: "cmd". The subject of the new permission check is the domain of the process requesting access, the object is the open file which points to the device/file that is the target of the IORING_OP_URING_CMD operation. A sample policy rule is shown below: allow <domain> <file>:io_uring { cmd }; Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-26lsm,io_uring: add LSM hooks for the new uring_cmd file opLuis Chamberlain1-0/+4
io-uring cmd support was added through ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-23ima: fix blocking of security.ima xattrs of unsupported algorithmsMimi Zohar1-4/+8
Limit validating the hash algorithm to just security.ima xattr, not the security.evm xattr or any of the protected EVM security xattrs, nor posix acls. Fixes: 50f742dd9147 ("IMA: block writes of the security.ima xattr with unsupported algorithms") Reported-by: Christian Brauner <brauner@kernel.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-08-21tomoyo: struct path it might get from LSM callers won't have NULL dentry or mntAl Viro2-8/+3
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-21tomoyo: use vsnprintf() properlyAl Viro2-2/+2
Idiomatic way to find how much space sprintf output would take is len = snprintf(NULL, 0, ...) + 1; Once upon a time there'd been libc implementations that blew chunks on that and somebody had come up with the following "cute" trick: len = snprintf((char *) &len, 1, ...) + 1; for doing the same. However, that's unidiomatic, harder to follow *and* any such libc implementation would violate both C99 and POSIX (since 2001). IOW, this kludge is best buried along with such libc implementations, nevermind getting cargo-culted into newer code. Our vsnprintf() does not suffer that braindamage, TYVM. Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-19Merge tag 'hardening-v6.0-rc2' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Also undef LATENT_ENTROPY_PLUGIN for per-file disabling (Andrew Donnellan) - Return EFAULT on copy_from_user() failures in LoadPin (Kees Cook) * tag 'hardening-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: Undefine LATENT_ENTROPY_PLUGIN when plugin disabled for a file LoadPin: Return EFAULT on copy_from_user() failures
2022-08-17selinux: Implement userns_create hookFrederick Lawler2-0/+11
Unprivileged user namespace creation is an intended feature to enable sandboxing, however this feature is often used to as an initial step to perform a privilege escalation attack. This patch implements a new user_namespace { create } access control permission to restrict which domains allow or deny user namespace creation. This is necessary for system administrators to quickly protect their systems while waiting for vulnerability patches to be applied. This permission can be used in the following way: allow domA_t domA_t : user_namespace { create }; Signed-off-by: Frederick Lawler <fred@cloudflare.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-17security, lsm: Introduce security_create_user_ns()Frederick Lawler1-0/+5
User namespaces are an effective tool to allow programs to run with permission without requiring the need for a program to run as root. User namespaces may also be used as a sandboxing technique. However, attackers sometimes leverage user namespaces as an initial attack vector to perform some exploit. [1,2,3] While it is not the unprivileged user namespace functionality, which causes the kernel to be exploitable, users/administrators might want to more granularly limit or at least monitor how various processes use this functionality, while vulnerable kernel subsystems are being patched. Preventing user namespace already creation comes in a few of forms in order of granularity: 1. /proc/sys/user/max_user_namespaces sysctl 2. Distro specific patch(es) 3. CONFIG_USER_NS To block a task based on its attributes, the LSM hook cred_prepare is a decent candidate for use because it provides more granular control, and it is called before create_user_ns(): cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred) Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. [4] Further, security_prepare_creds() will always return a ENOMEM if the hook returns any non-zero error code. This hook also does not handle the clone3 case which requires us to access a user space pointer to know if we're in the CLONE_NEW_USER call path which may be subject to a TOCTTOU attack. Lastly, cred_prepare is called in many call paths, and a targeted hook further limits the frequency of calls which is a beneficial outcome. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook. With the new userns_create hook, users will have more control over the observability and access control over user namespace creation. Users should expect that normal operation of user namespaces will behave as usual, and only be impacted when controls are implemented by users or administrators. This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned. Links: 1. https://nvd.nist.gov/vuln/detail/CVE-2022-0492 2. https://nvd.nist.gov/vuln/detail/CVE-2022-25636 3. https://nvd.nist.gov/vuln/detail/CVE-2022-34918 4. https://lore.kernel.org/all/1c4b1c0d-12f6-6e9e-a6a3-cdce7418110c@schaufler-ca.com/ Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Frederick Lawler <fred@cloudflare.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-16LoadPin: Return EFAULT on copy_from_user() failuresKees Cook1-4/+2
The copy_from_user() function returns the number of bytes remaining to be copied on a failure. Such failures should return -EFAULT to high levels. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 3f805f8cc23b ("LoadPin: Enable loading from trusted dm-verity devices") Cc: Matthias Kaehlcke <mka@chromium.org> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2022-08-16lsm: clean up redundant NULL pointer checkXiu Jianfeng1-13/+1
The implements of {ip,tcp,udp,dccp,sctp,ipv6}_hdr(skb) guarantee that they will never return NULL, and elsewhere users don't do the check as well, so remove the check here. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>