kernel/linux.git/security/selinux/include/classmap.h, branch v6.6.131

Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

2022-10-04T00:51:52+00:00

Pull LSM updates from Paul Moore: "Seven patches for the LSM layer and we've got a mix of trivial and significant patches. Highlights below, starting with the smaller bits first so they don't get lost in the discussion of the larger items: - Remove some redundant NULL pointer checks in the common LSM audit code. - Ratelimit the lockdown LSM's access denial messages. With this change there is a chance that the last visible lockdown message on the console is outdated/old, but it does help preserve the initial series of lockdown denials that started the denial message flood and my gut feeling is that these might be the more valuable messages. - Open userfaultfds as readonly instead of read/write. While this code obviously lives outside the LSM, it does have a noticeable impact on the LSMs with Ondrej explaining the situation in the commit description. It is worth noting that this patch languished on the VFS list for over a year without any comments (objections or otherwise) so I took the liberty of pulling it into the LSM tree after giving fair notice. It has been in linux-next since the end of August without any noticeable problems. - Add a LSM hook for user namespace creation, with implementations for both the BPF LSM and SELinux. Even though the changes are fairly small, this is the bulk of the diffstat as we are also including BPF LSM selftests for the new hook. It's also the most contentious of the changes in this pull request with Eric Biederman NACK'ing the LSM hook multiple times during its development and discussion upstream. While I've never taken NACK's lightly, I'm sending these patches to you because it is my belief that they are of good quality, satisfy a long-standing need of users and distros, and are in keeping with the existing nature of the LSM layer and the Linux Kernel as a whole. The patches in implement a LSM hook for user namespace creation that allows for a granular approach, configurable at runtime, which enables both monitoring and control of user namespaces. The general consensus has been that this is far preferable to the other solutions that have been adopted downstream including outright removal from the kernel, disabling via system wide sysctls, or various other out-of-tree mechanisms that users have been forced to adopt since we haven't been able to provide them an upstream solution for their requests. Eric has been steadfast in his objections to this LSM hook, explaining that any restrictions on the user namespace could have significant impact on userspace. While there is the possibility of impacting userspace, it is important to note that this solution only impacts userspace when it is requested based on the runtime configuration supplied by the distro/admin/user. Frederick (the pathset author), the LSM/security community, and myself have tried to work with Eric during development of this patchset to find a mutually acceptable solution, but Eric's approach and unwillingness to engage in a meaningful way have made this impossible. I have CC'd Eric directly on this pull request so he has a chance to provide his side of the story; there have been no objections outside of Eric's" * tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: ratelimit denial messages userfaultfd: open userfaultfds with O_RDONLY selinux: Implement userns_create hook selftests/bpf: Add tests verifying bpf lsm userns_create hook bpf-lsm: Make bpf_lsm_userns_create() sleepable security, lsm: Introduce security_create_user_ns() lsm: clean up redundant NULL pointer check

selinux: implement the security_uring_cmd() LSM hook

2022-08-26T15:19:43+00:00

Add a SELinux access control for the iouring IORING_OP_URING_CMD command. This includes the addition of a new permission in the existing "io_uring" object class: "cmd". The subject of the new permission check is the domain of the process requesting access, the object is the open file which points to the device/file that is the target of the IORING_OP_URING_CMD operation. A sample policy rule is shown below: allow :io_uring { cmd }; Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Paul Moore

selinux: Implement userns_create hook

2022-08-16T21:44:44+00:00

Unprivileged user namespace creation is an intended feature to enable sandboxing, however this feature is often used to as an initial step to perform a privilege escalation attack. This patch implements a new user_namespace { create } access control permission to restrict which domains allow or deny user namespace creation. This is necessary for system administrators to quickly protect their systems while waiting for vulnerability patches to be applied. This permission can be used in the following way: allow domA_t domA_t : user_namespace { create }; Signed-off-by: Frederick Lawler Signed-off-by: Paul Moore

selinux: declare data arrays const

2022-05-03T19:53:49+00:00

The arrays for the policy capability names, the initial sid identifiers and the class and permission names are not changed at runtime. Declare them const to avoid accidental modification. Do not override the classmap and the initial sid list in the build time script genheaders. Check flose(3) is successful in genheaders.c, otherwise the written data might be corrupted or incomplete. Signed-off-by: Christian Göttsche [PM: manual merge due to fuzz, minor style tweaks] Signed-off-by: Paul Moore

selinux: remove the SELinux lockdown implementation

2021-09-30T14:12:33+00:00

NOTE: This patch intentionally omits any "Fixes:" metadata or stable tagging since it removes a SELinux access control check; while removing the control point is the right thing to do moving forward, removing it in stable kernels could be seen as a regression. The original SELinux lockdown implementation in 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") used the current task's credentials as both the subject and object in the SELinux lockdown hook, selinux_lockdown(). Unfortunately that proved to be incorrect in a number of cases as the core kernel was calling the LSM lockdown hook in places where the credentials from the "current" task_struct were not the correct credentials to use in the SELinux access check. Attempts were made to resolve this by adding a credential pointer to the LSM lockdown hook as well as suggesting that the single hook be split into two: one for user tasks, one for kernel tasks; however neither approach was deemed acceptable by Linus. Faced with the prospect of either changing the subj/obj in the access check to a constant context (likely the kernel's label) or removing the SELinux lockdown check entirely, the SELinux community decided that removing the lockdown check was preferable. The supporting changes to the general LSM layer are left intact, this patch only removes the SELinux implementation. Acked-by: Ondrej Mosnacek Signed-off-by: Paul Moore

selinux: add support for the io_uring access controls

2021-09-20T02:40:32+00:00

This patch implements two new io_uring access controls, specifically support for controlling the io_uring "personalities" and IORING_SETUP_SQPOLL. Controlling the sharing of io_urings themselves is handled via the normal file/inode labeling and sharing mechanisms. The io_uring { override_creds } permission restricts which domains the subject domain can use to override it's own credentials. Granting a domain the io_uring { override_creds } permission allows it to impersonate another domain in io_uring operations. The io_uring { sqpoll } permission restricts which domains can create asynchronous io_uring polling threads. This is important from a security perspective as operations queued by this asynchronous thread inherit the credentials of the thread creator by default; if an io_uring is shared across process/domain boundaries this could result in one domain impersonating another. Controlling the creation of sqpoll threads, and the sharing of io_urings across processes, allow policy authors to restrict the ability of one domain to impersonate another via io_uring. As a quick summary, this patch adds a new object class with two permissions: io_uring { override_creds sqpoll } These permissions can be seen in the two simple policy statements below: allow domA_t domB_t : io_uring { override_creds }; allow domA_t self : io_uring { sqpoll }; Signed-off-by: Paul Moore

mctp: Add MCTP base

2021-07-29T14:06:49+00:00

Add basic Kconfig, an initial (empty) af_mctp source object, and {AF,PF}_MCTP definitions, and the required definitions for a new protocol type. Signed-off-by: Jeremy Kerr Signed-off-by: David S. Miller

selinux: add proper NULL termination to the secclass_map permissions

2021-04-22T01:43:25+00:00

This patch adds the missing NULL termination to the "bpf" and "perf_event" object class permission lists. This missing NULL termination should really only affect the tools under scripts/selinux, with the most important being genheaders.c, although in practice this has not been an issue on any of my dev/test systems. If the problem were to manifest itself it would likely result in bogus permissions added to the end of the object class; thankfully with no access control checks using these bogus permissions and no policies defining these permissions the impact would likely be limited to some noise about undefined permissions during policy load. Cc: stable@vger.kernel.org Fixes: ec27c3568a34 ("selinux: bpf: Add selinux check for eBPF syscall operations") Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks") Signed-off-by: Paul Moore

selinux: teach SELinux about anonymous inodes

2021-01-14T22:38:10+00:00

This change uses the anon_inodes and LSM infrastructure introduced in the previous patches to give SELinux the ability to control anonymous-inode files that are created using the new anon_inode_getfd_secure() function. A SELinux policy author detects and controls these anonymous inodes by adding a name-based type_transition rule that assigns a new security type to anonymous-inode files created in some domain. The name used for the name-based transition is the name associated with the anonymous inode for file listings --- e.g., "[userfaultfd]" or "[perf_event]". Example: type uffd_t; type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]"; allow sysadm_t uffd_t:anon_inode { create }; (The next patch in this series is necessary for making userfaultfd support this new interface. The example above is just for exposition.) Signed-off-by: Daniel Colascione Signed-off-by: Lokesh Gidra Signed-off-by: Paul Moore

capabilities: Introduce CAP_CHECKPOINT_RESTORE

2020-07-19T18:14:42+00:00

This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating checkpoint/restore for non-root users. Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has been asked numerous times if it is possible to checkpoint/restore a process as non-root. The answer usually was: 'almost'. The main blocker to restore a process as non-root was to control the PID of the restored process. This feature available via the clone3 system call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by CAP_SYS_ADMIN. In the past two years, requests for non-root checkpoint/restore have increased due to the following use cases: * Checkpoint/Restore in an HPC environment in combination with a resource manager distributing jobs where users are always running as non-root. There is a desire to provide a way to checkpoint and restore long running jobs. * Container migration as non-root * We have been in contact with JVM developers who are integrating CRIU into a Java VM to decrease the startup time. These checkpoint/restore applications are not meant to be running with CAP_SYS_ADMIN. We have seen the following workarounds: * Use a setuid wrapper around CRIU: See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c * Use a setuid helper that writes to ns_last_pid. Unfortunately, this helper delegation technique is impossible to use with clone3, and is thus prone to races. See https://github.com/twosigma/set_ns_last_pid * Cycle through PIDs with fork() until the desired PID is reached: This has been demonstrated to work with cycling rates of 100,000 PIDs/s See https://github.com/twosigma/set_ns_last_pid * Patch out the CAP_SYS_ADMIN check from the kernel * Run the desired application in a new user and PID namespace to provide a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited use in typical container environments (e.g., Kubernetes) as /proc is typically protected with read-only layers (e.g., /proc/sys) for hardening purposes. Read-only layers prevent additional /proc mounts (due to proc's SB_I_USERNS_VISIBLE property), making the use of new PID namespaces limited as certain applications need access to /proc matching their PID namespace. The introduced capability allows to: * Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable for the corresponding PID namespace via ns_last_pid/clone3. * Open files in /proc/pid/map_files when the current user is CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for recovering files that are unreachable via the file system such as deleted files, or memfd files. See corresponding selftest for an example with clone3(). Signed-off-by: Adrian Reber Signed-off-by: Nicolas Viennot Reviewed-by: Serge Hallyn Acked-by: Christian Brauner Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com Signed-off-by: Christian Brauner