summaryrefslogtreecommitdiff
path: root/security/integrity/ima/ima_main.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-10ima: re-evaluate file integrity on file metadata changeStefan Berger1-1/+13
Force a file's integrity to be re-evaluated on file metadata change by resetting both the IMA and EVM status flags. Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-04-10ima: Move file-change detection variables into new structureStefan Berger1-4/+3
Move all the variables used for file change detection into a structure that can be used by IMA and EVM. Implement an inline function for storing the identification of an inode and one for detecting changes to an inode based on this new structure. Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-04-10ima: Rename backing_inode to real_inodeStefan Berger1-8/+10
Rename the backing_inode variable to real_inode since it gets its value from real_inode(). Suggested-by: Amir Goldstein <amir73il@gmail.com> Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-04-08integrity: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva1-2/+4
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. There is currently an object (`hdr)` in `struct ima_max_digest_data` that contains a flexible structure (`struct ima_digest_data`): struct ima_max_digest_data { struct ima_digest_data hdr; u8 digest[HASH_MAX_DIGESTSIZE]; } __packed; So, in order to avoid ending up with a flexible-array member in the middle of a struct, we use the `__struct_group()` helper to separate the flexible array from the rest of the members in the flexible structure: struct ima_digest_data { __struct_group(ima_digest_data_hdr, hdr, __packed, ... the rest of the members ); u8 digest[]; } __packed; And similarly for `struct evm_ima_xattr_data`. With the change described above, we can now declare an object of the type of the tagged `struct ima_digest_data_hdr`, without embedding the flexible array in the middle of another struct: struct ima_max_digest_data { struct ima_digest_data_hdr hdr; u8 digest[HASH_MAX_DIGESTSIZE]; } __packed; And similarly for `struct evm_digest` and `struct evm_xattr`. We also use `container_of()` whenever we need to retrieve a pointer to the flexible structure. So, with these changes, fix the following warnings: security/integrity/evm/evm.h:64:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/evm/../integrity.h:40:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/evm/../integrity.h:68:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/ima/../integrity.h:40:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/ima/../integrity.h:68:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/integrity.h:40:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/integrity.h:68:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/platform_certs/../integrity.h:40:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] security/integrity/platform_certs/../integrity.h:68:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-04-08ima: define an init_module critical data recordMimi Zohar1-0/+7
The init_module syscall loads an ELF image into kernel space without measuring the buffer containing the ELF image. To close this kernel module integrity gap, define a new critical-data record which includes the hash of the ELF image. Instead of including the buffer data in the IMA measurement list, include the hash of the buffer data to avoid large IMA measurement list records. The buffer data hash would be the same value as the finit_module syscall file hash. To enable measuring the init_module buffer and other critical data from boot, define "ima_policy=critical_data" on the boot command line. Since builtin policies are not persistent, a custom IMA policy must include the rule as well: measure func=CRITICAL_DATA label=modules To verify the template data hash value, first convert the buffer data hash to binary: grep "init_module" \ /sys/kernel/security/integrity/ima/ascii_runtime_measurements | \ tail -1 | cut -d' ' -f 6 | xxd -r -p | sha256sum Reported-by: Ken Goldman <kgold@linux.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-02-16ima: Make it independent from 'integrity' LSMRoberto Sassu1-15/+21
Make the 'ima' LSM independent from the 'integrity' LSM by introducing IMA own integrity metadata (ima_iint_cache structure, with IMA-specific fields from the integrity_iint_cache structure), and by managing it directly from the 'ima' LSM. Create ima_iint.c and introduce the same integrity metadata management functions found in iint.c (renamed with ima_). However, instead of putting metadata in an rbtree, reserve space from IMA in the inode security blob for a pointer, and introduce the ima_inode_set_iint()/ima_inode_get_iint() primitives to store/retrieve that pointer. This improves search time from logarithmic to constant. Consequently, don't include the inode pointer as field in the ima_iint_cache structure, since the association with the inode is clear. Since the inode field is missing in ima_iint_cache, pass the extra inode parameter to ima_get_verity_digest(). Prefer storing the pointer instead of the entire ima_iint_cache structure, to avoid too much memory pressure. Use the same mechanism as before, a cache named ima_iint_cache (renamed from iint_cache), to quickly allocate a new ima_iint_cache structure when requested by the IMA policy. Create the new ima_iint_cache in ima_iintcache_init(), called by init_ima_lsm(), during the initialization of the 'ima' LSM. And, register ima_inode_free_security() to free the ima_iint_cache structure, if exists. Replace integrity_iint_cache with ima_iint_cache in various places of the IMA code. Also, replace integrity_inode_get() and integrity_iint_find(), respectively with ima_inode_get() and ima_iint_find(). Finally, move the remaining IMA-specific flags to security/integrity/ima/ima.h, since they are now unnecessary in the common integrity layer. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-16ima: Move IMA-Appraisal to LSM infrastructureRoberto Sassu1-0/+1
A few additional IMA hooks are needed to reset the cached appraisal status, causing the file's integrity to be re-evaluated on next access. Register these IMA-appraisal only functions separately from the rest of IMA functions, as appraisal is a separate feature not necessarily enabled in the kernel configuration. Reuse the same approach as for other IMA functions, move hardcoded calls from various places in the kernel to the LSM infrastructure. Declare the functions as static and register them as hook implementations in init_ima_appraise_lsm(), called by init_ima_lsm(). Also move the inline function ima_inode_remove_acl() from the public ima.h header to ima_appraise.c. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-16ima: Move to LSM infrastructureRoberto Sassu1-21/+57
Move hardcoded IMA function calls (not appraisal-specific functions) from various places in the kernel to the LSM infrastructure, by introducing a new LSM named 'ima' (at the end of the LSM list and always enabled like 'integrity'). Having IMA before EVM in the Makefile is sufficient to preserve the relative order of the new 'ima' LSM in respect to the upcoming 'evm' LSM, and thus the order of IMA and EVM function calls as when they were hardcoded. Make moved functions as static (except ima_post_key_create_or_update(), which is not in ima_main.c), and register them as implementation of the respective hooks in the new function init_ima_lsm(). Select CONFIG_SECURITY_PATH, to ensure that the path-based LSM hook path_post_mknod is always available and ima_post_path_mknod() is always executed to mark files as new, as before the move. A slight difference is that IMA and EVM functions registered for the inode_post_setattr, inode_post_removexattr, path_post_mknod, inode_post_create_tmpfile, inode_post_set_acl and inode_post_remove_acl won't be executed for private inodes. Since those inodes are supposed to be fs-internal, they should not be of interest to IMA or EVM. The S_PRIVATE flag is used for anonymous inodes, hugetlbfs, reiserfs xattrs, XFS scrub and kernel-internal tmpfs files. Conditionally register ima_post_key_create_or_update() if CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS is enabled. Also, conditionally register ima_kernel_module_request() if CONFIG_INTEGRITY_ASYMMETRIC_KEYS is enabled. Finally, add the LSM_ID_IMA case in lsm_list_modules_test.c. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-16integrity: Move integrity_kernel_module_request() to IMARoberto Sassu1-0/+33
In preparation for removing the 'integrity' LSM, move integrity_kernel_module_request() to IMA, and rename it to ima_kernel_module_request(). Rewrite the function documentation, to explain better what the problem is. Compile it conditionally if CONFIG_INTEGRITY_ASYMMETRIC_KEYS is enabled, and call it from security.c (removed afterwards with the move of IMA to the LSM infrastructure). Adding this hook cannot be avoided, since IMA has no control on the flags passed to crypto_alloc_sig() in public_key_verify_signature(), and thus cannot pass CRYPTO_NOLOAD, which solved the problem for EVM hashing with commit e2861fa71641 ("evm: Don't deadlock if a crypto algorithm is unavailable"). EVM alone does not need to implement this hook, first because there is no mutex to deadlock, and second because even if it had it, there should be a recursive call. However, since verification from EVM can be initiated only by setting inode metadata, deadlock would occur if modprobe would do the same while loading a kernel module (which is unlikely). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-16ima: Align ima_post_read_file() definition with LSM infrastructureRoberto Sassu1-1/+1
Change ima_post_read_file() definition, by making "void *buf" a "char *buf", so that it can be registered as implementation of the post_read_file hook. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-16ima: Align ima_file_mprotect() definition with LSM infrastructureRoberto Sassu1-2/+4
Change ima_file_mprotect() definition, so that it can be registered as implementation of the file_mprotect hook. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-10-31ima: detect changes to the backing overlay fileMimi Zohar1-1/+15
Commit 18b44bc5a672 ("ovl: Always reevaluate the file signature for IMA") forced signature re-evaulation on every file access. Instead of always re-evaluating the file's integrity, detect a change to the backing file, by comparing the cached file metadata with the backing file's metadata. Verifying just the i_version has not changed is insufficient. In addition save and compare the i_ino and s_dev as well. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Eric Snowberg <eric.snowberg@oracle.com> Tested-by: Raul E Rangel <rrangel@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-05-24IMA: use vfs_getattr_nosec to get the i_versionJeff Layton1-4/+8
IMA currently accesses the i_version out of the inode directly when it does a measurement. This is fine for most simple filesystems, but can be problematic with more complex setups (e.g. overlayfs). Make IMA instead call vfs_getattr_nosec to get this info. This allows the filesystem to determine whether and how to report the i_version, and should allow IMA to work properly with a broader class of filesystems in the future. Reported-and-Tested-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-02-22Merge tag 'integrity-v6.3' of ↵Linus Torvalds1-9/+29
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity update from Mimi Zohar: "One doc and one code cleanup, and two bug fixes" * tag 'integrity-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: Introduce MMAP_CHECK_REQPROT hook ima: Align ima_file_mmap() parameters with mmap_file LSM hook evm: call dump_security_xattr() in all cases to remove code duplication ima: fix ima_delete_rules() kernel-doc warning ima: return IMA digest value only when IMA_COLLECTED flag is set ima: fix error handling logic when file measurement failed
2023-01-31ima: Introduce MMAP_CHECK_REQPROT hookRoberto Sassu1-5/+22
Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") caused ima_file_mmap() to receive the protections requested by the application and not those applied by the kernel. After restoring the original MMAP_CHECK behavior, existing attestation servers might be broken due to not being ready to handle new entries (previously missing) in the IMA measurement list. Restore the original correct MMAP_CHECK behavior, instead of keeping the current buggy one and introducing a new hook with the correct behavior. Otherwise, there would have been the risk of IMA users not noticing the problem at all, as they would actively have to update the IMA policy, to switch to the correct behavior. Also, introduce the new MMAP_CHECK_REQPROT hook to keep the current behavior, so that IMA users could easily fix a broken attestation server, although this approach is discouraged due to potentially missing measurements. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-31ima: Align ima_file_mmap() parameters with mmap_file LSM hookRoberto Sassu1-2/+5
Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner1-13/+13
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18ima: return IMA digest value only when IMA_COLLECTED flag is setMatt Bobrowski1-1/+1
The IMA_COLLECTED flag indicates whether the IMA subsystem has successfully collected a measurement for a given file object. Ensure that we return the respective digest value stored within the iint entry only when this flag has been set. Failing to check for the presence of this flag exposes consumers of this IMA API to receive potentially undesired IMA digest values when an erroneous condition has been experienced in some of the lower level IMA API code. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2023-01-18ima: fix error handling logic when file measurement failedMatt Bobrowski1-1/+1
Restore the error handling logic so that when file measurement fails, the respective iint entry is not left with the digest data being populated with zeroes. Fixes: 54f03916fb89 ("ima: permit fsverity's file digests in the IMA measurement list") Cc: stable@vger.kernel.org # 5.19 Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-12-14Merge tag 'integrity-v6.2' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Aside from the one cleanup, the other changes are bug fixes: Cleanup: - Include missing iMac Pro 2017 in list of Macs with T2 security chip Bug fixes: - Improper instantiation of "encrypted" keys with user provided data - Not handling delay in updating LSM label based IMA policy rules (-ESTALE) - IMA and integrity memory leaks on error paths - CONFIG_IMA_DEFAULT_HASH_SM3 hash algorithm renamed" * tag 'integrity-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: Fix hash dependency to correct algorithm ima: Fix misuse of dereference of pointer in template_desc_init_fields() integrity: Fix memory leakage in keyring allocation error path ima: Fix memory leak in __ima_inode_hash() ima: Handle -ESTALE returned by ima_filter_rule_match() ima: Simplify ima_lsm_copy_rule ima: Fix a potential NULL pointer access in ima_restore_measurement_list efi: Add iMac Pro 2017 to uefi skip cert quirk KEYS: encrypted: fix key instantiation with user-provided data
2022-11-19lsm,fs: fix vfs_getxattr_alloc() return type and caller error pathsPaul Moore1-2/+4
The vfs_getxattr_alloc() function currently returns a ssize_t value despite the fact that it only uses int values internally for return values. Fix this by converting vfs_getxattr_alloc() to return an int type and adjust the callers as necessary. As part of these caller modifications, some of the callers are fixed to properly free the xattr value buffer on both success and failure to ensure that memory is not leaked in the failure case. Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-03ima: Fix memory leak in __ima_inode_hash()Roberto Sassu1-1/+6
Commit f3cc6b25dcc5 ("ima: always measure and audit files in policy") lets measurement or audit happen even if the file digest cannot be calculated. As a result, iint->ima_hash could have been allocated despite ima_collect_measurement() returning an error. Since ima_hash belongs to a temporary inode metadata structure, declared at the beginning of __ima_inode_hash(), just add a kfree() call if ima_collect_measurement() returns an error different from -ENOMEM (in that case, ima_hash should not have been allocated). Cc: stable@vger.kernel.org Fixes: 280fe8367b0d ("ima: Always return a file measurement in ima_file_hash()") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-05-05ima: permit fsverity's file digests in the IMA measurement listMimi Zohar1-1/+1
Permit fsverity's file digest (a hash of struct fsverity_descriptor) to be included in the IMA measurement list, based on the new measurement policy rule 'digest_type=verity' option. To differentiate between a regular IMA file hash from an fsverity's file digest, use the new d-ngv2 format field included in the ima-ngv2 template. The following policy rule requires fsverity file digests and specifies the new 'ima-ngv2' template, which contains the new 'd-ngv2' field. The policy rule may be constrained, for example based on a fsuuid or LSM label. measure func=FILE_CHECK digest_type=verity template=ima-ngv2 Acked-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-04-04ima: remove redundant initialization of pointer 'file'.Colin Ian King1-1/+1
The pointer 'file' is being initialized with a value that is never read, it is being re-assigned the same value later on closer to where it is being first used. The initialization is redundant and can be removed. Cleans up clang scan build warning: security/integrity/ima/ima_main.c:434:15: warning: Value stored to 'file' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds1-18/+39
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-11ima: Always return a file measurement in ima_file_hash()Roberto Sassu1-13/+33
__ima_inode_hash() checks if a digest has been already calculated by looking for the integrity_iint_cache structure associated to the passed inode. Users of ima_file_hash() (e.g. eBPF) might be interested in obtaining the information without having to setup an IMA policy so that the digest is always available at the time they call this function. In addition, they likely expect the digest to be fresh, e.g. recalculated by IMA after a file write. Although getting the digest from the bprm_committed_creds hook (as in the eBPF test) ensures that the digest is fresh, as the IMA hook is executed before that hook, this is not always the case (e.g. for the mmap_file hook). Call ima_collect_measurement() in __ima_inode_hash(), if the file descriptor is available (passed by ima_file_hash()) and the digest is not available/not fresh, and store the file measurement in a temporary integrity_iint_cache structure. This change does not cause memory usage increase, due to using the temporary integrity_iint_cache structure, and due to freeing the ima_digest_data structure inside integrity_iint_cache before exiting from __ima_inode_hash(). For compatibility reasons, the behavior of ima_inode_hash() remains unchanged. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/bpf/20220302111404.193900-3-roberto.sassu@huawei.com
2022-03-11ima: Fix documentation-related warnings in ima_main.cRoberto Sassu1-5/+6
Fix the following warnings in ima_main.c, displayed with W=n make argument: security/integrity/ima/ima_main.c:432: warning: Function parameter or member 'vma' not described in 'ima_file_mprotect' security/integrity/ima/ima_main.c:636: warning: Function parameter or member 'inode' not described in 'ima_post_create_tmpfile' security/integrity/ima/ima_main.c:636: warning: Excess function parameter 'file' description in 'ima_post_create_tmpfile' security/integrity/ima/ima_main.c:843: warning: Function parameter or member 'load_id' not described in 'ima_post_load_data' security/integrity/ima/ima_main.c:843: warning: Excess function parameter 'id' description in 'ima_post_load_data' Also, fix some style issues in the description of ima_post_create_tmpfile() and ima_post_path_mknod(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/bpf/20220302111404.193900-2-roberto.sassu@huawei.com
2022-02-15ima: define ima_max_digest_data struct without a flexible array variableMimi Zohar1-4/+1
To support larger hash digests in the 'iint' cache, instead of defining the 'digest' field as the maximum digest size, the 'digest' field was defined as a flexible array variable. The "ima_digest_data" struct was wrapped inside a local structure with the maximum digest size. But before adding the record to the iint cache, memory for the exact digest size was dynamically allocated. The original reason for defining the 'digest' field as a flexible array variable is still valid for the 'iint' cache use case. Instead of wrapping the 'ima_digest_data' struct in a local structure define 'ima_max_digest_data' struct. Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-02-15ima: rename IMA_ACTION_FLAGS to IMA_NONACTION_FLAGSMimi Zohar1-1/+1
Simple policy rule options, such as fowner, uid, or euid, can be checked immediately, while other policy rule options, such as requiring a file signature, need to be deferred. The 'flags' field in the integrity_iint_cache struct contains the policy action', 'subaction', and non action/subaction. action: measure/measured, appraise/appraised, (collect)/collected, audit/audited subaction: appraise status for each hook (e.g. file, mmap, bprm, read, creds) non action/subaction: deferred policy rule options and state Rename the IMA_ACTION_FLAGS to IMA_NONACTION_FLAGS. Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-02-15ima: Fix trivial typos in the commentsAustin Kim1-1/+1
There are a few minor typos in the comments. Fix these. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-11-23lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()Paul Moore1-7/+7
The security_task_getsecid_subj() LSM hook invites misuse by allowing callers to specify a task even though the hook is only safe when the current task is referenced. Fix this by removing the task_struct argument to the hook, requiring LSM implementations to use the current task. While we are changing the hook declaration we also rename the function to security_current_getsecid_subj() in an effort to reinforce that the hook captures the subjective credentials of the current task and not an arbitrary task on the system. Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-02Merge tag 'integrity-v5.15' of ↵Linus Torvalds1-25/+64
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity subsystem updates from Mimi Zohar: - Limit the allowed hash algorithms when writing security.ima xattrs or verifying them, based on the IMA policy and the configured hash algorithms. - Return the calculated "critical data" measurement hash and size to avoid code duplication. (Preparatory change for a proposed LSM.) - and a single patch to address a compiler warning. * tag 'integrity-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: IMA: reject unknown hash algorithms in ima_get_hash_algo IMA: prevent SETXATTR_CHECK policy rules with unavailable algorithms IMA: introduce a new policy option func=SETXATTR_CHECK IMA: add a policy option to restrict xattr hash algorithms on appraisal IMA: add support to restrict the hash algorithms used for file appraisal IMA: block writes of the security.ima xattr with unsupported algorithms IMA: remove the dependency on CRYPTO_MD5 ima: Add digest and digest_len params to the functions to measure a buffer ima: Return int in the functions to measure a buffer ima: Introduce ima_get_current_hash_algo() IMA: remove -Wmissing-prototypes warning
2021-08-17IMA: introduce a new policy option func=SETXATTR_CHECKTHOBY Simon1-1/+1
While users can restrict the accepted hash algorithms for the security.ima xattr file signature when appraising said file, users cannot restrict the algorithms that can be set on that attribute: any algorithm built in the kernel is accepted on a write. Define a new value for the ima policy option 'func' that restricts globally the hash algorithms accepted when writing the security.ima xattr. When a policy contains a rule of the form appraise func=SETXATTR_CHECK appraise_algos=sha256,sha384,sha512 only values corresponding to one of these three digest algorithms will be accepted for writing the security.ima xattr. Attempting to write the attribute using another algorithm (or "free-form" data) will be denied with an audit log message. In the absence of such a policy rule, the default is still to only accept hash algorithms built in the kernel (with all the limitations that entails). Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-08-17IMA: add support to restrict the hash algorithms used for file appraisalTHOBY Simon1-3/+15
The kernel accepts any hash algorithm as a value for the security.ima xattr. Users may wish to restrict the accepted algorithms to only support strong cryptographic ones. Provide the plumbing to restrict the permitted set of hash algorithms used for verifying file hashes and signatures stored in security.ima xattr. Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-08-10dm ima: measure data on table loadTushar Sugandhi1-0/+1
DM configures a block device with various target specific attributes passed to it as a table. DM loads the table, and calls each target’s respective constructors with the attributes as input parameters. Some of these attributes are critical to ensure the device meets certain security bar. Thus, IMA should measure these attributes, to ensure they are not tampered with, during the lifetime of the device. So that the external services can have high confidence in the configuration of the block-devices on a given system. Some devices may have large tables. And a given device may change its state (table-load, suspend, resume, rename, remove, table-clear etc.) many times. Measuring these attributes each time when the device changes its state will significantly increase the size of the IMA logs. Further, once configured, these attributes are not expected to change unless a new table is loaded, or a device is removed and recreated. Therefore the clear-text of the attributes should only be measured during table load, and the hash of the active/inactive table should be measured for the remaining device state changes. Export IMA function ima_measure_critical_data() to allow measurement of DM device parameters, as well as target specific attributes, during table load. Compute the hash of the inactive table and store it for measurements during future state change. If a load is called multiple times, update the inactive table hash with the hash of the latest populated table. So that the correct inactive table hash is measured when the device transitions to different states like resume, remove, rename, etc. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-07-23ima: Add digest and digest_len params to the functions to measure a bufferRoberto Sassu1-10/+26
This patch performs the final modification necessary to pass the buffer measurement to callers, so that they provide a functionality similar to ima_file_hash(). It adds the 'digest' and 'digest_len' parameters to ima_measure_critical_data() and process_buffer_measurement(). These functions calculate the digest even if there is no suitable rule in the IMA policy and, in this case, they simply return 1 before generating a new measurement entry. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-07-23ima: Return int in the functions to measure a bufferRoberto Sassu1-17/+23
ima_measure_critical_data() and process_buffer_measurement() currently don't return a result as, unlike appraisal-related functions, the result is not used by callers to deny an operation. Measurement-related functions instead rely on the audit subsystem to notify the system administrator when an error occurs. However, ima_measure_critical_data() and process_buffer_measurement() are a special case, as these are the only functions that can return a buffer measurement (for files, there is ima_file_hash()). In a subsequent patch, they will be modified to return the calculated digest. In preparation to return the result of the digest calculation, this patch modifies the return type from void to int, and returns 0 if the buffer has been successfully measured, a negative value otherwise. Given that the result of the measurement is still not necessary, this patch does not modify the behavior of existing callers by processing the returned value. For those, the return value is ignored. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Acked-by: Paul Moore <paul@paul-moore.com> (for the SELinux bits) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-07-23ima: Introduce ima_get_current_hash_algo()Roberto Sassu1-1/+6
Buffer measurements, unlike file measurements, are not accessible after the measurement is done, as buffers are not suitable for use with the integrity_iint_cache structure (there is no index, for files it is the inode number). In the subsequent patches, the measurement (digest) will be returned directly by the functions that perform the buffer measurement, ima_measure_critical_data() and process_buffer_measurement(). A caller of those functions also needs to know the algorithm used to calculate the digest. Instead of adding the algorithm as a new parameter to the functions, this patch provides it separately with the new function ima_get_current_hash_algo(). Since the hash algorithm does not change after the IMA setup phase, there is no risk of races (obtaining a digest calculated with a different algorithm than the one returned). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> [zohar@linux.ibm.com: annotate ima_hash_algo as __ro_after_init] Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-06-08ima: Pass NULL instead of 0 to ima_get_action() in ima_file_mprotect()Roberto Sassu1-1/+1
This patch fixes the sparse warning: sparse: warning: Using plain integer as NULL pointer Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-05-02Merge tag 'integrity-v5.13' of ↵Linus Torvalds1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA updates from Mimi Zohar: "In addition to loading the kernel module signing key onto the builtin keyring, load it onto the IMA keyring as well. Also six trivial changes and bug fixes" * tag 'integrity-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: ensure IMA_APPRAISE_MODSIG has necessary dependencies ima: Fix fall-through warnings for Clang integrity: Add declarations to init_once void arguments. ima: Fix function name error in comment. ima: enable loading of build time generated key on .ima keyring ima: enable signing of modules with build time generated key keys: cleanup build time module signing keys ima: Fix the error code for restoring the PCR value ima: without an IMA policy loaded, return quickly
2021-04-20ima: Fix fall-through warnings for ClangGustavo A. R. Silva1-0/+1
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-04-09ima: Fix function name error in comment.Jiele Zhao1-1/+1
The original function name was ima_path_check(). The policy parsing still supports PATH_CHECK. Commit 9bbb6cad0173 ("ima: rename ima_path_check to ima_file_check") renamed the function to ima_file_check(), but missed modifying the function name in the comment. Fixes: 9bbb6cad0173 ("ima: rename ima_path_check to ima_file_check"). Signed-off-by: Jiele Zhao <unclexiaole@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-03-22lsm: separate security_task_getsecid() into subjective and objective variantsPaul Moore1-7/+7
Of the three LSMs that implement the security_task_getsecid() LSM hook, all three LSMs provide the task's objective security credentials. This turns out to be unfortunate as most of the hook's callers seem to expect the task's subjective credentials, although a small handful of callers do correctly expect the objective credentials. This patch is the first step towards fixing the problem: it splits the existing security_task_getsecid() hook into two variants, one for the subjective creds, one for the objective creds. void security_task_getsecid_subj(struct task_struct *p, u32 *secid); void security_task_getsecid_obj(struct task_struct *p, u32 *secid); While this patch does fix all of the callers to use the correct variant, in order to keep this patch focused on the callers and to ease review, the LSMs continue to use the same implementation for both hooks. The net effect is that this patch should not change the behavior of the kernel in any way, it will be up to the latter LSM specific patches in this series to change the hook implementations and return the correct credentials. Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22ima: without an IMA policy loaded, return quicklyMimi Zohar1-0/+6
Unless an IMA policy is loaded, don't bother checking for an appraise policy rule. Return immediately. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-24Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds1-15/+25
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-01-24ima: handle idmapped mountsChristian Brauner1-13/+24
IMA does sometimes access the inode's i_uid and compares it against the rules' fowner. Enable IMA to handle idmapped mounts by passing down the mount's user namespace. We simply make use of the helpers we introduced before. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-15IMA: extend critical data hook to limit the measurement based on a labelTushar Sugandhi1-3/+5
The IMA hook ima_measure_critical_data() does not support a way to specify the source of the critical data provider. Thus, the data measurement cannot be constrained based on the data source label in the IMA policy. Extend the IMA hook ima_measure_critical_data() to support passing the data source label as an input parameter, so that the policy rule can be used to limit the measurements based on the label. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-01-15IMA: define a hook to measure kernel integrity critical dataTushar Sugandhi1-0/+24
IMA provides capabilities to measure file and buffer data. However, various data structures, policies, and states stored in kernel memory also impact the integrity of the system. Several kernel subsystems contain such integrity critical data. These kernel subsystems help protect the integrity of the system. Currently, IMA does not provide a generic function for measuring kernel integrity critical data. Define ima_measure_critical_data, a new IMA hook, to measure kernel integrity critical data. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-01-15IMA: add support to measure buffer data hashTushar Sugandhi1-5/+24
The original IMA buffer data measurement sizes were small (e.g. boot command line), but the new buffer data measurement use cases have data sizes that are a lot larger. Just as IMA measures the file data hash, not the file data, IMA should similarly support the option for measuring buffer data hash. Introduce a boolean parameter to support measuring buffer data hash, which would be much smaller, instead of the buffer itself. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-01-15IMA: generalize keyring specific measurement constructsTushar Sugandhi1-3/+3
IMA functions such as ima_match_keyring(), process_buffer_measurement(), ima_match_policy() etc. handle data specific to keyrings. Currently, these constructs are not generic to handle any func specific data. This makes it harder to extend them without code duplication. Refactor the keyring specific measurement constructs to be generic and reusable in other measurement scenarios. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>