kernel/linux.git/drivers/xen/privcmd.c, branch v7.0.11

xen/privcmd: fix double free via VMA splitting

2026-04-30T09:13:05+00:00

commit 24daca4fc07f3ff8cd0e3f629cd982187f48436a upstream. privcmd_vm_ops defines .close (privcmd_close), but neither .may_split nor .open. When userspace does a partial munmap() on a privcmd mapping, the kernel splits the VMA via __split_vma(). Since may_split is NULL, the split is allowed. vm_area_dup() copies vm_private_data (a pages array allocated in alloc_empty_pages()) into the new VMA without any fixup, because there is no .open callback. Both VMAs now point to the same pages array. When the unmapped portion is closed, privcmd_close() calls: - xen_unmap_domain_gfn_range() - xen_free_unpopulated_pages() - kvfree(pages) The surviving VMA still holds the dangling pointer. When it is later destroyed, the same sequence runs again, which leads to a double free. Fix this issue by adding a .may_split callback denying the VMA split. This is XSA-487 / CVE-2026-31787 Fixes: d71f513985c2 ("xen: privcmd: support autotranslated physmap guests.") Reported-by: Atharva Vartak Suggested-by: Atharva Vartak Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Signed-off-by: Greg Kroah-Hartman

xen/privcmd: unregister xenstore notifier on module exit

2026-03-26T07:57:51+00:00

Commit 453b8fb68f36 ("xen/privcmd: restrict usage in unprivileged domU") added a xenstore notifier to defer setting the restriction target until Xenstore is ready. XEN_PRIVCMD can be built as a module, but privcmd_exit() leaves that notifier behind. Balance the notifier lifecycle by unregistering it on module exit. This is harmless even if xenstore was already ready at registration time and the notifier was never queued on the chain. Fixes: 453b8fb68f3641fe ("xen/privcmd: restrict usage in unprivileged domU") Signed-off-by: GuoHan Zhao Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Message-ID: <20260325120246.252899-1-zhaoguohan@kylinos.cn>

xen/privcmd: add boot control for restricted usage in domU

2026-03-20T11:06:01+00:00

When running in an unprivileged domU under Xen, the privcmd driver is restricted to allow only hypercalls against a target domain, for which the current domU is acting as a device model. Add a boot parameter "unrestricted" to allow all hypercalls (the hypervisor will still refuse destructive hypercalls affecting other guests). Make this new parameter effective only in case the domU wasn't started using secure boot, as otherwise hypercalls targeting the domU itself might result in violating the secure boot functionality. This is achieved by adding another lockdown reason, which can be tested to not being set when applying the "unrestricted" option. This is part of XSA-482 Signed-off-by: Juergen Gross --- V2: - new patch

xen/privcmd: restrict usage in unprivileged domU

2026-03-20T11:05:41+00:00

The Xen privcmd driver allows to issue arbitrary hypercalls from user space processes. This is normally no problem, as access is usually limited to root and the hypervisor will deny any hypercalls affecting other domains. In case the guest is booted using secure boot, however, the privcmd driver would be enabling a root user process to modify e.g. kernel memory contents, thus breaking the secure boot feature. The only known case where an unprivileged domU is really needing to use the privcmd driver is the case when it is acting as the device model for another guest. In this case all hypercalls issued via the privcmd driver will target that other guest. Fortunately the privcmd driver can already be locked down to allow only hypercalls targeting a specific domain, but this mode can be activated from user land only today. The target domain can be obtained from Xenstore, so when not running in dom0 restrict the privcmd driver to that target domain from the beginning, resolving the potential problem of breaking secure boot. This is XSA-482 Reported-by: Teddy Astie Fixes: 1c5de1939c20 ("xen: add privcmd driver") Signed-off-by: Juergen Gross --- V2: - defer reading from Xenstore if Xenstore isn't ready yet (Jan Beulich) - wait in open() if target domain isn't known yet - issue message in case no target domain found (Jan Beulich)

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

xen: privcmd: WQ_PERCPU added to alloc_workqueue users

2026-01-12T10:28:46+00:00

Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo Signed-off-by: Marco Crivellari Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Message-ID: <20251106155831.306248-3-marco.crivellari@suse.com>

xen: replace XENFEAT_auto_translated_physmap with xen_pv_domain()

2025-09-08T15:01:36+00:00

Instead of testing the XENFEAT_auto_translated_physmap feature, just use !xen_pv_domain() which is equivalent. This has the advantage that a kernel not built with CONFIG_XEN_PV will be smaller due to dead code elimination. Reviewed-by: Jason Andryuk Signed-off-by: Juergen Gross Message-ID: <20250826145608.10352-3-jgross@suse.com>

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2024-11-18T20:24:06+00:00

Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...

assorted variants of irqfd setup: convert to CLASS(fd)

2024-11-03T06:28:07+00:00

in all of those failure exits prior to fdget() are plain returns and the only thing done after fdput() is (on failure exits) a kfree(), which can be done before fdput() just fine. NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for eventfd_ctx_fileget() failure (we only want fdput() there) and once we stop doing that, it doesn't need to check if eventfd is NULL or ERR_PTR(...) there. NOTE: in privcmd we move fdget() up before the allocation - more to the point, before the copy_from_user() attempt. Signed-off-by: Al Viro