summaryrefslogtreecommitdiff
path: root/arch/um/include
AgeCommit message (Collapse)AuthorFilesLines
2021-07-09Merge tag 'for-linus-5.14-rc1' of ↵Linus Torvalds20-31/+336
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Support for optimized routines based on the host CPU - Support for PCI via virtio - Various fixes * tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: remove unneeded semicolon in um_arch.c um: Remove the repeated declaration um: fix error return code in winch_tramp() um: fix error return code in slip_open() um: Fix stack pointer alignment um: implement flush_cache_vmap/flush_cache_vunmap um: add a UML specific futex implementation um: enable the use of optimized xor routines in UML um: Add support for host CPU flags and alignment um: allow not setting extra rpaths in the linux binary um: virtio/pci: enable suspend/resume um: add PCI over virtio emulation driver um: irqs: allow invoking time-travel handler multiple times um: time-travel/signals: fix ndelay() in interrupt um: expose time-travel mode to userspace side um: export signals_enabled directly um: remove unused smp_sigio_handler() declaration lib: add iomem emulation (logic_iomem) um: allow disabling NO_IOMEM
2021-07-08mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *Aneesh Kumar K.V1-1/+1
No functional change in this patch. [aneesh.kumar@linux.ibm.com: fix] Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com [sfr@canb.auug.org.au: another fix] Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm/thp: define default pmd_pgtable()Anshuman Khandual1-1/+0
Currently most platforms define pmd_pgtable() as pmd_page() duplicating the same code all over. Instead just define a default value i.e pmd_page() for pmd_pgtable() and let platforms override when required via <asm/pgtable.h>. All the existing platform that override pmd_pgtable() have been moved into their respective <asm/pgtable.h> header in order to precede before the new generic definition. This makes it much cleaner with reduced code. Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm: define default value for FIRST_USER_ADDRESSAnshuman Khandual2-2/+0
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the same code all over. Instead just define a generic default value (i.e 0UL) for FIRST_USER_ADDRESS and let the platforms override when required. This makes it much cleaner with reduced code. The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h> when the given platform overrides its value via <asm/pgtable.h>. Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [RISC-V] Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-17um: Remove the repeated declarationShaokun Zhang1-1/+0
Function 'os_flush_stdout' is declared twice, so remove the repeated declaration. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: implement flush_cache_vmap/flush_cache_vunmapJohannes Berg2-1/+10
vmalloc() heavy workloads in UML are extremely slow, due to flushing the entire kernel VM space (flush_tlb_kernel_vm()) on the first segfault. Implement flush_cache_vmap() to avoid that, and while at it also add flush_cache_vunmap() since it's trivial. This speeds up my vmalloc() heavy test of copying files out from /sys/kernel/debug/gcov/ by 30x (from 30s to 1s.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: add a UML specific futex implementationAnton Ivanov2-1/+14
The generic asm futex implementation emulates atomic access to memory by doing a get_user followed by put_user. These translate to two mapping operations on UML with paging enabled in the meantime. This, in turn may end up changing interrupts, invoking the signal loop, etc. This replaces the generic implementation by a mapping followed by an operation on the mapped segment. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: enable the use of optimized xor routines in UMLAnton Ivanov2-1/+36
This patch enables the use of optimized xor routines from the x86 tree as well as the necessary fpu api shims so they can work on UML. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: Add support for host CPU flags and alignmentAnton Ivanov3-0/+168
1. Reflect host cpu flags into the UML instance so they can be used to select the correct implementations for xor, crypto, etc. 2. Reflect host cache alignment into UML instance. This is important when running 32 bit on a 64 bit host as 32 bit by default aligns to 32 while the actual alignment should be 64. Ditto for some Xeons which align at 128. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: virtio/pci: enable suspend/resumeJohannes Berg1-0/+13
The UM virtual PCI devices currently cannot be suspended properly since the virtio driver already disables VQs well before the PCI bus's suspend_noirq wants to complete the transition by writing to PCI config space. After trying around for a long time with moving the devices on the DPM list, trying to create dependencies between them, etc. I gave up and instead added UML specific cross-driver API that lets the virt-pci code enable not suspending/resuming VQs for its devices. This then allows the PCI bus suspend_noirq to still talk to the device, and suspend/resume works properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: add PCI over virtio emulation driverJohannes Berg5-2/+54
To support testing of PCI/PCIe drivers in UML, add a PCI bus support driver. This driver uses virtio, which in UML is really just vhost-user, to talk to devices, and adds the devices to the virtual PCI bus in the system. Since virtio already allows DMA/bus mastering this really isn't all that hard, of course we need the logic_iomem infrastructure that was added by a previous patch. The protocol to talk to the device is has a few fairly simple messages for reading to/writing from config and IO spaces, and messages for the device to send the various interrupts (INT#, MSI/MSI-X and while suspended PME#). Note that currently no offical virtio device ID is assigned for this protocol, as a consequence this patch requires defining it in the Kconfig, with a default that makes the driver refuse to work at all. Finally, in order to add support for MSI/MSI-X interrupts, some small changes are needed in the UML IRQ code, it needs to have more interrupts, changing NR_IRQS from 64 to 128 if this driver is enabled, but not actually use them for anything so that the generic IRQ domain/MSI infrastructure can allocate IRQ numbers. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: time-travel/signals: fix ndelay() in interruptJohannes Berg2-0/+4
We should be able to ndelay() from any context, even from an interrupt context! However, this is broken (not functionally, but locking-wise) in time-travel because we'll get into the time-travel code and enable interrupts to handle messages on other time-travel aware subsystems (only virtio for now). Luckily, I've already reworked the time-travel aware signal (interrupt) delivery for suspend/resume to have a time travel handler, which runs directly in the context of the signal and not from the Linux interrupt. In order to fix this time-travel issue then, we need to do a few things: 1) rework the signal handling code to call time-travel handlers (only) if interrupts are disabled but signals aren't blocked, instead of marking it only pending there. This is needed to not deadlock other communication. 2) rework time-travel to not enable interrupts while it's waiting for a message; 3) rework time-travel to not (just) disable interrupts but rather block signals at a lower level while it needs them disabled for communicating with the controller. Finally, since now we can actually spend even virtual time in interrupts-disabled sections, the delay warning when we deliver a time-travel delayed interrupt is no longer valid, things can (and should) now get delayed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: expose time-travel mode to userspace sideJohannes Berg2-11/+23
This will be necessary in the userspace side to fix the signal/interrupt handling in time-travel=ext mode, which is the next patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: export signals_enabled directlyJohannes Berg3-13/+12
Use signals_enabled instead of always jumping through a function call to read it, there's not much point in that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: remove unused smp_sigio_handler() declarationJohannes Berg1-1/+0
This function doesn't exist, remove its declaration. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17um: allow disabling NO_IOMEMJohannes Berg1-0/+2
Adjust the kconfig a little to allow disabling NO_IOMEM in UML. To make an "allyesconfig" with CONFIG_NO_IOMEM=n build, adjust a few Kconfig things elsewhere and add dummy asm/fb.h and asm/vga.h files. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-04-16um: Fix W=1 missing-include-dirs warningsRandy Dunlap1-0/+1
Currently when using "W=1" with UML builds, there are over 700 warnings like so: CC arch/um/drivers/stderr_console.o cc1: warning: ./arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs] but arch/um/ does not have include/uapi/ at all, so add that subdir and put one Kbuild file into it (since git does not track empty subdirs). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-04-16um: pgtable.h: Fix W=1 warning for empty body in 'do' statementRandy Dunlap1-1/+1
Use the common kernel style to eliminate a warning: ./arch/um/include/asm/pgtable.h:305:47: warning: suggest braces around empty body in ‘do’ statement [-Wempty-body] #define update_mmu_cache(vma,address,ptep) do ; while (0) ^ mm/filemap.c:3212:3: note: in expansion of macro ‘update_mmu_cache’ update_mmu_cache(vma, addr, vmf->pte); ^~~~~~~~~~~~~~~~ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-25Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-16um: Enforce the usage of asm-generic/softirq_stack.hThomas Gleixner1-0/+1
The recent rework of the X86 irq stack switching mechanism broke UM as UM pulls in the X86 specific variant of softirq_stack.h. Enforce the usage of the asm-generic variant. Fixes: 72f40a2823d6 ("x86/softirq/64: Inline do_softirq_own_stack()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Weinberger <richard@nod.at>
2021-02-12um: irq.h: include <asm-generic/irq.h>Johannes Berg1-0/+1
This will get the (no-op) definition of irq_canonicalize() which some code might want. We could define that ourselves, but it seems like we'd likely want generic extensions in the future, if any. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: io.h: include <linux/types.h>Johannes Berg1-0/+1
This may be needed for size_t if something doesn't get it included elsewhere before including <asm/io.h>, so add the include. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: add a pseudo RTCJohannes Berg1-0/+11
Add a pseudo RTC that simply is able to send an alarm signal waking up the system at a given time in the future. Since apparently timerfd_create() FDs don't support SIGIO, we use the sigio-creating helper thread, which just learned to do suspend/resume properly in the previous patch. For time-travel mode, OTOH, just add an event at the specified time in the future, and that's already sufficient to wake up the system at that point in time since suspend will just be in an "endless wait". For s2idle support also call pm_system_wakeup(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: remove process stub VMAJohannes Berg3-29/+4
This mostly reverts the old commit 3963333fe676 ("uml: cover stubs with a VMA") which had added a VMA to the existing PTEs. However, there's no real reason to have the PTEs in the first place and the VMA cannot be 'fixed' in place, which leads to bugs that userspace could try to unmap them and be forcefully killed, or such. Also, there's a bit of an ugly hole in userspace's address space. Simplify all this: just install the stub code/page at the top of the (inner) address space, i.e. put it just above TASK_SIZE. The pages are simply hard-coded to be mapped in the userspace process we use to implement an mm context, and they're out of reach of the inner mmap/munmap/mprotect etc. since they're above TASK_SIZE. Getting rid of the VMA also makes vma_merge() no longer hit one of the VM_WARN_ON()s there because we installed a VMA while the code assumes the stack VMA is the first one. It also removes a lockdep warning about mmap_sem usage since we no longer have uml_setup_stubs() and thus no longer need to do any manipulation that would require mmap_sem in activate_mm(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: rework userspace stubs to not hard-code stub locationJohannes Berg2-12/+10
The userspace stacks mostly have a stack (and in the case of the syscall stub we can just set their stack pointer) that points to the location of the stub data page already. Rework the stubs to use the stack pointer to derive the start of the data page, rather than requiring it to be hard-coded. In the clone stub, also integrate the int3 into the stack remap, since we really must not use the stack while we remap it. This prepares for putting the stub at a variable location that's not part of the normal address space of the userspace processes running inside the UML machine. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: separate child and parent errors in clone stubJohannes Berg1-1/+1
If the two are mixed up, then it looks as though the parent returned an error if the child failed (before) the mmap(), and then the resulting process never gets killed. Fix this by splitting the child and parent errors, reporting and using them appropriately. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: defer killing userspace on page table update failuresJohannes Berg1-0/+1
In some cases we can get to fix_range_common() with mmap_sem held, and in others we get there without it being held. For example, we get there with it held from sys_mprotect(), and without it held from fork_handler(). Avoid any issues in this and simply defer killing the task until it runs the next time. Do it on the mm so that another task that shares the same mm can't continue running afterwards. Cc: stable@vger.kernel.org Fixes: 468f65976a8d ("um: Fix hung task in fix_range_common()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12mm: Remove arch_remap() and mm-arch-hooks.hChristophe Leroy1-1/+0
powerpc was the last provider of arch_remap() and the last user of mm-arch-hooks.h. Since commit 526a9c4a7234 ("powerpc/vdso: Provide vdso_remap()"), arch_remap() hence mm-arch-hooks.h are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: time-travel: rework interrupt handling in ext modeJohannes Berg2-0/+66
In external time-travel mode, where time is controlled via the controller application socket, interrupt handling is a little tricky. For example on virtio, the following happens: * we receive a message (that requires an ACK) on the vhost-user socket * we add a time-travel event to handle the interrupt (this causes communication on the time socket) * we ACK the original vhost-user message * we then handle the interrupt once the event is triggered This protocol ensures that the sender of the interrupt only continues to run in the simulation when the time-travel event has been added. So far, this was only done in the virtio driver, but it was actually wrong, because only virtqueue interrupts were handled this way, and config change interrupts were handled immediately. Additionally, the messages were actually handled in the real Linux interrupt handler, but Linux interrupt handlers are part of the simulation and shouldn't run while there's no time event. To really do this properly and only handle all kinds of interrupts in the time-travel event when we are scheduled to run in the simulation, rework this to plug in to the lower interrupt layers in UML directly: Add a um_request_irq_tt() function that let's a time-travel aware driver request an interrupt with an additional timetravel_handler() that is called outside of the context of the simulation, to handle the message only. It then adds an event to the time-travel calendar if necessary, and no "real" Linux code runs outside of the time simulation. This also hooks in with suspend/resume properly now, since this new timetravel_handler() can run while Linux is suspended and interrupts are disabled, and decide to wake up (or not) the system based on the message it received. Importantly in this case, it ACKs the message before the system even resumes and interrupts are re-enabled, thus allowing the simulation to progress properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-01-27Revert "um: support some of ARCH_HAS_SET_MEMORY"Johannes Berg2-4/+0
This reverts commit 963285b0b47a ("um: support some of ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not working (due to um never using the protection bits in the page tables) but also corrupts the page tables if used on a non-vmalloc page, since um never allocates proper page tables for the 'physmem' in the first place. Fixing all this will take more effort, so for now revert it. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: 963285b0b47a ("um: support some of ARCH_HAS_SET_MEMORY") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-01-27Revert "um: allocate a guard page to helper threads"Johannes Berg1-1/+1
This reverts commit ef4459a6da09 ("um: allocate a guard page to helper threads"), it's broken in multiple ways: 1) the free no longer matches the alloc; and 2) more importantly, the set_memory_ro() causes allocation of page tables for the normal memory that doesn't have any, and that later causes corruption and crashes (usually but not always in vfree()). We could fix the first bug and use vmalloc() to work around the second, but set_memory_ro() actually doesn't do anything either so I'll just revert that as well. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-01-27um: return error from ioremap()Johannes Berg1-1/+1
Back a few years ago, ioremap() was added to UML so that we'd not break the build for everything all the time. However, for some reason, v1 of the patch got applied, rather than the v2 that returned NULL, which was discussed here: https://lore.kernel.org/lkml/1495726955-27497-1-git-send-email-logang@deltatee.com/ Fix that now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-18Merge tag 'for-linus-5.11-rc1' of ↵Linus Torvalds9-35/+47
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - IRQ handling cleanups - Support for suspend - Various fixes for UML specific drivers: ubd, vector, xterm * tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (32 commits) um: Fix build w/o CONFIG_PM_SLEEP um: time-travel: Correct time event IRQ delivery um: irq/sigio: Support suspend/resume handling of workaround IRQs um: time-travel: Actually apply "free-until" optimisation um: chan_xterm: Fix fd leak um: tty: Fix handling of close in tty lines um: Monitor error events in IRQ controller um: allocate a guard page to helper threads um: support some of ARCH_HAS_SET_MEMORY um: time-travel: avoid multiple identical propagations um: Fetch registers only for signals which need them um: Support suspend to RAM um: Allow PM with suspend-to-idle um: time: Fix read_persistent_clock64() in time-travel um: Simplify os_idle_sleep() and sleep longer um: Simplify IRQ handling code um: Remove IRQ_NONE type um: irq: Reduce irq_reg allocation um: irq: Clean up and rename struct irq_fd um: Clean up alarm IRQ chip name ...
2020-12-16Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+2
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe: "This sits on top of of the core entry/exit and x86 entry branch from the tip tree, which contains the generic and x86 parts of this work. Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL. With that done, we can get rid of JOBCTL_TASK_WORK from task_work and signal.c, and also remove a deadlock work-around in io_uring around knowing that signal based task_work waking is invoked with the sighand wait queue head lock. The motivation for this work is to decouple signal notify based task_work, of which io_uring is a heavy user of, from sighand. The sighand lock becomes a huge contention point, particularly for threaded workloads where it's shared between threads. Even outside of threaded applications it's slower than it needs to be. Roman Gershman <romger@amazon.com> reported that his networked workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all spent hammering on the sighand lock, showing 57% of the CPU time there [1]. There are further cleanups possible on top of this. One example is TIF_PATCH_PENDING, where a patch already exists to use TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more consolidation, but the work stands on its own as well" [1] https://github.com/axboe/liburing/issues/215 * tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits) io_uring: remove 'twa_signal_ok' deadlock work-around kernel: remove checking for TIF_NOTIFY_SIGNAL signal: kill JOBCTL_TASK_WORK io_uring: JOBCTL_TASK_WORK is no longer used by task_work task_work: remove legacy TWA_SIGNAL path sparc: add support for TIF_NOTIFY_SIGNAL riscv: add support for TIF_NOTIFY_SIGNAL nds32: add support for TIF_NOTIFY_SIGNAL ia64: add support for TIF_NOTIFY_SIGNAL h8300: add support for TIF_NOTIFY_SIGNAL c6x: add support for TIF_NOTIFY_SIGNAL alpha: add support for TIF_NOTIFY_SIGNAL xtensa: add support for TIF_NOTIFY_SIGNAL arm: add support for TIF_NOTIFY_SIGNAL microblaze: add support for TIF_NOTIFY_SIGNAL hexagon: add support for TIF_NOTIFY_SIGNAL csky: add support for TIF_NOTIFY_SIGNAL openrisc: add support for TIF_NOTIFY_SIGNAL sh: add support for TIF_NOTIFY_SIGNAL um: add support for TIF_NOTIFY_SIGNAL ...
2020-12-16Merge tag 'asm-generic-mmu-context-5.11' of ↵Linus Torvalds1-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic mmu-context cleanup from Arnd Bergmann: "This is a cleanup series from Nicholas Piggin, preparing for later changes. The asm/mmu_context.h header are generalized and common code moved to asm-gneneric/mmu_context.h. This saves a bit of code and makes it easier to change in the future" * tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (25 commits) h8300: Fix generic mmu_context build m68k: mmu_context: Fix Sun-3 build xtensa: use asm-generic/mmu_context.h for no-op implementations x86: use asm-generic/mmu_context.h for no-op implementations um: use asm-generic/mmu_context.h for no-op implementations sparc: use asm-generic/mmu_context.h for no-op implementations sh: use asm-generic/mmu_context.h for no-op implementations s390: use asm-generic/mmu_context.h for no-op implementations riscv: use asm-generic/mmu_context.h for no-op implementations powerpc: use asm-generic/mmu_context.h for no-op implementations parisc: use asm-generic/mmu_context.h for no-op implementations openrisc: use asm-generic/mmu_context.h for no-op implementations nios2: use asm-generic/mmu_context.h for no-op implementations nds32: use asm-generic/mmu_context.h for no-op implementations mips: use asm-generic/mmu_context.h for no-op implementations microblaze: use asm-generic/mmu_context.h for no-op implementations m68k: use asm-generic/mmu_context.h for no-op implementations ia64: use asm-generic/mmu_context.h for no-op implementations hexagon: use asm-generic/mmu_context.h for no-op implementations csky: use asm-generic/mmu_context.h for no-op implementations ...
2020-12-16Merge tag 'irq-core-2020-12-15' of ↵Linus Torvalds1-16/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-15Merge tag 'irqchip-5.11' of ↵Thomas Gleixner1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for 5.11 from Marc Zyngier: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Remove the fasteoi IPI flow which has been proved useless - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
2020-12-15Merge tag 'core-mm-2020-12-14' of ↵Linus Torvalds2-14/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...
2020-12-14um: time-travel: Correct time event IRQ deliveryJohannes Berg2-0/+6
Lockdep (on 5.10-rc) points out that we're delivering IRQs while IRQs are not even enabled, which clearly shouldn't happen. Defer the time event IRQ delivery until they actually are enabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: irq/sigio: Support suspend/resume handling of workaround IRQsJohannes Berg1-0/+8
If the sigio workaround needed to be applied to a file descriptor, set_irq_wake() wouldn't work for it since it would get polled by the thread instead of causing SIGIO, and thus could never really cause a wakeup, since the thread notification FD wasn't marked as being able to wake up the system. Fix this by marking the thread's notification FD explicitly as a wake source FD, i.e. not suppressing SIGIO for it in suspend. In order to not cause spurious wakeups, we then need to remove all FDs that shouldn't wake up the system from the polling thread. In order to do this, add unlocked versions of ignore_sigio_fd() and add_sigio_fd() (nothing else is happening in suspend, so this is fine), and also modify ignore_sigio_fd() to return -ENOENT if the FD wasn't originally in there. This doesn't matter because nothing else currently checks the return value, but the irq code needs to know which ones to restore the workaround for. All told, this lets us use a timerfd for the RTC clock in the next patch, which doesn't send SIGIO. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: allocate a guard page to helper threadsJohannes Berg1-1/+1
We've been running into stack overflows in helper threads corrupting memory (e.g. because somebody put printf() or os_info() there), so to avoid those causing hard-to-debug issues later on, allocate a guard page for helper thread stacks and mark it read-only. Unfortunately, the crash dump at that point is useless as the stack tracer will try to backtrace the *kernel* thread, not the helper thread, but at least we don't survive to a random issue caused by corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: support some of ARCH_HAS_SET_MEMORYJohannes Berg2-0/+4
For now, only support set_memory_ro()/rw() which we need for the stack protection in the next patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: Support suspend to RAMJohannes Berg2-0/+4
With all the previous bits in place, we can now also support suspend to RAM, in the sense that everything is suspended, not just most, including userspace, processes like in s2idle. Since um_idle_sleep() now waits forever, we can simply call that to "suspend" the system. As before, you can wake it up using SIGUSR1 since we're just in a pause() call that only needs to return. In order to implement selective resume from certain devices, and not have any arbitrary device interrupt wake up, suspend interrupts by removing SIGIO notification (O_ASYNC) from all the FDs that are not supposed to wake up the system. However, swap out the handler so we don't actually handle the SIGIO as an interrupt. Since we're in pause(), the mere act of receiving SIGIO wakes us up, and then after things have been restored enough, re-set O_ASYNC for all previously suspended FDs, reinstall the proper SIGIO handler, and send SIGIO to self to process anything that might now be pending. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: Allow PM with suspend-to-idleJohannes Berg2-0/+3
In order to be able to experiment with suspend in UML, add the minimal work to be able to suspend (s2idle) an instance of UML, and be able to wake it back up from that state with the USR1 signal sent to the main UML process. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: Simplify os_idle_sleep() and sleep longerJohannes Berg2-3/+3
There really is no reason to pass the amount of time we should sleep, especially since it's just hard-coded to one second. Additionally, one second isn't really all that long, and as we are expecting to be woken up by a signal, we can sleep longer and avoid doing some work every second, so replace the current clock_nanosleep() with just an empty select() that can _only_ be woken up by a signal. We can also remove the deliver_alarm() since we don't need to do that when we got e.g. SIGIO that woke us up, and if we got SIGALRM the signal handler will actually (have) run, so it's just unnecessary extra work. Similarly, in time-travel mode, just program the wakeup event from idle to be S64_MAX, which is basically the most you could ever simulate to. Of course, you should already have an event in the list that's earlier and will cause a wakeup, normally that's the regular timer interrupt, though in suspend it may (later) also be an RTC event. Since actually getting to this point would be a bug and you can't ever get out again, panic() on it in the time control code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: Remove IRQ_NONE typeJohannes Berg3-10/+12
We don't actually use this in um_request_irq(), so it can never be assigned. It's also not clear what that would be useful for, so just remove it. This results in quite a number of cleanups, all the way to removing the "SIGIO on close" startup check, since the data it assigns (pty_close_sigio) is not used anymore. While at it, also make this an enum so we get a minimum of type checking, and remove the IRQ_NONE hack in virtio since we now no longer have the name twice. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: irq: Reduce irq_reg allocationJohannes Berg1-1/+1
We don't need an array of 4 entries to capture three and the name 'MAX_IRQ_TYPE' really gets confusing as well. Remove it and add a correct NUM_IRQ_TYPES, and use that correctly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: irq: Clean up and rename struct irq_fdJohannes Berg1-14/+0
This really shouldn't be called "irq_fd" since it doesn't carry an fd. Well, it used to, apparently, but that struct member is unused. Rename it to "irq_reg" since it more accurately reflects a registered interrupt, and remove the unused 'next' and 'fd' members from the struct as well. While at it, also move it to the implementation, it's not used anywhere else, and the header file is shared with the userspace components. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: virtio: Use dynamic IRQ allocationJohannes Berg1-3/+2
This separates the devices, which is better for debug and for later suspend/resume and wakeup support, since there we'll have to separate which IRQs can wake up the system and which cannot. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-14um: Support dynamic IRQ allocationJohannes Berg2-9/+9
It's cumbersome and error-prone to keep adding fixed IRQ numbers, and for proper device wakeup support for the virtio/vhost-user support we need to have different IRQs for each device. Even if in theory two IRQs (with and without wake) might be sufficient, it's much easier to reason about it when we have dynamic number assignment. It also makes it easier to add new devices that may dynamically exist or depending on the configuration, etc. Add support for this, up to 64 IRQs (the same limit as epoll FDs we have right now). Since it's not easy to port all the existing places to dynamic allocation (some data is statically initialized) keep the low numbers are reserved for the existing hard-coded IRQ numbers. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>