starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2021-02-13	s390/vdso: use union tod_clock	Heiko Carstens	1	-3/+3
	Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390/time: convert tod_clock_base to union	Heiko Carstens	4	-38/+22
	Convert tod_clock_base to union tod_clock. This simplifies quite a bit of code and also fixes a bug in read_persistent_clock64(); void read_persistent_clock64(struct timespec64 ts) { __u64 delta; delta = initial_leap_seconds + TOD_UNIX_EPOCH; get_tod_clock_ext(clk); (__u64 ) &clk[1] -= delta; if ((__u64 ) &clk[1] > delta) clk[0]--; ext_to_timespec64(clk, ts); } Assume &clk[1] == 3 and delta == 2; then after the substraction the if condition becomes true and the epoch part of the clock is decremented by one because of an assumed overflow, even though there is none. Fix this by using 128 bit arithmetics and let the compiler do the right thing: void read_persistent_clock64(struct timespec64 ts) { union tod_clock clk; u64 delta; delta = initial_leap_seconds + TOD_UNIX_EPOCH; store_tod_clock_ext(&clk); clk.eitod -= delta; ext_to_timespec64(&clk, ts); } Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390/time: rename store_tod_clock_ext() and use union tod_clock	Heiko Carstens	1	-3/+3
	Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect that it returns a condition code and also use union tod_clock as parameter. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: split cleanup_sie	Sven Schnelle	1	-10/+7
	The current code uses the address in %r11 to figure out whether it was called from the machine check handler or from a normal interrupt handler. Instead of doing this implicit logic (which is mostly a leftover from the old critical cleanup approach) just add a second label and use that. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: use r13 in cleanup_sie as temp register	Sven Schnelle	1	-2/+2
	Instead of thrashing r11 which is normally our pointer to struct pt_regs on the stack, use r13 as temporary register in the BR_EX macro. r13 is already used in cleanup_sie, so no need to thrash another register. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: fix kernel asce loading when sie is interrupted	Sven Schnelle	1	-2/+1
	If a machine check is coming in during sie, the PU saves the control registers to the machine check save area. Afterwards mcck_int_handler is called, which loads __LC_KERNEL_ASCE into %cr1. Later the code restores %cr1 from the machine check area, but that is wrong when SIE was interrupted because the machine check area still contains the gmap asce. Instead it should return with either __KERNEL_ASCE in %cr1 when interrupted in SIE or the previous %cr1 content saved in the machine check save area. Fixes: 87d598634521 ("s390/mm: remove set_fs / rework address space handling") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: add stack for machine check handler	Sven Schnelle	4	-18/+35
	The previous code used the normal kernel stack for machine checks. This is problematic when a machine check interrupts a system call or interrupt handler right at the beginning where registers are set up. Assume system_call is interrupted at the first instruction and a machine check is triggered. The machine check handler is called, checks the PSW to see whether it is coming from user space, notices that it is already in kernel mode but %r15 still contains the user space stack. This would lead to a kernel crash. There are basically two ways of fixing that: Either using the 'critical cleanup' approach which compares the address in the PSW to see whether it is already at a point where the stack has been set up, or use an extra stack for the machine check handler. For simplicity, we will go with the second approach and allocate an extra stack. This adds some memory overhead for large systems, but usually large system have plenty of memory so this isn't really a concern. But it keeps the mchk stack setup simple and less error prone. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: use WRITE_ONCE when re-allocating async stack	Sven Schnelle	1	-1/+1
	The code does: S390_lowcore.async_stack = new + STACK_INIT_OFFSET; But the compiler is free to first assign one value and add the other value later. If a IRQ would be coming in between these two operations, it would run with an invalid stack. Prevent this by using WRITE_ONCE. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13	s390: open code SWITCH_KERNEL macro	Sven Schnelle	1	-28/+46
	This is a preparation patch for two later bugfixes. In the past both int_handler and machine check handler used SWITCH_KERNEL to switch to the kernel stack. However, SWITCH_KERNEL doesn't work properly in machine check context. So instead of adding more complexity to this macro, just remove it. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-12	Merge branch 'x86/paravirt' into x86/entry	Ingo Molnar	1	-1/+1
	Merge in the recent paravirt changes to resolve conflicts caused by objtool annotations. Conflicts: arch/x86/xen/xen-asm.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-11	softirq: Move do_softirq_own_stack() to generic asm header	Thomas Gleixner	1	-0/+1
	To avoid include recursion hell move the do_softirq_own_stack() related content into a generic asm header and include it from all places in arch/ which need the prototype. This allows architectures to provide an inline implementation of do_softirq_own_stack() without introducing a lot of #ifdeffery all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002513.289960691@linutronix.de
2021-02-09	s390/vtime: use cpu alternative for stck/stckf	Heiko Carstens	1	-11/+8
	Use a cpu alternative to switch between stck and stckf instead of making it compile time dependent. This will also make kernels compiled for old machines, but running on newer machines, use stckf. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/entry: use cpu alternative for stck/stckf	Heiko Carstens	1	-5/+3
	Use a cpu alternative to switch between stck and stckf instead of making it compile time dependent. This will also make kernels compiled for old machines, but running on newer machines, use stckf. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/time: use stcke instead of stck	Heiko Carstens	1	-3/+3
	Use STORE CLOCK EXTENDED instead of STORE CLOCK in early tod clock setup. This is just to remove another usage of stck, trying to remove all usages of STORE CLOCK. This doesn't fix anything. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/cpum_cf_diag: use get_tod_clock_fast()	Heiko Carstens	1	-1/+1
	Use get_tod_clock_fast() instead of store_tod_clock(), since store_tod_clock() can be very slow. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vtime: fix inline assembly clobber list	Heiko Carstens	1	-1/+2
	The stck/stckf instruction used within the inline assembly within do_account_vtime() changes the condition code. This is not reflected with the clobber list, and therefore might result in incorrect code generation. It seems unlikely that the compiler could generate incorrect code considering the surrounding C code, but it must still be fixed. Cc: <stable@vger.kernel.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: on timens page fault prefault also VVAR page	Heiko Carstens	1	-4/+13
	This is the s390 variant of commit e6b28ec65b6d ("x86/vdso: On timens page fault prefault also VVAR page"). Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: implement generic vdso time namespace support	Heiko Carstens	2	-8/+99
	Implement generic vdso time namespace support which also enables time namespaces for s390. This is quite similar to what arm64 has. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: move data page before code pages	Heiko Carstens	2	-17/+15
	For consistency with x86 and arm64 move the data page before code pages. Similar to commit 601255ae3c98 ("arm64: vdso: move data page before code pages"). Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: put vdso datapage in a separate vma	Heiko Carstens	1	-20/+35
	Add a separate "[vvar]" mapping for the vdso datapage, since it doesn't need to be executable or COW-able. This is actually the s390 implementation of commit 871549385278 ("arm64: vdso: put vdso datapage in a separate vma") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: get rid of vdso_fault	Heiko Carstens	1	-26/+9
	Implement vdso mapping similar to arm64 and powerpc. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: misc simple code changes	Heiko Carstens	1	-72/+30
	- remove unneeded includes - move functions around - remove obvious and/or incorrect comments - shorten some if conditions No functional change. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: remove superfluous variables	Heiko Carstens	1	-22/+13
	A few local variables exist only so the contents of a global variable can be copied to them, and use that value only for reading. Just remove them and rename some global variables. Also change vdso64_[start\|end] to be character arrays to be consistent with other architectures, and get rid of the global variable vdso64_kbase. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: remove superfluous check	Heiko Carstens	1	-7/+0
	vdso_pages (aka vdso64_pages) is never 0, therefore remove the check. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: remove BUG_ON()	Heiko Carstens	1	-1/+4
	Handle allocation error gracefully and simply disable vdso instead of leaving the system in an undefined state. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: simplify vdso size calculation	Heiko Carstens	1	-2/+1
	The vdso is (and must) be page aligned and its size must also be a multiple of PAGE_SIZE. Therefore no need to round upwards. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: convert vdso_init() to arch_initcall	Heiko Carstens	1	-3/+4
	Convert vdso_init() to arch_initcall like it is on all other architectures. This requires to remove the vdso_getcpu_init() call from vdso_init() since it must be called before smp is enabled. vdso_getcpu_init() is now an early_initcall like on powerpc. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-09	s390/vdso: fix vdso data page definition	Heiko Carstens	1	-2/+2
	The vdso data page actually contains an array. Fix that. This doesn't fix a real bug, just reflects reality. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27	s390: add missing include to arch/s390/kernel/signal.c	Sven Schnelle	1	-0/+1
	This fixes the following warning: CHECK linux/arch/s390/kernel/signal.c linux/arch/s390/kernel/signal.c:465:6: warning: symbol 'arch_do_signal_or_restart' was not declared. Should it be static? Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27	s390: uv: Fix sysfs max number of VCPUs reporting	Janosch Frank	1	-1/+1
	The number reported by the query is N-1 and I think people reading the sysfs file would expect N instead. For users creating VMs there's no actual difference because KVM's limit is currently below the UV's limit. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Fixes: a0f60f8431999 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information") Cc: stable@vger.kernel.org Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-24	fs: add mount_setattr()	Christian Brauner	1	-0/+1
	This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC \| MS_REMOUNT \| MS_BIND \| MS_RDONLY \| MS_NOEXEC \| [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char path, unsigned flags, struct mount_attr uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-19	s390: pass struct pt_regs instead of registers to syscalls	Sven Schnelle	1	-6/+2
	Instead of fetching all registers from struct pt_regs and passing them to the syscall wrappers, let the system call wrappers only fetch the values really required. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19	s390: convert to generic entry	Sven Schnelle	17	-873/+489
	This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-06	elf_prstatus: collect the common part (everything before pr_reg) into a struct	Al Viro	1	-1/+1
	Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating the beginning of compat_elf_prstatus, take these fields into a separate structure (compat_elf_prstatus_common), so that it could be reused. Due to the incestous relationship between binfmt_elf.c and compat_binfmt_elf.c we need the same shape change done to native struct elf_prstatus, gathering the fields prior to pr_reg into a new structure (struct elf_prstatus_common). Fortunately, offset of pr_reg is always a multiple of 16 with no padding right before it, so it's possible to turn all the stuff prior to it into a single member without disturbing the layout. [build fix from Geert Uytterhoeven folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-12-25	Merge tag 'irq-core-2020-12-23' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This is the second attempt after the first one failed miserably and got zapped to unblock the rest of the interrupt related patches. A treewide cleanup of interrupt descriptor (ab)use with all sorts of racy accesses, inefficient and disfunctional code. The goal is to remove the export of irq_to_desc() to prevent these things from creeping up again" * tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) genirq: Restrict export of irq_to_desc() xen/events: Implement irq distribution xen/events: Reduce irq_info:: Spurious_cnt storage size xen/events: Only force affinity mask for percpu interrupts xen/events: Use immediate affinity setting xen/events: Remove disfunct affinity spreading xen/events: Remove unused bind_evtchn_to_irq_lateeoi() net/mlx5: Use effective interrupt affinity net/mlx5: Replace irq_to_desc() abuse net/mlx4: Use effective interrupt affinity net/mlx4: Replace irq_to_desc() abuse PCI: mobiveil: Use irq_data_get_irq_chip_data() PCI: xilinx-nwl: Use irq_data_get_irq_chip_data() NTB/msi: Use irq_has_action() mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc pinctrl: nomadik: Use irq_has_action() drm/i915/pmu: Replace open coded kstat_irqs() copy drm/i915/lpe_audio: Remove pointless irq_to_desc() usage s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt() parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts() ...
2020-12-20	epoll: fix compat syscall wire up of epoll_pwait2	Heiko Carstens	1	-1/+1
	Commit b0a0c2615f6f ("epoll: wire up syscall epoll_pwait2") wired up the 64 bit syscall instead of the compat variant in a couple of places. Fixes: b0a0c2615f6f ("epoll: wire up syscall epoll_pwait2") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-19	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	1	-0/+1
	Merge still more updates from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (memcg and cleanups) and epoll" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/Kconfig: fix spelling mistake "whats" -> "what's" selftests/filesystems: expand epoll with epoll_pwait2 epoll: wire up syscall epoll_pwait2 epoll: add syscall epoll_pwait2 epoll: convert internal api to timespec64 epoll: eliminate unnecessary lock for zero timeout epoll: replace gotos with a proper loop epoll: pull all code between fetch_events and send_event into the loop epoll: simplify and optimize busy loop logic epoll: move eavail next to the list_empty_careful check epoll: pull fatal signal checks into ep_send_events() epoll: simplify signal handling epoll: check for events when removing a timed out thread from the wait queue mm/memcontrol:rewrite mem_cgroup_page_lruvec() mm, kvm: account kvm_vcpu_mmap to kmemcg mm/memcg: remove unused definitions mm/memcg: warning on !memcg after readahead page charged mm/memcg: bail early from swap accounting if memcg disabled
2020-12-19	epoll: wire up syscall epoll_pwait2	Willem de Bruijn	1	-0/+1
	Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-18	Merge tag 's390-5.11-2' of ↵	Linus Torvalds	4	-26/+11
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: "This is mainly to decouple udelay() and arch_cpu_idle() and simplify both of them. Summary: - Always initialize kernel stack backchain when entering the kernel, so that unwinding works properly. - Fix stack unwinder test case to avoid rare interrupt stack corruption. - Simplify udelay() and just let it busy loop instead of implementing a complex logic. - arch_cpu_idle() cleanup. - Some other minor improvements" * tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: convert comma to semicolon s390/idle: allow arch_cpu_idle() to be kprobed s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle() s390/idle: merge enabled_wait() and arch_cpu_idle() s390/delay: remove udelay_simple() s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK s390/delay: simplify udelay s390/test_unwind: use timer instead of udelay s390/test_unwind: fix CALL_ON_STACK tests s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced s390: always clear kernel stack backchain before calling functions
2020-12-18	Merge tag 'trace-v5.11' of ↵	Linus Torvalds	1	-4/+16
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major update to this release is that there's a new arch config option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it. All the ftrace callbacks now take a struct ftrace_regs instead of a struct pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will have enough information to read the arguments of the function being traced, as well as access to the stack pointer. This way, if a user (like live kernel patching) only cares about the arguments, then it can avoid using the heavier weight "regs" callback, that puts in enough information in the struct ftrace_regs to simulate a breakpoint exception (needed for kprobes). A new config option that audits the timestamps of the ftrace ring buffer at most every event recorded. Ftrace recursion protection has been cleaned up to move the protection to the callback itself (this saves on an extra function call for those callbacks). Perf now handles its own RCU protection and does not depend on ftrace to do it for it (saving on that extra function call). New debug option to add "recursed_functions" file to tracefs that lists all the places that triggered the recursion protection of the function tracer. This will show where things need to be fixed as recursion slows down the function tracer. The eval enum mapping updates done at boot up are now offloaded to a work queue, as it caused a noticeable pause on slow embedded boards. Various clean ups and last minute fixes" * tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Offload eval map updates to a work queue Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" ring-buffer: Add rb_check_bpage in __rb_allocate_pages ring-buffer: Fix two typos in comments tracing: Drop unneeded assignment in ring_buffer_resize() tracing: Disable ftrace selftests when any tracer is running seq_buf: Avoid type mismatch for seq_buf_init ring-buffer: Fix a typo in function description ring-buffer: Remove obsolete rb_event_is_commit() ring-buffer: Add test to validate the time stamp deltas ftrace/documentation: Fix RST C code blocks tracing: Clean up after filter logic rewriting tracing: Remove the useless value assignment in test_create_synth_event() livepatch: Use the default ftrace_ops instead of REGS when ARGS is available ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs MAINTAINERS: assign ./fs/tracefs to TRACING tracing: Fix some typos in comments ftrace: Remove unused varible 'ret' ring-buffer: Add recording of ring buffer recursion into recursed_functions ...
2020-12-16	Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block	Linus Torvalds	2	-6/+7
	Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe: "This sits on top of of the core entry/exit and x86 entry branch from the tip tree, which contains the generic and x86 parts of this work. Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL. With that done, we can get rid of JOBCTL_TASK_WORK from task_work and signal.c, and also remove a deadlock work-around in io_uring around knowing that signal based task_work waking is invoked with the sighand wait queue head lock. The motivation for this work is to decouple signal notify based task_work, of which io_uring is a heavy user of, from sighand. The sighand lock becomes a huge contention point, particularly for threaded workloads where it's shared between threads. Even outside of threaded applications it's slower than it needs to be. Roman Gershman <romger@amazon.com> reported that his networked workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all spent hammering on the sighand lock, showing 57% of the CPU time there [1]. There are further cleanups possible on top of this. One example is TIF_PATCH_PENDING, where a patch already exists to use TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more consolidation, but the work stands on its own as well" [1] https://github.com/axboe/liburing/issues/215 * tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits) io_uring: remove 'twa_signal_ok' deadlock work-around kernel: remove checking for TIF_NOTIFY_SIGNAL signal: kill JOBCTL_TASK_WORK io_uring: JOBCTL_TASK_WORK is no longer used by task_work task_work: remove legacy TWA_SIGNAL path sparc: add support for TIF_NOTIFY_SIGNAL riscv: add support for TIF_NOTIFY_SIGNAL nds32: add support for TIF_NOTIFY_SIGNAL ia64: add support for TIF_NOTIFY_SIGNAL h8300: add support for TIF_NOTIFY_SIGNAL c6x: add support for TIF_NOTIFY_SIGNAL alpha: add support for TIF_NOTIFY_SIGNAL xtensa: add support for TIF_NOTIFY_SIGNAL arm: add support for TIF_NOTIFY_SIGNAL microblaze: add support for TIF_NOTIFY_SIGNAL hexagon: add support for TIF_NOTIFY_SIGNAL csky: add support for TIF_NOTIFY_SIGNAL openrisc: add support for TIF_NOTIFY_SIGNAL sh: add support for TIF_NOTIFY_SIGNAL um: add support for TIF_NOTIFY_SIGNAL ...
2020-12-16	s390/idle: allow arch_cpu_idle() to be kprobed	Heiko Carstens	1	-2/+0
	Remove NOKPROBE_SYMBOL() for arch_cpu_idle(). This might have made sense when enabled_wait() (aka arch_cpu_idle()) was called from udelay. But now there shouldn't be a reason why s390 should be the only architecture which doesn't allow arch_cpu_idle() to be probed. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle()	Heiko Carstens	1	-5/+2
	arch_cpu_idle() gets called with interrupts disabled, and psw_idle() returns with interrupts disabled. No reason to use raw_local_irq_save() / restore(). Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390/idle: merge enabled_wait() and arch_cpu_idle()	Heiko Carstens	1	-8/+3
	The only caller of enabled_wait() besides arch_cpu_idle() was udelay(). Since that call doesn't exist anymore, merge enabled_wait() and arch_cpu_idle(). Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390/delay: remove udelay_simple()	Heiko Carstens	2	-2/+1
	udelay_simple() callers can make use of the now simplified udelay() implementation. No need to keep it. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390/delay: simplify udelay	Heiko Carstens	1	-4/+0
	udelay is implemented by using quite subtle details to make it possible to load an idle psw and waiting for an interrupt even in irq context or when interrupts are disabled. Also handling (or better: no handling) of softirqs is taken into account. All this is done to optimize for something which should in normal circumstances never happen: calling udelay to busy wait. Therefore get rid of the whole complexity and just busy loop like other architectures are doing it also. It could have been possible to use diag 0x44 instead of cpu_relax() in the busy loop, however we have seen too many bad things happen with diag 0x44 that it seems to be better to simply busy loop. Also note that with this new implementation kernel preemption does work when within the udelay loop. This did not work before. To get a feeling what the former code optimizes for: IPL'ing a kernel with 'defconfig' and afterwards compiling a kernel ends with a total of zero udelay calls. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced	Heiko Carstens	1	-2/+2
	In case of udelay CIF_IGNORE_IRQ is set. This leads to an unbalanced call of TRACE_IRQS_OFF and TRACE_IRQS_ON. That is: from lockdep's point of view TRACE_IRQS_ON is called one time too often. This doesn't fix any real bug, just makes the calls balanced. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	s390: always clear kernel stack backchain before calling functions	Heiko Carstens	1	-6/+6
	Clear the kernel stack backchain before potentially calling the lockdep trace_hardirqs_off/on functions. Without this walking the kernel backchain, e.g. during a panic, might stop too early. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16	Merge tag 'irq-core-2020-12-15' of ↵	Linus Torvalds	1	-18/+33
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-15	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	1	-10/+1
	Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...