Age | Commit message (Collapse) | Author | Files | Lines |
|
En/disabling softirqs from asm code turned out to be trickier than
expected, so vfp_support_entry now returns by tail calling
__local_enable_bh_ip() and passing the same arguments that a C call to
local_bh_enable() would pass. However, this is slightly hacky, as we
don't want to carry our own implementation of local_bh_enable().
So let's bite the bullet, and get rid of the asm logic in
vfp_support_entry that reasons about whether or not to save and/or
reload the VFP state, and about whether or not an FP exception is
pending, and only keep the VFP loading logic as a function that is
callable from C.
Replicate the removed logic in vfp_entry(), and use the exact same
reasoning as in the asm code. To emphasize the correspondence, retain
some of the asm comments in the C version as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Feroceon CPUs have a non-standard implementation of VFP which reports
synchronous VFP exceptions using the async VFP flag. This requires a
workaround which is difficult to reconcile with other implementations,
making it tricky to support both versions in a single image.
Since this is a v5 CPU, it is not supported by armhf and so the
likelihood that anybody is using this with recent distros/kernels and
rely on the VFP at the same time is extremely low. So let's just disable
VFP support on these cores, so we can remove the workaround.
This will help future development to support v5 and v6 CPUs with a
single kernel image.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Commit c76c6c4ecbec0deb5 ("ARM: 9294/2: vfp: Fix broken softirq handling
with instrumentation enabled") updated the VFP exception entry logic to
go via a C function, so that we get the compiler's version of
local_bh_disable(), which may be instrumented, and isn't generally
callable from assembler.
However, this assumes that passing an alternative 'success' return
address works in C as it does in asm, and this is only the case if the C
calls in question are tail calls, as otherwise, the stack will need some
unwinding as well.
I have already sent patches to the list that replace most of the asm
logic with C code, and so it is preferable to have a minimal fix that
addresses the issue and can be backported along with the commit that it
fixes to v6.3 from v6.4. Hopefully, we can land the C conversion for v6.5.
So instead of passing the 'success' return address as a function
argument, pass the stack address from where to pop it so that both LR
and SP have the expected value.
Fixes: c76c6c4ecbec0deb5 ("ARM: 9294/2: vfp: Fix broken softirq handling with ...")
Reported-by: syzbot+d4b00edc2d0c910d4bf4@syzkaller.appspotmail.com
Tested-by: syzbot+d4b00edc2d0c910d4bf4@syzkaller.appspotmail.com
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Commit 62b95a7b44d1 ("ARM: 9282/1: vfp: Manipulate task VFP state with
softirqs disabled") replaced the en/disable preemption calls inside the
VFP state handling code with en/disabling of soft IRQs, which is
necessary to allow kernel use of the VFP/SIMD unit when handling a soft
IRQ.
Unfortunately, when lockdep is enabled (or other instrumentation that
enables TRACE_IRQFLAGS), the disable path implemented in asm fails to
perform the lockdep and RCU related bookkeeping, resulting in spurious
warnings and other badness.
Set let's rework the VFP entry code a little bit so we can make the
local_bh_disable() call from C, with all the instrumentations that
happen to have been configured. Calling local_bh_enable() can be done
from asm, as it is a simple wrapper around __local_bh_enable_ip(), which
is always a callable function.
Link: https://lore.kernel.org/all/ZBBYCSZUJOWBg1s8@localhost.localdomain/
Fixes: 62b95a7b44d1 ("ARM: 9282/1: vfp: Manipulate task VFP state with softirqs disabled")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
In preparation for reimplementing the do_vfp()->vfp_support_entry()
handover in C code, switch to using R3 to pass the 'success' return
address, rather than R9, as it cannot be used for parameter passing.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Instead of dereferencing thread_info in do_vfp, pass the thread_info
pointer to vfp_support_entry via R1. That way, we only use a single
caller save register, which makes it easier to convert do_vfp to C code
in a subsequent patch.
Note that, unlike the CPU number, which can change due to preemption,
passing the thread_info pointer can safely be done with preemption
enabled.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
In a subsequent patch, we will relax the kernel mode NEON policy, and
permit kernel mode NEON to be used not only from task context, as is
permitted today, but also from softirq context.
Given that softirqs may trigger over the back of any IRQ unless they are
explicitly disabled, we need to address the resulting races in the VFP
state handling, by disabling softirq processing in two distinct but
related cases:
- kernel mode NEON will leave the FPU disabled after it completes, so
any kernel code sequence that enables the FPU and subsequently accesses
its registers needs to disable softirqs until it completes;
- kernel_neon_begin() will preserve the userland VFP state in memory,
and if it interrupts the ordinary VFP state preserve sequence, the
latter will resume execution with the VFP registers corrupted, and
happily continue saving them to memory.
Given that disabling softirqs also disables preemption, we can replace
the existing preempt_disable/enable occurrences in the VFP state
handling asm code with new macros that dis/enable softirqs instead.
In the VFP state handling C code, add local_bh_disable/enable() calls
in those places where the VFP state is preserved.
One thing to keep in mind is that, once we allow NEON use in softirq
context, the result of any such interruption is that the FPEXC_EN bit in
the FPEXC register will be cleared, and vfp_current_hw_state[cpu] will
be NULL. This means that any sequence that [conditionally] clears
FPEXC_EN and/or sets vfp_current_hw_state[cpu] to NULL does not need to
run with softirqs disabled, as the result will be the same. Furthermore,
the handling of THREAD_NOTIFY_SWITCH is guaranteed to run with IRQs
disabled, and so it does not need protection from softirq interruptions
either.
Tested-by: Martin Willi <martin@strongswan.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.
As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
|
|
There are a couple of problems with the exception entry code that deals
with FP exceptions (which are reported as UND exceptions) when building
the kernel in Thumb2 mode:
- the conditional branch to vfp_kmode_exception in vfp_support_entry()
may be out of range for its target, depending on how the linker decides
to arrange the sections;
- when the UND exception is taken in kernel mode, the emulation handling
logic is entered via the 'call_fpe' label, which means we end up using
the wrong value/mask pairs to match and detect the NEON opcodes.
Since UND exceptions in kernel mode are unlikely to occur on a hot path
(as opposed to the user mode version which is invoked for VFP support
code and lazy restore), we can use the existing undef hook machinery for
any kernel mode instruction emulation that is needed, including calling
the existing vfp_kmode_exception() routine for unexpected cases. So drop
the call to call_fpe, and instead, install an undef hook that will get
called for NEON and VFP instructions that trigger an UND exception in
kernel mode.
While at it, make sure that the PC correction is accurate for the
execution mode where the exception was taken, by checking the PSR
Thumb bit.
Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input sections")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
The integrated assembler of Clang 10 and earlier do not allow to access
the VFP registers through the coprocessor load/store instructions:
arch/arm/vfp/vfpmodule.c:342:2: error: invalid operand for instruction
fmxr(FPEXC, fpexc & ~(FPEXC_EX|FPEXC_DEX|FPEXC_FP2V|FPEXC_VV|FPEXC_TRAP_MASK));
^
arch/arm/vfp/vfpinstr.h:79:6: note: expanded from macro 'fmxr'
asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0"
^
<inline asm>:1:6: note: instantiated into assembly here
mcr p10, 7, r0, cr8, cr0, 0 @ fmxr FPEXC, r0
^
This has been addressed with Clang 11 [0]. However, to support earlier
versions of Clang and for better readability use of VFP assembler
mnemonics still is preferred.
Ideally we would replace this code with the unified assembler language
mnemonics vmrs/vmsr on call sites along with .fpu assembler directives.
The GNU assembler supports the .fpu directive at least since 2.17 (when
documentation has been added). Since Linux requires binutils 2.21 it is
safe to use .fpu directive. However, binutils does not allow to use
FPINST or FPINST2 as an argument to vmrs/vmsr instructions up to
binutils 2.24 (see binutils commit 16d02dc907c5):
arch/arm/vfp/vfphw.S: Assembler messages:
arch/arm/vfp/vfphw.S:162: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST,r6'
arch/arm/vfp/vfphw.S:165: Error: operand 0 must be FPSID or FPSCR pr FPEXC -- `vmsr FPINST2,r8'
arch/arm/vfp/vfphw.S:235: Error: operand 1 must be a VFP extension System Register -- `vmrs r3,FPINST'
arch/arm/vfp/vfphw.S:238: Error: operand 1 must be a VFP extension System Register -- `vmrs r12,FPINST2'
Use as-instr in Kconfig to check if FPINST/FPINST2 can be used. If they
can be used make use of .fpu directives and UAL VFP mnemonics for
register access.
This allows to build vfpmodule.c with Clang and its integrated assembler.
[0] https://reviews.llvm.org/D59733
Link: https://github.com/ClangBuiltLinux/linux/issues/905
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Explicit FPU selection has been introduced in commit 1a6be26d5b1a
("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected")
to make use of assembler mnemonics for VFP instructions.
However, clang currently does not support passing assembler flags
like this and errors out with:
clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'
Make use of the .fpu assembler directives to select the floating point
hardware selectively. Also use the new unified assembler language
mnemonics. This allows to build these procedures with Clang.
Link: https://github.com/ClangBuiltLinux/linux/issues/762
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Certain ARM CPU implementations (e.g. Cortex-A15) may not raise a
floating- point exception whenever deprecated short-vector VFP
instructions are executed. Instead these instructions are treated
as UNALLOCATED. Change the VFP exception handling code to emulate
short-vector instructions even if FPEXC exception bits are not
set.
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls. Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).
We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.
Rather than doing this piecemeal, and miss some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code. This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
The patch adds asm macros for inc_preempt_count and dec_preempt_count_ti
(which also gets the current thread_info) instead of open-coding them in
arch/arm/vfp/*.S files.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Arun KS <getarunks@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
asm/assembler.h is a better place for this macro since it is used by
asm files outside arch/arm/kernel/
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Arun KS <getarunks@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
The support code in vfp_support_entry does not care whether the
exception that caused it to be invoked occurred in kernel mode or
in user mode. However, neither condition that could trigger this
exception (lazy restore and VFP bounce to support code) is
currently allowable in kernel mode.
In either case, print a message describing the condition before
letting the undefined instruction handler run its course and trigger
an oops.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
|
Commit 0cc41e4a21d43 (arch: remove direct definitions of KERN_<LEVEL>
uses) is broken - not enough thought was put into changing:
.asciz "string"
to
.asciz "string1" "string2"
The problem is that each string gets _separately_ NUL terminated, so
the result is a string containing:
"string1\0string2\0"
rather than:
"string1string2\0"
With our new printk levels, this ends up as - eg, KERN_DEBUG "string":
0x01 0x00 0x07 0x00 "string" 0x00
which produces lots of \x01 in the kernel log.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Patrik Kluba reports that the preempt count becomes invalid due
to the preempt_enable() call being unbalanced with a
preempt_disable() call in the vfp assembly routines. This happens
because preempt_enable() and preempt_disable() update preempt
counts under PREEMPT_COUNT=y but the vfp assembly routines do so
under PREEMPT=y. In a configuration where PREEMPT=n and
DEBUG_ATOMIC_SLEEP=y, PREEMPT_COUNT=y and so the preempt_enable()
call in VFP_bounce() keeps subtracting from the preempt count
until it goes negative.
Fix this by always using PREEMPT_COUNT to decided when to update
preempt counts in the ARM assembly code.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Patrik Kluba <pkluba@dension.com>
Tested-by: Patrik Kluba <pkluba@dension.com>
Cc: <stable@vger.kernel.org> # 2.6.30
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Pull ARM fixes from Russell King:
"This fixes various issues found during July"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
ARM: Fix undefined instruction exception handling
ARM: 7480/1: only call smp_send_stop() on SMP
ARM: 7478/1: errata: extend workaround for erratum #720789
ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
ARM: 7468/1: ftrace: Trace function entry before updating index
ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+
ARM: 7466/1: disable interrupt before spinning endlessly
ARM: 7465/1: Handle >4GB memory sizes in device tree and mem=size@start option
|
|
While trying to get a v3.5 kernel booted on the cubox, I noticed that
VFP does not work correctly with VFP bounce handling. This is because
of the confusion over 16-bit vs 32-bit instructions, and where PC is
supposed to point to.
The rule is that FP handlers are entered with regs->ARM_pc pointing at
the _next_ instruction to be executed. However, if the exception is
not handled, regs->ARM_pc points at the faulting instruction.
This is easy for ARM mode, because we know that the next instruction and
previous instructions are separated by four bytes. This is not true of
Thumb2 though.
Since all FP instructions are 32-bit in Thumb2, it makes things easy.
We just need to select the appropriate adjustment. Do this by moving
the adjustment out of do_undefinstr() into the assembly code, as only
the assembly code knows whether it's dealing with a 32-bit or 16-bit
instruction.
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Add #include <linux/kern_levels.h> so that the #define KERN_<LEVEL> macros
don't have to be duplicated.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Kay Sievers <kay@vrfy.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a hole in the VFP thread migration. Lets define two threads.
Thread 1, we'll call 'interesting_thread' which is a thread which is
running on CPU0, using VFP (so vfp_current_hw_state[0] =
&interesting_thread->vfpstate) and gets migrated off to CPU1, where
it continues execution of VFP instructions.
Thread 2, we'll call 'new_cpu0_thread' which is the thread which takes
over on CPU0. This has also been using VFP, and last used VFP on CPU0,
but doesn't use it again.
The following code will be executed twice:
cpu = thread->cpu;
/*
* On SMP, if VFP is enabled, save the old state in
* case the thread migrates to a different CPU. The
* restoring is done lazily.
*/
if ((fpexc & FPEXC_EN) && vfp_current_hw_state[cpu]) {
vfp_save_state(vfp_current_hw_state[cpu], fpexc);
vfp_current_hw_state[cpu]->hard.cpu = cpu;
}
/*
* Thread migration, just force the reloading of the
* state on the new CPU in case the VFP registers
* contain stale data.
*/
if (thread->vfpstate.hard.cpu != cpu)
vfp_current_hw_state[cpu] = NULL;
The first execution will be on CPU0 to switch away from 'interesting_thread'.
interesting_thread->cpu will be 0.
So, vfp_current_hw_state[0] points at interesting_thread->vfpstate.
The hardware state will be saved, along with the CPU number (0) that
it was executing on.
'thread' will be 'new_cpu0_thread' with new_cpu0_thread->cpu = 0.
Also, because it was executing on CPU0, new_cpu0_thread->vfpstate.hard.cpu = 0,
and so the thread migration check is not triggered.
This means that vfp_current_hw_state[0] remains pointing at interesting_thread.
The second execution will be on CPU1 to switch _to_ 'interesting_thread'.
So, 'thread' will be 'interesting_thread' and interesting_thread->cpu now
will be 1. The previous thread executing on CPU1 is not relevant to this
so we shall ignore that.
We get to the thread migration check. Here, we discover that
interesting_thread->vfpstate.hard.cpu = 0, yet interesting_thread->cpu is
now 1, indicating thread migration. We set vfp_current_hw_state[1] to
NULL.
So, at this point vfp_current_hw_state[] contains the following:
[0] = &interesting_thread->vfpstate
[1] = NULL
Our interesting thread now executes a VFP instruction, takes a fault
which loads the state into the VFP hardware. Now, through the assembly
we now have:
[0] = &interesting_thread->vfpstate
[1] = &interesting_thread->vfpstate
CPU1 stops due to ptrace (and so saves its VFP state) using the thread
switch code above), and CPU0 calls vfp_sync_hwstate().
if (vfp_current_hw_state[cpu] == &thread->vfpstate) {
vfp_save_state(&thread->vfpstate, fpexc | FPEXC_EN);
BANG, we corrupt interesting_thread's VFP state by overwriting the
more up-to-date state saved by CPU1 with the old VFP state from CPU0.
Fix this by ensuring that we have sane semantics for the various state
describing variables:
1. vfp_current_hw_state[] points to the current owner of the context
information stored in each CPUs hardware, or NULL if that state
information is invalid.
2. thread->vfpstate.hard.cpu always contains the most recent CPU number
which the state was loaded into or NR_CPUS if no CPU owns the state.
So, for a particular CPU to be a valid owner of the VFP state for a
particular thread t, two things must be true:
vfp_current_hw_state[cpu] == &t->vfpstate && t->vfpstate.hard.cpu == cpu.
and that is valid from the moment a CPU loads the saved VFP context
into the hardware. This gives clear and consistent semantics to
interpreting these variables.
This patch also fixes thread copying, ensuring that t->vfpstate.hard.cpu
is invalidated, otherwise CPU0 may believe it was the last owner. The
hole can happen thus:
- thread1 runs on CPU2 using VFP, migrates to CPU3, exits and thread_info
freed.
- New thread allocated from a previously running thread on CPU2, reusing
memory for thread1 and copying vfp.hard.cpu.
At this point, the following are true:
new_thread1->vfpstate.hard.cpu == 2
&new_thread1->vfpstate == vfp_current_hw_state[2]
Lastly, this also addresses thread flushing in a similar way to thread
copying. Hole is:
- thread runs on CPU0, using VFP, migrates to CPU1 but does not use VFP.
- thread calls execve(), so thread flush happens, leaving
vfp_current_hw_state[0] intact. This vfpstate is memset to 0 causing
thread->vfpstate.hard.cpu = 0.
- thread migrates back to CPU0 before using VFP.
At this point, the following are true:
thread->vfpstate.hard.cpu == 0
&thread->vfpstate == vfp_current_hw_state[0]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Rename this branch to more accurately reflect why its taken, rather
than what the following code does. It is the only caller of this code.
This helps to clarify following changes, yet this change results in no
actual code change.
Document the VFP hardware state at the target of this branch.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Rename the slightly confusing 'last_VFP_context' variable to be more
descriptive of what it actually is. This variable stores a pointer
to the current owner's vfpstate structure for the context held in the
VFP hardware.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Directives such as .long and .word do not magically cause the
assembler location counter to become aligned in gas. As a result,
using these directives in code sections can result in misaligned
data words when building a Thumb-2 kernel (CONFIG_THUMB2_KERNEL).
This is a Bad Thing, since the ABI permits the compiler to assume
that fundamental types of word size or above are word- aligned when
accessing them from C. If the data is not really word-aligned,
this can cause impaired performance and stray alignment faults in
some circumstances.
In general, the following rules should be applied when using data
word declaration directives inside code sections:
* .quad and .double:
.align 3
* .long, .word, .single, .float:
.align (or .align 2)
* .short:
No explicit alignment required, since Thumb-2
instructions are always 2 or 4 bytes in size.
immediately after an instruction.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
vfp_put_double() takes the double value in r0,r1 not r1,r2.
Reported-by: Tarun Kanti DebBarma <tarun.kanti@ti.com>
Cc: <stable@kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
This patch modifies the VFP files for the ARM/Thumb-2 unified
assembly syntax.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This CPU generates synchronous VFP exceptions in a non-standard way -
the FPEXC.EX bit set but without the FPSCR.IXE bit being set like in the
VFP subarchitecture 1 or just the FPEXC.DEX bit like in VFP
subarchitecture 2. The main problem is that the faulty instruction
(which needs to be emulated in software) will be restarted several times
(normally until a context switch disables the VFP). This patch ensures
that the VFP exception is treated as synchronous.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nicolas Pitre <nico@cam.org>
|
|
We've observed that ARM VFP state can be corrupted during VFP exception
handling when PREEMPT is enabled. The exact conditions are difficult
to reproduce but appear to occur during VFP exception handling when a
task causes a VFP exception which is handled via VFP_bounce and is then
preempted by yet another task which in turn causes yet another VFP
exception. Since the VFP_bounce code is not preempt safe, VFP state then
becomes corrupt. In order to prevent preemption from occuring while
handling a VFP exception, this patch disables preemption while handling
VFP exceptions.
Signed-off-by: George G. Davis <gdavis@mvista.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
This patch adds ptrace support for setting and getting the VFP registers
using PTRACE_SETVFPREGS and PTRACE_GETVFPREGS. The user_vfp structure
defined in asm/user.h contains 32 double registers (to cover VFPv3 and
Neon hardware) and the FPSCR register.
Cc: Paul Brook <paul@codesourcery.com>
Cc: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
When CONFIG_PM is selected, the VFP code does not have any handler
installed to deal with either saving the VFP state of the current
task, nor does it do anything to try and restore the VFP after a
resume.
On resume, the VFP will have been reset and the co-processor access
control registers are in an indeterminate state (very probably the
CP10 and CP11 the VFP uses will have been disabled by the ARM core
reset). When this happens, resume will break as soon as it tries to
unfreeze the tasks and restart scheduling.
Add a sys device to allow us to hook the suspend call to save the
current thread state if the thread is using VFP and a resume hook
which restores the CP10/CP11 access and ensures the VFP is disabled
so that the lazy swapping will take place on next access.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
On ARMv7, conditional undefined instructions may generate exceptions
even if the condition is not met. The vfphw.S contains the FPINST and
FPINST2 access instructions which may not be present on processors with
synchronous VFP exceptions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This declaration specifies the "function" type and size for various
assembly functions, mainly needed for generating the correct branch
instructions in Thumb-2.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
This patch adds the support for VFPv3 (the kernel currently supports
VFPv2). The main difference is 32 double registers (compared to 16).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
This patch allows the VFP support code to run correctly on CPUs
compatible with the common VFP subarchitecture specification (Appendix
B in the ARM ARM v7-A and v7-R edition). It implements support for VFP
subarchitecture 2 while being backwards compatible with
subarchitecture 1.
On VFP subarchitecture 1, the arithmetic exceptions are asynchronous
(or imprecise as described in the old ARM ARM) unless the FPSCR.IXE
bit is 1. The exceptional instructions can be read from FPINST and
FPINST2 registers. With VFP subarchitecture 2, the arithmetic
exceptions can also be synchronous and marked by the FPEXC.DEX bit
(the FPEXC.EX bit is cleared). CPUs implementing the synchronous
arithmetic exceptions don't have the FPINST and FPINST2 registers and
accessing them would trigger and undefined exception.
Note that FPEXC.EX bit has an additional meaning on subarchitecture 1
- if it isn't set, there is no additional information in FPINST and
FPINST2 that needs to be saved at context switch or when lazy-loading
the VFP state of a different thread.
The patch also removes the clearing of the cumulative exception flags in
FPSCR when additional exceptions were raised. It is up to the user
application to clear these bits.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Use the fpexc abbreviated names instead of long verbose names
for fpexc bits.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
The current lazy saving of the VFP registers is no longer possible
with thread migration on SMP. This patch implements a per-CPU
vfp-state pointer and the saving of the VFP registers at every context
switch. The registers restoring is still performed in a lazy way.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Patch from Daniel Jacobowitz
vfp_put_double didn't work in a CONFIG_AEABI kernel. By swapping
the arguments, we arrange for them to be in the same place regardless
of ABI. I made the same change to vfp_put_float for consistency.
Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Since we pass flags to the compiler to control code generation based
on the least capable selected CPU, if we want to include VFP support,
we must tweak the assembler flags to allow the VFP instructions.
Moreover, we must not use the mrrc/mcrr versions since these will not
be recognised by the assembler.
We do not convert all instructions to the VFP-equivalent (yet) since
binutils appears to barf on "fmrx rn, fpinst" and doesn't provide any
other way (other than using the mrc equivalent) to encode this
instruction - which is rather a problem when you have a VFP
implementation which requires these instructions.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Patch from Catalin Marinas
This patch changes the double registers numbering to 0-15 from even 0-30,
in preparation for future VFP extensions. It also fixes the VFP_REG_ZERO
bug (value 16 actually represents the 8th double register with the original
numbering).
The original mcrr/mrrc on CP10 were generating FMRRS/FMSRR instead of
FMRRD/FMDRR. The patch changes to CP11 for the correct instructions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Patch from Catalin Marinas
The current VFP code corrupts the VFP registers (including the control
ones) if more than one floating point application is executed at the same
time. This patch fixes the updating of the load/store base addresses for
the VFP registers.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
|