Age | Commit message (Collapse) | Author | Files | Lines |
|
Handle PMD-level mappings the same as PUD ones.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Add the next level of the pagetable populating function, we handle
chunks around a 1G boundary by mapping them with the lower level
functions - otherwise we use 1G pages for the mappings, thus using as
less amount of pagetable pages as possible.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
This allocates, if necessary, and populates the corresponding PGD entry
with a PUD page. The next population level is a dummy macro which will
be removed by the next patch and it is added here to keep the patch
small and easily reviewable but not break bisection, at the same time.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
This is preparatory work in order to be able to map pages into a
specified PGD and not implicitly and only into init_mm.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
... and lose one #ifdef .. #endif sandwich.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.
Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Incorrect use of 0 in terminating entry of arch_tables[] causes the
following sparse warning,
arch/x86/platform/efi/efi.c:74:27: sparse: Using plain integer as NULL pointer
Replace with NULL.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
[ Included sparse warning in commit message. ]
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The problem in efi_main was that the idt was cleared before the
interrupts were disabled.
The UEFI spec states that interrupts aren't used so this shouldn't be
too much of a problem. Peripherals however don't necessarily know about
this and thus might cause interrupts to happen anyway. Even if
ExitBootServices() has been called.
This means there is a risk of an interrupt being triggered while the IDT
register is nullified and the interrupt bit hasn't been cleared,
allowing for a triple fault.
This patch disables the interrupt flag, while leaving the existing IDT
in place. The CPU won't care about the IDT at all as long as the
interrupt bit is off, so it's safe to leave it in place as nothing will
ever happen to it.
[ Removed the now unused 'idt' variable - Matt ]
Signed-off-by: Bart Kuivenhoven <bemk@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
This patch fixes a problem with EFI memory maps larger than 128 entries
when booting using the EFI stub, which results in overflowing e820_map
in boot_params and an eventual halt when checking the map size in
sanitize_e820_map().
If the number of map entries is greater than what can fit in e820_map,
add the extra entries to the setup_data list using type SETUP_E820_EXT.
These extra entries are then picked up when the setup_data list is
parsed in parse_e820_ext().
Signed-off-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The handle_cmdline_files now takes the option to handle as a string,
and returns the loaded data through parameters, rather than taking
an x86 specific setup_header structure. For ARM, this will be used
to load a device tree blob in addition to initrd images.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Make efi_free() safely callable with size of 0, similar to free() being
callable with NULL pointers, and do nothing in that case.
Remove size checks that this makes redundant. This also avoids some
size checks in the ARM EFI stub code that will be added as well.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Replace the open-coded memory map getting with the
efi_get_memory_map() that is now general enough to use.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Move the open-coded conversion to a shared function for
use by all architectures. Change the allocation to prefer
a high address for ARM, as this is required to avoid conflicts
with reserved regions in low memory. We don't know the specifics
of these regions until after we process the command line and
device tree.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Rename relocate_kernel() to efi_relocate_kernel(), and take
parameters rather than x86 specific structure. Add max_addr
argument as for ARM we have some address constraints that we
need to enforce when relocating the kernel. Add alloc_size
parameter for use by ARM64 which uses an uncompressed kernel,
and needs to allocate space for BSS.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The relocate_kernel() function will be generalized and used
by all architectures, as they all have similar requirements.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Rename them to be more similar, as low_free() could be used to free
memory allocated by both high_alloc() and low_alloc().
high_alloc() -> efi_high_alloc()
low_alloc() -> efi_low_alloc()
low_free() -> efi_free()
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Add system table pointer argument to shared EFI stub related functions
so they no longer use a global system table pointer as they did when part
of eboot.c. For the ARM EFI stub this allows us to avoid global
variables completely and thereby not have to deal with GOT fixups.
Not having the EFI stub fixup its GOT, which is shared with the
decompressor, simplifies the relocating of the zImage to a
bootable address.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
No code changes made, just moving functions and #define from x86 arch
directory to common location. Code is shared using #include, similar
to how decompression code is shared among architectures.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The x86/AMD64 EFI stubs must use a call wrapper to convert between
the Linux and EFI ABIs, so void pointers are sufficient. For ARM,
the ABIs are compatible, so we can directly invoke the function
pointers. The functions that are used by the ARM stub are updated
to match the EFI definitions.
Also add some EFI types used by EFI functions.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Move efi-stub.txt out of x86 directory and into common directory
in preparation for adding ARM EFI stub support.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
efi_lookup_mapped_addr() is a handy utility for other platforms than
x86. Move it from arch/x86 to drivers/firmware. Add memmap pointer
to global efi structure, and initialise it on x86.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Common to (U)EFI support on all platforms is the global "efi" data
structure, and the code that parses the System Table to locate
addresses to populate that structure with.
This patch adds both of these to the global EFI driver code and
removes the local definition of the global "efi" data structure from
the x86 and ia64 code.
Squashed into one big patch to avoid breaking bisection.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot fix from Peter Anvin:
"A single very small boot fix for very large memory systems (> 0.5T)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
|
|
This is the updated version of df54d6fa5427 ("x86 get_unmapped_area():
use proper mmap base for bottom-up direction") that only randomizes the
mmap base address once.
Signed-off-by: Radu Caragea <sinaelgl@gmail.com>
Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826.
The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.
In particular, the MATLAB Java runtime seems to be unhappy. See
https://bugzilla.kernel.org/show_bug.cgi?id=60774
So we'll want to apply the random offset only once, and Radu has a patch
for that. Revert this older commit in order to apply the other one.
Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.
* tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/smp: initialize IPI vectors before marking CPU online
xen/events: mask events when changing their VCPU binding
xen/events: initialize local per-cpu mask for all possible events
x86/xen: do not identity map UNUSABLE regions in the machine E820
xen/arm: missing put_cpu in xen_percpu_init
|
|
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
kernel BUG at drivers/xen/events.c:1328!
RCU has detected that a CPU has not entered a quiescent state within the
grace period. It needs to send the CPU a reschedule IPI if it is not
offline. rcu_implicit_offline_qs() does this check:
/*
* If the CPU is offline, it is in a quiescent state. We can
* trust its state not to change because interrupts are disabled.
*/
if (cpu_is_offline(rdp->cpu)) {
rdp->offline_fqs++;
return 1;
}
Else the CPU is online. Send it a reschedule IPI.
The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()). See start_secondary():
set_cpu_online(smp_processor_id(), true);
...
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:
/*
* Wait until the cpu which brought this one up marked it
* online before enabling interrupts. If we don't do that then
* we can end up waking up the softirq thread before this cpu
* reached the active state, which makes the scheduler unhappy
* and schedule the softirq thread on the wrong cpu. This is
* only observable with forced threaded interrupts, but in
* theory it could also happen w/o them. It's just way harder
* to achieve.
*/
while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
cpu_relax();
/* enable local interrupts */
local_irq_enable();
The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug. In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.
The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:
kernel BUG at drivers/xen/events.c:1328!
xen_send_IPI_one()
xen_smp_send_reschedule()
rcu_implicit_offline_qs()
rcu_implicit_dynticks_qs()
force_qs_rnp()
force_quiescent_state()
__rcu_process_callbacks()
rcu_process_callbacks()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.
There is at least one other place that has caused the same crash:
xen_smp_send_reschedule()
wake_up_idle_cpu()
add_timer_on()
clocksource_watchdog()
call_timer_fn()
run_timer_softirq()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
xen_hvm_callback_vector()
clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:
/*
* Cycle through CPUs to check if the CPUs stay synchronized
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).
But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."
The conclusion was that:
"
1. At the IPI sender side:
It is incorrect to send an IPI to an offline CPU (cpu not present in
the cpu_online_mask). There are numerous places where we check this
and warn/complain.
2. At the IPI receiver side:
It is incorrect to let the world know of our presence (by setting
ourselves in global bitmasks) until our initialization steps are complete
to such an extent that we can handle the consequences (such as
receiving interrupts without crashing the sender etc.)
" (from Srivatsa)
As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.
It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.
This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.
Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.
There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.
We only do this for dom0, as that is where tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.
This fixes a boot failure on hosts with tboot.
tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 0000000000100000 - 0000000000800000 (usable)
(XEN) 0000000000800000 - 0000000000972000 (unusable)
tboot marked this region as unusable.
(XEN) 0000000000972000 - 00000000cf200000 (usable)
(XEN) 00000000cf200000 - 00000000cf38f000 (reserved)
(XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data)
(XEN) 00000000cf3ce000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fe000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Dave Hansen reported that systems between 500G and 600G RAM
crash early if DEBUG_PAGEALLOC is selected.
> [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
> [ 0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
> [ 0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
> [ 0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
> [ 0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
> [ 0.000000] [mem 0xe80ee00000-0xe80effffff] page 4k
> [ 0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
> [ 0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
> [ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory
It turns out that we missed increasing needed pages in BRK to
mapping initial 2M and [0,1M) when we switched to use the #PF
handler to set memory mappings:
> commit 8170e6bed465b4b0c7687f93e9948aca4358a33b
> Author: H. Peter Anvin <hpa@zytor.com>
> Date: Thu Jan 24 12:19:52 2013 -0800
>
> x86, 64bit: Use a #PF handler to materialize early mappings on demand
Before that, we had the maping from [0,512M) in head_64.S, and we
can spare two pages [0-1M). After that change, we can not reuse
pages anymore.
When we have more than 512M ram, we need an extra page for pgd page
with [512G, 1024g).
Increase pages in BRK for page table to solve the boot crash.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Bisected-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> # v3.9 and later
Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two AMD microcode loader fixes and an OLPC firmware support fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Fix early microcode loading
x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
x86: Don't clear olpc_ofw_header when sentinel is detected
|
|
Merge a bunch of fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
x86 get_unmapped_area(): use proper mmap base for bottom-up direction
ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
ocfs2: Revert 40bd62e to avoid regression in extended allocation
drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
hugetlb: fix lockdep splat caused by pmd sharing
aoe: adjust ref of head for compound page tails
microblaze: fix clone syscall
mm: save soft-dirty bits on file pages
mm: save soft-dirty bits on swapped pages
memcg: don't initialize kmem-cache destroying work for root caches
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull AMD microcode fixes from Borislav Petkov:
" Those are basically two fixes which correct the AMD early ucode loader
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
has copied the boot_cpu_data ontop and overwritten an already empty
structure (which we shouldn't access that early in the first place
anyway).
The second patch is kinda largish for that late in the game but it
shouldn't be problematic because we're simply switching from using
cpu_data to use the CPU family number directly and thus again, not use
uninitialized cpu_data structure. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When the stack is set to unlimited, the bottomup direction is used for
mmap-ings but the mmap_base is not used and thus effectively renders
ASLR for mmapings along with PIE useless.
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry. Thus when #pf happens on such non-present pte
we can restore it back.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.
To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out. When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.
One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
perf/x86: Fix intel QPI uncore event definitions
|
|
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like
__set_current_state(TASK_INTERRUPTIBLE);
schedule();
can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).
However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.
The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.
We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.
While at it, kill smp_mb__after_lock(), it has no callers.
Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
load_microcode_amd() (and the helper it is using) should not have an
cpu parameter. The microcode loading does not depend on the CPU wrt the
patches loaded since they will end up in a global list for all CPUs
anyway.
The change from cpu to x86family in load_microcode_amd()
now allows to drop the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet setup (These values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).
Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because its only used at one place and without the cpuinfo_x86 accesses
it was not much left.
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
[ Fengguang: build fix ]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Boris: adapt it to current tree. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
cpuinfo_x86
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().
If early microcode loading is enabled its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still X86_VENDOR_INTEL resulting in
no errata fixes getting applied and my system hangs on boot.
Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
caller init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.
So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.
[ Boris: Drop WARN_ON() since we're called only from init_amd() ]
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This one was missed earlier.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
OpenFirmware wasn't quite following the protocol described in boot.txt
and the kernel has detected this through use of the sentinel value
in boot_params. OFW does zero out almost all of the stuff that it should
do, but not the sentinel.
This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
support.
OpenFirmware has now been fixed. However, it would be nice if we could
maintain Linux compatibility with old firmware versions. To do that, we just
have to avoid zeroing out olpc_ofw_header.
OFW does not write to any other parts of the header that are being zapped
by the sentinel-detection code, and all users of olpc_ofw_header are
somewhat protected through checking for the OLPC_OFW_SIG magic value
before using it. So this should not cause any problems for anyone.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.9+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Peter Anvin.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, amd, microcode: Fix error path in apply_microcode_amd()
x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz
x86, efi: correct call to free_pages
x86/iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
|
|
3cc8e40e8ff8e232a9dd672da81beabd09f87366
"xen/arm: rename xen_secondary_init and run it on every online cpu"
The commit is in v3.10-rc2, the current branch is based on v3.10-rc1.
|
|
John McCalpin reports that the "drs_data" and "ncb_data" QPI
uncore events are missing the "extra bit" and always return zero
values unless the bit is properly set.
More details from him:
According to the Xeon E5-2600 Product Family Uncore Performance
Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
(including the ones that "perf" calls "drs_data" and "ncb_data") require
that the "extra bit" be set.
This was confusing for a while -- a note at the bottom of page 94 says
that the "extra bit" is bit 16 of the control register.
Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
be zero. Looking around a bit, I found that bit 21 appears to be the
correct "extra bit", and further investigation shows that "perf" actually
agrees with me:
[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
config:0-7,21
So the command
# perf -e "uncore_qpi_0/event=drs_data/"
Is the same as
# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
While it should be
# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"
I confirmed that this last version gives results that agree with the
amount of data that I expected the STREAM benchmark to move across the QPI
link in the second (cross-chip) test of the original script.
Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: zheng.z.yan@intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull MCE fix from Tony Luck:
"Fix a regression in mce-severity.c"
* tag 'please-pull-fix-mce-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
x86/mce: Fix mce regression from recent cleanup
|
|
Fix the build:
arch/x86/platform/ce4100/ce4100.c: In function 'x86_ce4100_early_setup':
arch/x86/platform/ce4100/ce4100.c:165:2: error: 'reboot_type' undeclared (first use in this function)
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Return -1 (like Intels apply_microcode) when the loading fails, also
do not set the active microcode level on failure.
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Link: http://lkml.kernel.org/r/20130723225823.2e4e7588@googlemail.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fix from Matt Fleming:
* The size of memory that gets freed by free_pages() needs to be
specified in pages, not bytes - by Roy Franz.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In commit 33d7885b594e169256daef652e8d3527b2298e75
x86/mce: Update MCE severity condition check
We simplified the rules to recognise each classification of recoverable
machine check combining the instruction and data fetch rules into a
single entry based on clarifications in the June 2013 SDM that all
recoverable events would be reported on the unaffected processor with
MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified
rule has a couple of bugs. Fix them here.
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c:
memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
asm volatile("fxsave %0" : : "m" (fx_scratch));
mask = fx_scratch.mxcsr_mask;
if (mask == 0)
mask = 0x0000ffbf;
to
memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
asm volatile("fxsave %0" : : "m" (fx_scratch));
mask = 0x0000ffbf;
since asm statement doesn’t say it will update fx_scratch. As the
result, the DAZ bit will be cleared. This patch fixes it. This bug
dates back to at least kernel 2.6.12.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
|