Age | Commit message (Collapse) | Author | Files | Lines |
|
There are a couple of nasty truncation bugs lurking in the pageattr
code that can be triggered when mapping EFI regions, e.g. when we pass
a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
left by PAGE_SHIFT will truncate the resultant address to 32-bits.
Viorel-Cătălin managed to trigger this bug on his Dell machine that
provides a ~5GB EFI region which requires 1236992 pages to be mapped.
When calling populate_pud() the end of the region gets calculated
incorrectly in the following buggy expression,
end = start + (cpa->numpages << PAGE_SHIFT);
And only 188416 pages are mapped. Next, populate_pud() gets invoked
for a second time because of the loop in __change_page_attr_set_clr(),
only this time no pages get mapped because shifting the remaining
number of pages (1048576) by PAGE_SHIFT is zero. At which point the
loop in __change_page_attr_set_clr() spins forever because we fail to
map progress.
Hitting this bug depends very much on the virtual address we pick to
map the large region at and how many pages we map on the initial run
through the loop. This explains why this issue was only recently hit
with the introduction of commit
a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap
entries bottom-up at runtime, instead of top-down")
It's interesting to note that safe uses of cpa->numpages do exist in
the pageattr code. If instead of shifting ->numpages we multiply by
PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
so the result is unsigned long.
To avoid surprises when users try to convert very large cpa->numpages
values to addresses, change the data type from 'int' to 'unsigned
long', thereby making it suitable for shifting by PAGE_SHIFT without
any type casting.
The alternative would be to make liberal use of casting, but that is
far more likely to cause problems in the future when someone adds more
code and fails to cast properly; this bug was difficult enough to
track down in the first place.
Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
For PAE kernels "unsigned long" is not suitable to hold page protection
flags, since _PAGE_NX doesn't fit there. This is the reason for quite a
few W+X pages getting reported as insecure during boot (observed namely
for the entire initrd range).
Fixes: 281d4078be ("x86: Make page cache mode a real type")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <JGross@suse.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/56A7635602000078000CAFF1@prv-mh.provo.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When we print values, such as @size, we have to understand that
it's derived from [begin .. end] as:
size = end - begin + 1
On the opposite the @end is derived from the rest as:
end = begin + size - 1
Correct the IMR code to print values correctly.
Note that @__end_rodata actually points to the next address
after the aligned .rodata section.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ong, Boon Leong <boon.leong.ong@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453320821-64328-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit a5d90c923bcf ("x86/efi: Quirk out SGI UV") added a quirk
to efi_apply_memmap_quirks to force SGI UV systems to fall back
to the old EFI memmap mechanism. We have a BIOS fix for this
issue on all systems except for UV1. This commit fixes up the
EFI quirk/MMR mapping code so that we only apply the special
case to UV1 hardware.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1449867585-189233-2-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Join string back to make grepping a bit easier. While here,
lowering case for Penwell SoC name in one case to be aligned
with the rest messages.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452888668-147116-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Intel Tangier SoC is known to have 64-bit dual core CPU. Enable
64-bit build for it.
The kernel has been tested on Intel Edison board:
Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 74
model name : Genuine Intel(R) CPU 4000 @ 500MHz
stepping : 8
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We still can end up with a stale vector due to the following:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.207265407@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.045961667@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.967993932@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefor ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
The full fix comes with a later patch.
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.892412198@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.808642683@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Reusing an existing vector and assigning a new vector has duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.721599216@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In the case that the new vector mask is a subset of the existing mask there is
no point to do a AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.640253454@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
__assign_irq_vector() uses the vector_cpumask which is assigned by
apic->vector_allocation_domain() without doing basic sanity checks. That can
result in a situation where the final assignement of a newly found vector
fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no
reason.
apic->cpu_mask_to_apicid_and() only fails if
vector_cpumask & requested_cpumask & cpu_online_mask
is empty.
Check for this condition right away and if the result is empty try immediately
the next possible cpu in the requested mask. So in case of a failure the old
setting is unchanged and we can remove the rollback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.561877324@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Split out the code which advances the target cpu for the search so we can
reuse it for the next patch which adds an early validation check for the
vectormask which we get from the apic.
Add comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Use an explicit goto for the cases where we have success in the search/update
and return -ENOSPC if the search loop ends due to no space.
Preparatory patch for fixes. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a
temporary buffer, which is in the way of using apic_chip_data.old_domain for
synchronizing the vector cleanup with the vector assignement code.
Use a proper temporary cpumask for this.
[ tglx: Renamed the mask to searched_cpumask for clarity ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In fixup_irqs() we unconditionally dereference the irq chip of an irq
descriptor. The descriptor might still be valid, but already cleaned up,
i.e. the chip removed. Add a check for this condition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
There's a race condition between
x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
xxxxx //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and
smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->xxxx // may access freed memory
raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
Call irq_domain_reset_irq_data(), which clears the pointer with vector lock
held.
[ tglx: Free memory outside of lock held region. ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
unprotected. That's wrong in several aspects:
- it opens a race window where irq_set_affinity() can be interrupted and the
irq chip left in unconsistent state.
- it triggers a lockdep splat when we fix the vector race for 4.3+ because
vector lock is taken with interrupts enabled.
The proper calling convention is irq descriptor lock held and interrupts
disabled.
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Originally we calculated ht_nodeid as "ht_nodeid = apicid -
boot_cpu_id;" so presumably it could be negative.
But after commit:
01aaea1afbcd ('x86: introduce initial apicid')
we use c->initial_apicid which is an unsigned short and thus always >= 0.
It causes a static checker warning to test for impossible
conditions so let's remove it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Link: http://lkml.kernel.org/r/20160113123940.GE19993@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If the clock becomes unstable while we're reading it, we need to
bail. We can do this by simply moving the check into the
seqcount loop.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
My previous comments were still a bit confusing and there was a
typo. Fix it up.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pavel noted that lguest maps the switcher code executable and
read-write. This is a bad idea for any kernel text, but
particularly for text mapped at a fixed address.
Create two vmas, one for the text (PAGE_KERNEL_RX) and another
for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them
adjacent (as expected by the rest of the code).
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
... from the final ELF image's symbol table as they're not
really needed there.
Before:
$ readelf -a vmlinux | grep verify_cpu
43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu
45: ffffffff8100028f 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_no_longmode
46: ffffffff810001de 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_noamd
47: ffffffff8100022b 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_check
48: ffffffff8100021c 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_clear_xd
49: ffffffff81000263 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_test
50: ffffffff81000296 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_ok
After:
$ readelf -a vmlinux | grep verify_cpu
43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When "eagerfpu=off" is given as a command-line input, the kernel
should disable AVX support.
The Task Switched bit used for lazy context switching does not
support AVX. If AVX is enabled without eagerfpu context
switching, one task's AVX state could become corrupted or leak
to other tasks. This is a bug and has bad security implications.
This only affects systems that have AVX/AVX2/AVX512 and this
issue will be found only when one actually uses AVX/AVX2/AVX512
_AND_ does eagerfpu=off.
Reference: Intel Software Developer's Manual Vol. 3A
Sec. 2.5 Control Registers:
TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
instruction is actually executed by the new task.
Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
FPU and SSE State
When the TS flag is set, the processor monitors the instruction
stream for x87 FPU, MMX, SSE instructions. When the processor
detects one of these instructions, it raises a
device-not-available exeception (#NM) prior to executing the
instruction.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This issue is a fallout from the command-line parsing move.
When "eagerfpu=off" is given as a command-line input, the kernel
should disable MPX support. The decision for turning off MPX was
made in fpu__init_system_ctx_switch(), which is after the
selection of the XSAVE format. This patch fixes it by getting
that decision done earlier in fpu__init_system_xstate().
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When "noxsave" is given as a command-line input, the kernel
should disable XGETBV1. This issue currently does not cause any
actual problems. XGETBV1 is only useful if we have something
using the 'init optimization' (i.e. xsaveopt, xsaves). We
already clear both of those in fpu__xstate_clear_all_cpu_caps().
But this is good for completeness.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The function fpu__init_system() is executed before
parse_early_param(). This causes wrong FPU configuration. This
patch fixes this issue by parsing boot_command_line in the
beginning of fpu__init_system().
With all four patches in this series, each parameter disables
features as the following:
eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
no387: fpu
nofxsr: fxsr, fxsropt, xmm
noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512,
mpx, xgetbv1 noxsaveopt: xsaveopt
noxsaves: xsaves
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Use PAGE_ALIGEND macro in <linux/mm.h> to simplify code.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
/proc/meminfo output
In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages.
Unfortunatly when we split up mappings during boot,
split_page_count() doesn't take this into account, and
starts decrementing an empty direct_pages_count[] level.
This results in /proc/meminfo showing crazy things like:
DirectMap2M: 18446744073709543424 kB
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
"Two changes:
- one to quirk-save/restore certain system MSRs across
suspend/resume, to make certain Intel systems work better
(Chen Yu)
- and also to constify a read only structure (Julia Lawall)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/calgary: Constify cal_chipset_ops structures
x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- make the debugfs 'kernel_page_tables' file read-only, as it only
has read ops. (Borislav Petkov)
- micro-optimize clflush_cache_range() (Chris Wilson)
- swiotlb enhancements, which fixes certain KVM emulated devices
(Igor Mammedov)
- fix an LDT related debug message (Jan Beulich)
- modularize CONFIG_X86_PTDUMP (Kees Cook)
- tone down an overly alarming warning (Laura Abbott)
- Mark variable __initdata (Rasmus Villemoes)
- PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Micro-optimise clflush_cache_range()
x86/mm/pat: Change free_memtype() to support shrinking case
x86/mm/pat: Add untrack_pfn_moved for mremap
x86/mm: Drop WARN from multi-BAR check
x86/LDT: Print the real LDT base address
x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
x86/mm: Introduce max_possible_pfn
x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
x86/mm: Turn CONFIG_X86_PTDUMP into a module
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
"This cleans up the FPU fault handling methods to be more robust, and
moves eligible variables to .init.data"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Put a few variables in .init.data
x86/fpu: Get rid of xstate_fault()
x86/fpu: Add an XSTATE_OP() macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
"The main changes in this cycle were:
- Improved CPU ID handling code and related enhancements (Borislav
Petkov)
- RDRAND fix (Len Brown)"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Replace RDRAND forced-reseed with simple sanity check
x86/MSR: Chop off lower 32-bit value
x86/cpu: Fix MSR value truncation issue
x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
kvm: Add accessors for guest CPU's family, model, stepping
x86/cpu: Unify CPU family, model, stepping calculation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"The main changes in this cycle were:
- code patching and cpu_has cleanups (Borislav Petkov)
- paravirt cleanups (Juergen Gross)
- TSC cleanup (Thomas Gleixner)
- ptrace cleanup (Chen Gang)"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arch/x86/kernel/ptrace.c: Remove unused arg_offs_table
x86/mm: Align macro defines
x86/cpu: Provide a config option to disable static_cpu_has
x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
x86/cpufeature: Cleanup get_cpu_cap()
x86/cpufeature: Move some of the scattered feature bits to x86_capability
x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
x86/paravirt: Remove unused pv_apic_ops structure
x86/tsc: Remove unused tsc_pre_init() hook
x86: Remove unused function cpu_has_ht_siblings()
x86/paravirt: Kill some unused patching functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- vDSO and asm entry improvements (Andy Lutomirski)
- Xen paravirt entry enhancements (Boris Ostrovsky)
- asm entry labels enhancement (Borislav Petkov)
- and other misc changes (Thomas Gleixner, me)"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks"
x86/entry/64_compat: Make labels local
x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
x86/vdso: Enable vdso pvclock access on all vdso variants
x86/vdso: Remove pvclock fixmap machinery
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
x86/asm: Add asm macros for static keys/jump labels
x86/asm: Error out if asm/jump_label.h is included inappropriately
context_tracking: Switch to new static_branch API
x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
x86/paravirt: Remove the unused irq_enable_sysexit pv op
x86/xen: Avoid fast syscall path for Xen PV guests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
"The main changes in this cycle were:
- introduce optimized single IPI sending methods on modern APICs
(Linus Torvalds, Thomas Gleixner)
- kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai)
- extend lapic vector saving/restoring to the CMCI (MCE) vector as
well (Juergen Gross)
- various fixes and enhancements (Jake Oshins, Len Brown)"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/irq: Export functions to allow MSI domains in modules
Documentation: Document kernel.panic_on_io_nmi sysctl
x86/nmi: Save regs in crash dump on external NMI
x86/apic: Introduce apic_extnmi command line parameter
kexec: Fix race between panic() and crash_kexec()
panic, x86: Allow CPUs to save registers even if looping in NMI context
panic, x86: Fix re-entrance problem due to panic on NMI
x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
x86/smp: Remove single IPI wrapper
x86/apic: Use default send single IPI wrapper
x86/apic: Provide default send single IPI wrapper
x86/apic: Implement single IPI for apic_noop
x86/apic: Wire up single IPI for apic_numachip
x86/apic: Wire up single IPI for x2apic_uv
x86/apic: Implement single IPI for x2apic_phys
x86/apic: Wire up single IPI for bigsmp_apic
x86/apic: Remove pointless indirections from bigsmp_apic
x86/apic: Wire up single IPI for apic_physflat
x86/apic: Remove pointless indirections from apic_physflat
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- tickless load average calculation enhancements (Byungchul Park)
- vtime handling enhancements (Frederic Weisbecker)
- scalability improvement via properly aligning a key structure field
(Jiri Olsa)
- various stop_machine() fixes (Oleg Nesterov)
- sched/numa enhancement (Rik van Riel)
- various fixes and improvements (Andi Kleen, Dietmar Eggemann,
Geliang Tang, Hiroshi Shimamoto, Joonwoo Park, Peter Zijlstra,
Waiman Long, Wanpeng Li, Yuyang Du)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
sched/core: Move sched_entity::avg into separate cache line
x86/fpu: Properly align size in CHECK_MEMBER_AT_END_OF() macro
sched/deadline: Fix the earliest_dl.next logic
sched/fair: Disable the task group load_avg update for the root_task_group
sched/fair: Move the cache-hot 'load_avg' variable into its own cacheline
sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()
sched/core: Move the sched_to_prio[] arrays out of line
sched/cputime: Convert vtime_seqlock to seqcount
sched/cputime: Introduce vtime accounting check for readers
sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()
sched/cputime: Correctly handle task guest time on housekeepers
sched/cputime: Clarify vtime symbols and document them
sched/cputime: Remove extra cost in task_cputime()
sched/fair: Make it possible to account fair load avg consistently
sched/fair: Modify the comment about lock assumptions in migrate_task_rq_fair()
stop_machine: Clean up the usage of the preemption counter in cpu_stopper_thread()
stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to callers
stop_machine: Kill cpu_stop_done->executed
stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
"Various x86 MCE fixes and small enhancements"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make usable address checks Intel-only
x86/mce: Add the missing memory error check on AMD
x86/RAS: Remove mce.usable_addr
x86/mce: Do not enter deferred errors into the generic pool twice
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Intel Knights Landing support. (Harish Chegondi)
- Intel Broadwell-EP uncore PMU support. (Kan Liang)
- Core code improvements. (Peter Zijlstra.)
- Event filter, LBR and PEBS fixes. (Stephane Eranian)
- Enable cycles:pp on Intel Atom. (Stephane Eranian)
- Add cycles:ppp support for Skylake. (Andi Kleen)
- Various x86 NMI overhead optimizations. (Andi Kleen)
- Intel PT enhancements. (Takao Indoh)
- AMD cache events fix. (Vince Weaver)
Tons of tooling changes:
- Show random perf tool tips in the 'perf report' bottom line
(Namhyung Kim)
- perf report now defaults to --group if the perf.data file has
grouped events, try it with:
# perf record -e '{cycles,instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
# perf report
# Samples: 1K of event 'anon group { cycles, instructions }'
# Event count (approx.): 1955219195
#
# Overhead Command Shared Object Symbol
2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle
1.05% 0.33% firefox libxul.so [.] js::SetObjectElement
1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno
0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab
0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1>
0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
0.62% 1.27% firefox libxul.so [.] js::GetIterator
0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty
0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining
- Introduce the 'perf stat record/report' workflow:
Generate perf.data files from 'perf stat', to tap into the
scripting capabilities perf has instead of defining a 'perf stat'
specific scripting support to calculate event ratios, etc.
Simple example:
$ perf stat record -e cycles usleep 1
Performance counter stats for 'usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$ perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$
It generates PERF_RECORD_ userspace records to store the details:
$ perf report -D | grep PERF_RECORD
0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
0x12a [0x40]: PERF_RECORD_STAT_CONFIG
0x16a [0x30]: PERF_RECORD_STAT
-1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
0x1da [0x18]: PERF_RECORD_STAT_ROUND
[acme@ssdandy linux]$
An effort was made to make perf.data files generated like this to
not generate cryptic messages when processed by older tools.
The 'perf script' bits need rebasing, will go up later.
- Make command line options always available, even when they depend
on some feature being enabled, warning the user about use of such
options (Wang Nan)
- Support hw breakpoint events (mem:0xAddress) in the default output
mode in 'perf script' (Wang Nan)
- Fixes and improvements for supporting annotating ARM binaries,
support ARM call and jump instructions, more work needed to have
arch specific stuff separated into tools/perf/arch/*/annotate/
(Russell King)
- Add initial 'perf config' command, for now just with a --list
command to the contents of the configuration file in use and a
basic man page describing its format, commands for doing edits and
detailed documentation are being reviewed and proof-read. (Taeung
Song)
- Allows BPF scriptlets specify arguments to be fetched using DWARF
info, using a prologue generated at compile/build time (He Kuang,
Wang Nan)
- Allow attaching BPF scriptlets to module symbols (Wang Nan)
- Allow attaching BPF scriptlets to userspace code using uprobe (Wang
Nan)
- BPF programs now can specify 'perf probe' tunables via its section
name, separating key=val values using semicolons (Wang Nan)
Testing some of these new BPF features:
Use case: get callchains when receiving SSL packets, filter then in the
kernel, at arbitrary place.
# cat ssl.bpf.c
#define SEC(NAME) __attribute__((section(NAME), used))
struct pt_regs;
SEC("func=__inet_lookup_established hnum")
int func(struct pt_regs *ctx, int err, unsigned short port)
{
return err == 0 && port == 443;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
# perf record -a -g -e ssl.bpf.c
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
# perf script | head -30
swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
430a br_handle_frame_finish ([bridge])
48bc br_handle_frame ([bridge])
855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
#
- Use 'perf probe' various options to list functions, see what
variables can be collected at any given point, experiment first
collecting without a filter, then filter, use it together with
'perf trace', 'perf top', with or without callchains, if it
explodes, please tell us!
- Introduce a new callchain mode: "folded", that will list per line
representations of all callchains for a give histogram entry,
facilitating 'perf report' output processing by other tools, such
as Brendan Gregg's flamegraph tools (Namhyung Kim)
E.g:
# perf report | grep -v ^# | head
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
|
---cpu_startup_entry
|
|--12.07%--start_secondary
|
--6.30%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
#
Becomes, in "folded" mode:
# perf report -g folded | grep -v ^# | head -5
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
12.07% cpu_startup_entry;start_secondary
6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
11.23% call_cpuidle;cpu_startup_entry;start_secondary
5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
#
The user can also select one of "count", "period" or "percent" as
the first column.
... and lots of infrastructure enhancements, plus fixes and other
changes, features I failed to list - see the shortlog and the git log
for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
perf evlist: Add --trace-fields option to show trace fields
perf record: Store data mmaps for dwarf unwind
perf libdw: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Use find_map function in access_dso_mem
perf evlist: Remove perf_evlist__(enable|disable)_event functions
perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
perf report: Show random usage tip on the help line
perf hists: Export a couple of hist functions
perf diff: Use perf_hpp__register_sort_field interface
perf tools: Add overhead/overhead_children keys defaults via string
perf tools: Remove list entry from struct sort_entry
perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
perf script: Align event name properly
perf tools: Add missing headers in perf's MANIFEST
perf tools: Do not show trace command if it's not compiled in
perf report: Change default to use event group view
perf top: Decay periods in callchains
tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
tools lib: Sync tools/lib/find_bit.c with the kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"So we have a laundry list of locking subsystem changes:
- continuing barrier API and code improvements
- futex enhancements
- atomics API improvements
- pvqspinlock enhancements: in particular lock stealing and adaptive
spinning
- qspinlock micro-enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
futex: Cleanup the goto confusion in requeue_pi()
futex: Remove pointless put_pi_state calls in requeue()
futex: Document pi_state refcounting in requeue code
futex: Rename free_pi_state() to put_pi_state()
futex: Drop refcount if requeue_pi() acquired the rtmutex
locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
lcoking/barriers, arch: Use smp barriers in smp_store_release()
locking/cmpxchg, arch: Remove tas() definitions
locking/pvqspinlock: Queue node adaptive spinning
locking/pvqspinlock: Allow limited lock stealing
locking/pvqspinlock: Collect slowpath lock statistics
sched/core, locking: Document Program-Order guarantees
locking, sched: Introduce smp_cond_acquire() and use it
locking/pvqspinlock, x86: Optimize the PV unlock code path
locking/qspinlock: Avoid redundant read of next pointer
locking/qspinlock: Prefetch the next node cacheline
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
atomics: Add test for atomic operations with _relaxed variants
|
|
When decompressing kernel image during x86 bootup, malloc memory
for ELF program headers may run out of heap space, which leads
to system halt. This patch doubles BOOT_HEAP_SIZE to 64KB.
Tested with 32-bit kernel which failed to boot without this patch.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent. In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.
Document all the barriers that make this work correctly and add
a couple that were missing.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is the final small set of ARM SoC bug fixes for linux-4.4, almost
all regressions:
OMAP:
- data corruption on the Nokia N900 flash
Allwinner:
- Two defconfig change to get USB working again
ARM Versatile:
- Interrupt numbers gone bad after an older bug fix
Nomadik:
- Crashes from incorrect L2 cache settings
VIA vt8500:
- SD/MMC support on WM8650 never worked"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
dts: vt8500: Add SDHC node to DTS file for WM8650
ARM: Fix broken USB support in multi_v7_defconfig for sunxi devices
ARM: versatile: fix MMC/SD interrupt assignment
ARM: nomadik: set latencies to 8 cycles
ARM: OMAP2+: Fix onenand rate detection to avoid filesystem corruption
ARM: Fix broken USB support in sunxi_defconfig
|
|
Pull KVM fix from Paolo Bonzini:
"A simple fix. I'm sending it before the merge window, because it
refines a patch found in your master branch but not yet in the
kvm/next branch that is destined for 4.5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: only channel 0 of the i8254 is linked to the HPET
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"A handful of x86 fixes:
- a syscall ABI fix, fixing an Android breakage
- a Xen PV guest fix relating to the RTC device, causing a
non-working console
- a Xen guest syscall stack frame fix
- an MCE hotplug CPU crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/numachip: Fix NumaConnect2 MMCFG PCI access
x86/entry: Restore traditional SYSENTER calling convention
x86/entry: Fix some comments
x86/paravirt: Prevent rtc_cmos platform device init on PV guests
x86/xen: Avoid fast syscall path for Xen PV guests
x86/mce: Ensure offline CPUs don't participate in rendezvous process
|