Age | Commit message (Collapse) | Author | Files | Lines |
|
commit d721b02fd00bf133580f431b82ef37f3b746dfb2 upstream.
Looks like the TSEG lives just above TOUD, stolen comes after TSEG.
The spec seems somewhat self-contradictory in places, in the ESMRAMC
register desctription it says:
TSEG Size:
10=(TOUD + 512 KB) to TOUD
11 =(TOUD + 1 MB) to TOUD
so that agrees with TSEG being at TOUD. But the example given
elsehwere in the spec says:
TOUD equals 62.5 MB = 03E7FFFFh
TSEG selected as 512 KB in size,
Graphics local memory selected as 1 MB in size
General System RAM available in system = 62.5 MB
General system RAM range00000000h to 03E7FFFFh
TSEG address range03F80000h to 03FFFFFFh
TSEG pre-allocated from03F80000h to 03FFFFFFh
Graphics local memory pre-allocated from03E80000h to 03F7FFFFh
so here we have TSEG above stolen.
Real world evidence agrees with the TOUD->TSEG->stolen order however, so
let's fix up the code to account for the TSEG size.
Cc: Taketo Kabe <fdporg@vega.pgw.jp>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Fixes: 0ad98c74e093 ("drm/i915: Determine the stolen memory base address on gen2")
Fixes: a4dff76924fe ("x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms")
Reported-by: Taketo Kabe <fdporg@vega.pgw.jp>
Tested-by: Taketo Kabe <fdporg@vega.pgw.jp>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96473
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470653919-27251-1-git-send-email-ville.syrjala@linux.intel.com
Link: http://download.intel.com/design/chipsets/datashts/25251405.pdf
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ff8560512b8d4b7ca3ef4fd69166634ac30b2525 upstream.
Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.
No real clue if this is the right fix, but at least my
P3 machine boots again.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Fixes: 2a51fe083eba ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit caef78b6cdeddf4ad364f95910bba6b43b8eb9bf upstream.
Some time ago, we brought our UV BIOS callback code up to speed with the
new EFI memory mapping scheme, in commit:
d1be84a232e3 ("x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()")
By leveraging some changes that I made to a few of the EFI runtime
callback mechanisms, in commit:
80e75596079f ("efi: Convert efi_call_virt() to efi_call_virt_pointer()")
This got everything running smoothly on UV, with the new EFI mapping
code. However, this left one, small loose end, in that EFI_OLD_MEMMAP
(a.k.a. efi=old_map) will no longer work on UV, on kernels that include
the aforementioned changes.
At the time this was not a major issue (in fact, it still really isn't),
but there's no reason that EFI_OLD_MEMMAP *shouldn't* work on our
systems. This commit adds a check into uv_bios_call(), to see if we have
the EFI_OLD_MEMMAP bit set in efi.flags. If it is set, we fall back to
using our old callback method, which uses efi_call() directly on the __va()
of our function pointer.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1476928131-170101-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8678654e3c7ad7b0f4beb03fa89691279cba71f9 upstream.
gcc 7 warns:
arch/x86/kvm/ioapic.c: In function 'kvm_ioapic_reset':
arch/x86/kvm/ioapic.c:597:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
And it is right. Memset whole array using sizeof operator.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
[Added x86 subject tag]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23446cb66c073b827779e5eb3dec301623299b32 upstream.
Commit:
917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
... fixed up the broken manipulations of max_pfn in the presence of
E820_PRAM ranges.
However, it also broke the sanitize_e820_map() support for not merging
E820_PRAM ranges.
Re-introduce the enabling to keep resource boundaries between
consecutive defined ranges. Otherwise, for example, an environment that
boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhang Yi <yizhan@redhat.com>
Cc: linux-nvdimm@lists.01.org
Fixes: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 72b4f6a5e903b071f2a7c4eb1418cbe4eefdc344 upstream.
On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used. So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '®s->sp'.
But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack. In that case, instead of
'®s->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.
kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer. So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.
Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all. The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary. But
that's a patch for another day. This just fixes the original intent.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Link: http://lkml.kernel.org/r/472453d6e9f6a2d4ab16aaed4935f43117111566.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ba6d018e3d2f6a0fad58a668cadf66b2d1f80f59 upstream.
__show_regs() fails to dump the PKRU state when the debug registers are in
their default state because there is a return statement on the debug
register state.
Change the logic to report PKRU value even when debug registers are in
their default state.
Fixes:c0b17b5bd4b7 ("x86/mm/pkeys: Dump PKRU with other kernel registers")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20160910183045.4618-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2a51fe083eba7f99cbda72f5ef90cdf2f4df882c upstream.
When a CPU is physically added to a system then the MADT table is not
updated.
If subsequently a kdump kernel is started on that physically added CPU then
the ACPI enumeration fails to provide the information for this CPU which is
now the boot CPU of the kdump kernel.
As a consequence, generic_processor_info() is not invoked for that CPU so
the number of enumerated processors is 0 and none of the initializations,
including the logical package id management, are performed.
We have code which relies on the correctness of the logical package map and
other information which is initialized via generic_processor_info().
Executing such code will result in undefined behaviour or kernel crashes.
This problem applies only to the kdump kernel because a normal kexec will
switch to the original boot CPU, which is enumerated in MADT, before
jumping into the kexec kernel.
The boot code already has a check for num_processors equal 0 in
prefill_possible_map(). We can use that check as an indicator that the
enumeration of the boot CPU did not happen and invoke generic_processor_info()
for it. That initializes the relevant data for the boot CPU and therefore
prevents subsequent failure.
[ tglx: Refined the code and rewrote the changelog ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: dyoung@redhat.com
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cff9ab2b291e64259d97add48fe073c081afe4e2 upstream.
The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.
The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.
There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.
[ tglx: Massaged changelog ]
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f43ea76cf310c3be95cb75ae1350cbe76a8f2380 upstream.
On Penwell SRAM has to be powered on, otherwise it prevents booting.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e522e1d321b12829960c9b26668c92f14c68d7f upstream.
Commit:
ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
... enabled the PWRMU driver on platforms based on Intel Penwell, but
unfortunately this is not enough.
Add Intel Penwell ID to pci-mid.c driver as well. To avoid confusion in the
future add a comment to both drivers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f5fbf848303c8704d0e1a1e7cabd08fd0a49552f upstream.
Merrifield2 is actually Moorefield.
Rename it accordingly and drop tail digit from Merrifield1.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906184254.94440-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d4b05923f579c234137317cdf9a5eb69ddab76d1 upstream.
Our XSAVE features are divided into two categories: those that
generate FPU exceptions, and those that do not. MPX and pkeys do
not generate FPU exceptions and thus can not be used lazily. We
disable them when lazy mode is forced on.
We have a pair of masks to collect these two sets of features, but
XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
Fix it by moving the feature to XFEATURE_MASK_EAGER.
Note: this only causes problem if you boot with lazy FPU mode
(eagerfpu=off) which is *not* the default. It also only affects
hardware which is not currently publicly available. It looks like
eager mode is going away, but we still need this patch applied
to any kernel that has protection keys and lazy mode, which is 4.6
through 4.8 at this point, and 4.9 if the lazy removal isn't sent
to Linus for 4.9.
Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit db91aa793ff984ac048e199ea1c54202543952fe upstream.
When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).
For each IRQ we try to complete any on-going move regardless whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. For irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectly.
Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in hi
# rtcwake -s10 -mmem
<10 seconds passes>
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in ?
Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.
Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.
Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 917db484dc6a69969d317b3e57add4208a8d9d42 upstream.
In commit:
ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type")
Christoph references the original patch I wrote implementing pmem support.
The intent of the 'max_pfn' changes in that commit were to enable persistent
memory ranges to be covered by the struct page memmap by default.
However, that approach was abandoned when Christoph ported the patches [1], and
that functionality has since been replaced by devm_memremap_pages().
In the meantime, this max_pfn manipulation is confusing kdump [2] that
assumes that everything covered by the max_pfn is "System RAM". This
results in kdump hanging or crashing.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098
So fix it.
Reported-by: Zhang Yi <yizhan@redhat.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@lists.01.org
Fixes: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type")
Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a6a198bc60e6c980a56eca24d33dc7f29139f8ea upstream.
Early during boot topology_update_package_map() computes
logical_pkg_ids for all present processors.
Later, when processors are brought up, identify_cpu() updates
these values based on phys_pkg_id which is a function of
initial_apicid. On PV guests the latter may point to a
non-existing node, causing logical_pkg_ids to be set to -1.
Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
to index its arrays and therefore in this case will point to index
65535 (since logical_pkg_id is a u16). This could lead to either a
crash or may actually access random memory location.
As a workaround, we recompute topology during CPU bringup to reset
logical_pkg_id to a valid value.
(The reason for initial_apicid being bogus is because it is
initial_apicid of the processor from which the guest is launched.
This value is CPUID(1).EBX[31:24])
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This warning:
WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
enter_from_user_mode+0x32/0x50
error_entry+0x6d/0xc0
? general_protection+0x12/0x30
? native_load_gs_index+0xd/0x20
? do_set_thread_area+0x19c/0x1f0
SyS_set_thread_area+0x24/0x30
do_int80_syscall_32+0x7c/0x220
entry_INT80_compat+0x38/0x50
... can be reproduced by running the GS testcase of the ldt_gdt test unit in
the x86 selftests.
do_int80_syscall_32() will call enter_form_user_mode() to convert context
tracking state from user state to kernel state. The load_gs_index() call
can fail with user gsbase, gsbase will be fixed up and proceed if this
happen.
However, enter_from_user_mode() will be called again in the fixed up path
though it is context tracking kernel state currently.
This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
are off once load_gs_index() failed with user gsbase.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Otherwise arch_task_struct_size == 0 and we die. While we're at it,
set X86_FEATURE_ALWAYS, too.
Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We need to call GET_LE to read hdr->e_type.
Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-next@vger.kernel.org
Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The condition for reading CR4 was wrong: there are some CPUs with
CPUID but not CR4. Rather than trying to make the condition exact,
use __read_cr4_safe().
Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cr4_init_shadow() will panic on 486-like machines without CR4. Fix
it using __read_cr4_safe().
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Three fixlets for perf:
- add a missing NULL pointer check in the intel BTS driver
- make BTS an exclusive PMU because BTS can only handle one event at
a time
- ensure that exclusive events are limited to one PMU so that several
exclusive events can be scheduled on different PMU instances"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Limit matching exclusive events to one PMU
perf/x86/intel/bts: Make it an exclusive PMU
perf/x86/intel/bts: Make sure debug store is valid
|
|
Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.
Fix this by making intel_bts an "exclusive" PMU.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since commit 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:
| .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
| <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
| [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40
This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.
Fixes: 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
|
|
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.
This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.
commit 742563777e8d ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().
Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.
Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A couple of small fixes to x86 perf drivers:
- Measure L2 for HW_CACHE* events on AMD
- Fix the address filter handling in the intel/pt driver
- Handle the BTS disabling at the proper place"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
perf/x86/intel/pt: Do validate the size of a kernel address filter
perf/x86/intel/pt: Fix kernel address filter's offset validation
perf/x86/intel/pt: Fix an off-by-one in address filter configuration
perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
|
|
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.
This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,
$ perf stat -e cache-references,cache-misses
measure completely different things.
Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.
Fix this by explicitly validating the result of this calculation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.
This patch adds address validation for the user supplied kernel filters.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.
Fix this and make sure that zero-sized filters don't pass the filter
validation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull kvm fix from Paolo Bonzini:
"One fix for an x86 regression in VM migration, mostly visible with
Windows because it uses RTC periodic interrupts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
|
|
get_user_ex(x, ptr) should zero x on failure. It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When userspace sends KVM_SET_LAPIC, KVM schedules a check between
the vCPU's IRR and ISR and the IOAPIC redirection table, in order
to re-establish the IOAPIC's dest_map (the list of CPUs servicing
the real-time clock interrupt with the corresponding vectors).
However, __rtc_irq_eoi_tracking_restore_one was forgetting to
set dest_map->vectors. Because of this, the IOAPIC did not process
the real-time clock interrupt EOI, ioapic->rtc_status.pending_eoi
got stuck at a non-zero value, and further RTC interrupts were
reported to userspace as coalesced.
Fixes: 9e4aabe2bb3454c83dac8139cf9974503ee044db
Fixes: 4d99ba898dd0c521ca6cdfdde55c9b58aea3cb3d
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Gilbert <dgilbert@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.
We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.
This patch moves intel_bts enabling/disabling directly to the PMI
handler.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are two changes for v4.8. The first fixes a "[Firmware Bug]: reg
0x10: invalid BAR (can't size)" warning on Haswell, and the second
fixes a problem in some new runtime suspend functionality we merged
for v4.8. Summary:
Enumeration:
Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
Power management:
Fix bridge_d3 update on device removal (Lukas Wunner)"
* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix bridge_d3 update on device removal
PCI: Mark Haswell Power Control Unit as having non-compliant BARs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Three fixes:
- AMD microcode loading fix with randomization
- an lguest tooling fix
- and an APIC enumeration boundary condition fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix num_processors value in case of failure
tools/lguest: Don't bork the terminal in case of wrong args
x86/microcode/AMD: Fix load of builtin microcode with randomized memory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This contains:
- a set of fixes found by directed-random perf fuzzing efforts by
Vince Weaver, Alexander Shishkin and Peter Zijlstra
- a cqm driver crash fix
- an AMD uncore driver use after free fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix PEBSv3 record drain
perf/x86/intel/bts: Kill a silly warning
perf/x86/intel/bts: Fix BTS PMI detection
perf/x86/intel/bts: Fix confused ordering of PMU callbacks
perf/core: Fix aux_mmap_count vs aux_refcount order
perf/core: Fix a race between mmap_close() and set_output() of AUX events
perf/x86/amd/uncore: Prevent use after free
perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
perf/core: Remove WARN from perf_event_read()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"This contains a Xen fix, an arm64 fix and a race condition /
robustization set of fixes related to ExitBootServices() usage and
boundary conditions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_exit_boot_services()
efi/libstub: Use efi_exit_boot_services() in FDT
efi/libstub: Introduce ExitBootServices helper
efi/libstub: Allocate headspace in efi_get_memory_map()
efi: Fix handling error value in fdt_find_uefi_params
efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen
|
|
Pull KVM fixes from Paolo Bonzini:
- s390: nested virt fixes (new 4.8 feature)
- x86: fixes for 4.8 regressions
- ARM: two small bugfixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm-arm: Unmap shadow pagetables properly
x86, clock: Fix kvm guest tsc initialization
arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
KVM: lapic: adjust preemption timer correctly when goes TSC backward
KVM: s390: vsie: fix riccbd
KVM: s390: don't use current->thread.fpu.* when accessing registers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:
- Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
DAX pmd mappings end up with an uncached pgprot, and unusable
performance for the device-dax interface. The device-dax interface
appeared in 4.7 so this is tagged for -stable.
- Fix a couple VM_BUG_ON() checks in the show_smaps() path to
understand DAX pmd entries. This fix is tagged for -stable.
- Fix a mis-merge of the nfit machine-check handler to flip the
polarity of an if() to match the final version of the patch that
Vishal sent for 4.8-rc1. Without this the nfit machine check
handler never detects / inserts new 'badblocks' entries which
applications use to identify lost portions of files.
- For test purposes, fix the nvdimm_clear_poison() path to operate on
legacy / simulated nvdimm memory ranges. Without this fix a test
can set badblocks, but never clear them on these ranges.
- Fix the range checking done by dax_dev_pmd_fault(). This is not
tagged for -stable since this problem is mitigated by specifying
aligned resources at device-dax setup time.
These patches have appeared in a next release over the past week. The
recent rebase you can see in the timestamps was to drop an invalid fix
as identified by the updated device-dax unit tests [1]. The -mm
touches have an ack from Andrew"
[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: allow legacy (e820) pmem region to clear bad blocks
nfit, mce: Fix SPA matching logic in MCE handler
mm: fix cache mode of dax pmd mappings
mm: fix show_smap() for zone_device-pmd ranges
dax: fix mapping size check
|
|
Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
the perf fuzzer.
This means the PEBSv3 record included a status bit for an inactive
event, something that _should_ not happen.
Move the code that filters the status bits against our known PEBS
events up a spot to guarantee we only deal with events we know about.
Further add "continue" statements to the WARN_ON_ONCE()s such that
we'll not die nor generate silly events in case we ever do hit them
again.
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: stable@vger.kernel.org
Fixes: a3d86542de88 ("perf/x86/intel/pebs: Add PEBSv3 decoding")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.
There is no reason to have this warning in, so kill it.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.
Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.
The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage. DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).
track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range. While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not. So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.
Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.
The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.
The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.
The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.
With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.
Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:
CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
its own data structure into CPU1s data structure to be freed.
CPU1 skips CPU0 because the data structure is its allegedly unsued own.
It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
its own data structure into CPU2s data structure to be freed.
....
Now the online callbacks are invoked.
CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
far so good.
CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
is still referenced by CPU0 ---> Booom
So there are two issues to be solved here:
1) The id field must be initialized at allocation time to a value which
cannot be a valid hardware id, i.e. -1
This prevents the above scenario, but now CPU1 and CPU2 both stick their
own data structure into the free_at_online pointer of CPU0. So we leak
CPU1s data structure.
2) Fix the memory leak described in #1
Instead of having a single pointer, use a hlist to enqueue the
superflous data structures which are then freed by the first cpu
invoking the online callback.
Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.
[ tglx: Rewrote changelog ]
Fixes: 96b2bd3866a0 ("perf/x86/amd/uncore: Convert to hotplug state machine")
Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When booting a kvm guest on AMD with the latest kernel the following
messages are displayed in the boot log:
tsc: Unable to calibrate against PIT
tsc: HPET/PMTIMER calibration failed
aa297292d708 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
introduced a change to account for a difference in cpu and tsc frequencies for
Intel SKL processors. Before this change the native tsc set
x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware
calibration of the tsc, and in tsc_init() executed
tsc_khz = x86_platform.calibrate_tsc();
cpu_khz = tsc_khz;
The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and
executed the same tsc_init() function. This meant that KVM guests did not
execute the native hardware calibration function.
After aa297292d708, there are separate native calibrations for cpu_khz and
tsc_khz. The code sets x86_platform.calibrate_tsc to native_calibrate_tsc()
which is now an Intel specific calibration function, and
x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old"
native_calibrate_tsc() function (ie, the native hardware calibration
function).
tsc_init() now does
cpu_khz = x86_platform.calibrate_cpu();
tsc_khz = x86_platform.calibrate_tsc();
if (tsc_khz == 0)
tsc_khz = cpu_khz;
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;
The kvm code should not call the hardware initialization in
native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that
prior to aa297292d708.
This patch resolves this issue by setting x86_platform.calibrate_cpu to
kvm_get_tsc_khz().
v2: I had originally set x86_platform.calibrate_cpu to
cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf
in that function is not available in KVM. I have changed the function
pointer to kvm_get_tsc_khz().
Fixes: aa297292d708 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Len Brown <len.brown@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.
Only increase num_processors once we are sure we don't fail.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"Fix UM seccomp vs ptrace, after reordering landed"
* tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Remove 2-phase API documentation
um/ptrace: Fix the syscall number update after a ptrace
um/ptrace: Fix the syscall_trace_leave call
|