git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt department provides:
- A mechanism to shield isolated tasks from managed interrupts:
The affinity of managed interrupts is completely controlled by the
kernel and user space has no influence on them. The reason is that
the automatically assigned affinity correlates to the multi-queue
CPU handling of block devices.
If the generated affinity mask spans both housekeeping and isolated
CPUs the interrupt could be routed to an isolated CPU which would
then be disturbed by I/O submitted by a housekeeping CPU.
The new mechanism ensures that as long as one housekeeping CPU is
online in the assigned affinity mask the interrupt is routed to a
housekeeping CPU.
If there is no online housekeeping CPU in the affinity mask, then
the interrupt is routed to an isolated CPU to keep the device queue
intact, but unless the isolated CPU submits I/O by itself these
interrupts are not raised.
- A small addon to the device tree irqdomain core code to avoid
duplication in irq chip drivers
- Conversion of the SiFive PLIC to hierarchical domains
- The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
NXP INTMUX, Meson A1 GPIO
- The first cut of support for the new ARM GICv4.1
- The usual pile of fixes and improvements in core and driver code"
* tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq, sched/isolation: Isolate from handling managed interrupts
irqchip/gic-v4.1: Allow direct invalidation of VLPIs
irqchip/gic-v4.1: Suppress per-VLPI doorbell
irqchip/gic-v4.1: Add VPE INVALL callback
irqchip/gic-v4.1: Add VPE eviction callback
irqchip/gic-v4.1: Add VPE residency callback
irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
irqchip/gic-v4.1: Plumb skeletal VPE irqchip
irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
irqchip/gic-v3: Add GICv4.1 VPEID size discovery
irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
irqdomain: Fix a memory leak in irq_domain_push_irq()
irqchip: Add NXP INTMUX interrupt multiplexer support
dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
irqchip: Define EXYNOS_IRQ_COMBINER
irqchip/meson-gpio: Add support for meson a1 SoCs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"A small set of SMP core code changes:
- Rework the smp function call core code to avoid the allocation of
an additional cpumask
- Remove the no longer required GFP argument from on_each_cpu_cond()
and on_each_cpu_cond_mask() and fixup the callers"
* tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove allocation mask from on_each_cpu_cond.*()
smp: Add a smp_cond_func_t argument to smp_call_function_many()
smp: Use smp_cond_func_t as type for the conditional function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers department provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows offsets to be set
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Safonov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
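The offset handling described for the time namespace can be pictured with a small user-space model. This is only a sketch: `struct ts` and `timens_apply_offset()` are hypothetical names invented here, not the kernel's types or functions. It assumes a task inside a namespace sees the host reading of MONOTONIC or BOOTTIME plus a fixed per-namespace offset, and that the root namespace has offset 0.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Simplified stand-in for a timespec; not the kernel's type. */
struct ts {
	int64_t sec;
	long nsec;	/* kept in [0, NSEC_PER_SEC) */
};

/*
 * Hypothetical helper: add a per-namespace offset to a host clock
 * reading and carry nanosecond overflow into the seconds field.
 */
static struct ts timens_apply_offset(struct ts host, struct ts offset)
{
	struct ts ns = { host.sec + offset.sec, host.nsec + offset.nsec };

	assert(host.nsec >= 0 && host.nsec < NSEC_PER_SEC);
	assert(offset.nsec >= 0 && offset.nsec < NSEC_PER_SEC);

	if (ns.nsec >= NSEC_PER_SEC) {
		ns.nsec -= NSEC_PER_SEC;
		ns.sec++;
	}
	return ns;
}
```

With a zero offset the reading is returned unchanged, which corresponds to the root namespace case mentioned above.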
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog updates from Thomas Gleixner:
"A set of watchdog/softlockup related improvements:
- Enforce that the watchdog timestamp is always valid on boot. The
original implementation caused a watchdog disabled gap of one
second in the boot process due to truncation of the underlying
sched clock.
The sched clock is divided by 1e9 to convert nanoseconds to
seconds. So for the first second of the boot process the result is
0 which is at the same time the indicator to disable the watchdog.
The trivial fix is to change the disabled indicator to ULONG_MAX.
- Two cleanup patches removing unused and redundant code which
previous changes forgot to clean up"
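The truncation problem in the first item can be reproduced with plain integer arithmetic. The sketch below is a user-space model, not the kernel code: the helper names are made up, and it assumes the division by 1e9 described in the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Old scheme: 0 doubles as the "watchdog disabled" indicator. */
#define OLD_DISABLED 0UL
/* New scheme per the message: ULONG_MAX marks "disabled". */
#define NEW_DISABLED ULONG_MAX

/* Convert a nanosecond clock reading to whole seconds (truncating). */
static unsigned long ns_to_timestamp(uint64_t sched_clock_ns)
{
	return (unsigned long)(sched_clock_ns / NSEC_PER_SEC);
}

/* Hypothetical predicates for the two indicator choices. */
static int looks_disabled_old(unsigned long ts)
{
	return ts == OLD_DISABLED;
}

static int looks_disabled_new(unsigned long ts)
{
	return ts == NEW_DISABLED;
}
```

Any reading taken during the first second truncates to 0, so under the old indicator it cannot be told apart from "disabled". With ULONG_MAX as the indicator the same reading is a valid timestamp.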
* tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/softlockup: Enforce that timestamp is valid on boot
watchdog/softlockup: Remove obsolete check of last reported task
watchdog: Remove soft_lockup_hrtimer_cnt and related code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the generic VDSO code which missed 5.5:
- Make the update to the coarse timekeeper unconditional.
This is required because the coarse timekeeper interfaces in the
VDSO do not depend on a VDSO capable clocksource. If the system
does not have a VDSO capable clocksource and the update is
depending on the VDSO capable clocksource, the coarse VDSO
interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching.
Tripped over this several times while analyzing the update problem
above"
* tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Update coarse timekeeper unconditionally
lib/vdso: Make __arch_update_vdso_data() logic understandable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"One small audit patch for the Linux v5.6 merge window, and
unsurprisingly it passes our test suite with flying colors"
* tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Add __rcu annotation to RCU pointer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cgroup2 interface for hugetlb controller. I think this was the last
remaining bit which was missing from cgroup2
- fixes for race and a spurious warning in threaded cgroup handling
- other minor changes
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
iocost: Fix iocost_monitor.py due to helper type mismatch
cgroup: Prevent double killing of css when enabling threaded cgroup
cgroup: fix function name in comment
mm: hugetlb controller for cgroups v2
|
|
Pull workqueue updates from Tejun Heo:
"Just a couple tracepoint patches"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove workqueue_work event class
workqueue: add worker function to workqueue_execute_end tracepoint
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add ACPI support to the intel_idle driver along with an admin
guide document for it, add support for CPR (Core Power Reduction) to
the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
in a few places, add some new sysfs attributes, debugfs files and
tracepoints, fix bugs and clean up a bunch of things all over.
Specifics:
- Update the ACPI processor driver in order to export
acpi_processor_evaluate_cst() to the code outside of it, add ACPI
support to the intel_idle driver based on that and clean up that
driver somewhat (Rafael Wysocki).
- Add an admin guide document for the intel_idle driver (Rafael
Wysocki).
- Clean up cpuidle core and drivers, enable compilation testing for
some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
Wysocki, Yangtao Li).
- Fix reference counting of OPP (operating performance points) table
structures (Viresh Kumar).
- Add support for CPR (Core Power Reduction) to the AVS (Adaptive
Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
YueHaibing).
- Add support for TigerLake Mobile and JasperLake to the Intel RAPL
power capping driver (Zhang Rui).
- Update cpufreq drivers:
- Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
- Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
- Fix cpufreq policy reference counting issues in s3c and
brcmstb-avs (chenqiwu).
- Fix ACPI table reference counting issue and HiSilicon quirk
handling in the CPPC driver (Hanjun Guo).
- Clean up spelling mistake in intel_pstate (Harry Pan).
- Convert the kirkwood and tegra186 drivers to using
devm_platform_ioremap_resource() (Yangtao Li).
- Update devfreq core:
- Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
- Clean up the handling of transition statistics and allow them to
be reset by writing 0 to the 'trans_stat' devfreq device
attribute in sysfs (Kamil Konieczny).
- Add 'devfreq_summary' to debugfs (Chanwoo Choi).
- Clean up kerneldoc comments and Kconfig indentation (Krzysztof
Kozlowski, Randy Dunlap).
- Update devfreq drivers:
- Add dynamic scaling for the imx8m DDR controller and clean up
imx8m-ddrc (Leonard Crestez, YueHaibing).
- Fix DT node reference counting and initialization error code path
in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
for it (Chanwoo Choi, Yangtao Li).
- Fix DT node reference counting in rockchip-dfi and make it use
devm_platform_ioremap_resource() (Yangtao Li).
- Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
- Fix initialization error code paths in exynos-bus (Yangtao Li).
- Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
Kozlowski).
- Add tracepoints for tracking usage_count updates unrelated to
status changes in PM-runtime (Michał Mirosław).
- Add sysfs attribute to control the "sync on suspend" behavior
during system-wide suspend (Jonas Meurer).
- Switch system-wide suspend tests over to 64-bit time (Alexandre
Belloni).
- Make wakeup sources statistics in debugfs cover deleted ones which
used to be the case some time ago (zhuguangqing).
- Clean up computations carried out during hibernation, update
messages related to hibernation and fix a spelling mistake in one
of them (Wen Yang, Luigi Semenzato, Colin Ian King).
- Add mailmap entry for maintainer e-mail address that has not been
functional for several years (Rafael Wysocki)"
* tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
intel_idle: Clean up irtl_2_usec()
intel_idle: Move 3 functions closer to their callers
intel_idle: Annotate initialization code and data structures
intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
intel_idle: Rearrange intel_idle_cpuidle_driver_init()
intel_idle: Clean up NULL pointer check in intel_idle_init()
intel_idle: Fold intel_idle_probe() into intel_idle_init()
intel_idle: Eliminate __setup_broadcast_timer()
cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
cpuidle: sysfs: fix warnings when compiling with W=1
cpuidle: coupled: fix warnings when compiling with W=1
cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
PM / devfreq: Add debugfs support with devfreq_summary file
Documentation: admin-guide: PM: Add intel_idle document
cpuidle: arm: Enable compile testing for some of drivers
PM-runtime: add tracepoints for usage_count changes
cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The changes are a real mixed bag this time around.
The only scary looking one from the diffstat is the uapi change to
asm-generic/mman-common.h, but this has been acked by Arnd and is
actually just adding a pair of comments in an attempt to prevent
allocation of some PROT values which tend to get used for
arch-specific purposes. We'll be using them for Branch Target
Identification (a CFI-like hardening feature), which is currently
under review on the mailing list.
New architecture features:
- Support for Armv8.5 E0PD, which benefits KASLR in the same way as
KPTI but without the overhead. This allows KPTI to be disabled on
CPUs that are not affected by Meltdown, even if KASLR is enabled.
- Initial support for the Armv8.5 RNG instructions, which claim to
provide access to a high bandwidth, cryptographically secure
hardware random number generator. As well as exposing these to
userspace, we also use them as part of the KASLR seed and to seed
the crng once all CPUs have come online.
- Advertise a bunch of new instructions to userspace, including
support for Data Gathering Hint, Matrix Multiply and 16-bit
floating point.
Kexec:
- Cleanups in preparation for relocating with the MMU enabled
- Support for loading crash dump kernels with kexec_file_load()
Perf and PMU drivers:
- Cleanups and non-critical fixes for a couple of system PMU drivers
FPU-less (aka broken) CPU support:
- Considerable fixes to support CPUs without the FP/SIMD extensions,
including their presence in heterogeneous systems. Good luck
finding a 64-bit userspace that handles this.
Modern assembly function annotations:
- Start migrating our use of ENTRY() and ENDPROC() over to the
new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended
to aid debuggers
Kbuild:
- Cleanup detection of LSE support in the assembler by introducing
'as-instr'
- Remove compressed Image files when building clean targets
IP checksumming:
- Implement optimised IPv4 checksumming routine when hardware offload
is not in use. An IPv6 version is in the works, pending testing.
Hardware errata:
- Work around Cortex-A55 erratum #1530923
Shadow call stack:
- Work around some issues with Clang's integrated assembler not
liking our perfectly reasonable assembly code
- Avoid allocating the X18 register, so that it can be used to hold
the shadow call stack pointer in future
ACPI:
- Fix ID count checking in IORT code. This may regress broken
firmware that happened to work with the old implementation, in
which case we'll have to revert it and try something else
- Fix DAIF corruption on return from GHES handler with pseudo-NMIs
Miscellaneous:
- Whitelist some CPUs that are unaffected by Spectre-v2
- Reduce frequency of ASID rollover when KPTI is compiled in but
inactive
- Reserve a couple of arch-specific PROT flags that are already used
by Sparc and PowerPC and are planned for later use with BTI on
arm64
- Cleanup of our entry assembly code in preparation for
moving more of it into C later on
- Refactoring and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits)
arm64: acpi: fix DAIF manipulation with pNMI
arm64: kconfig: Fix alignment of E0PD help text
arm64: Use v8.5-RNG entropy for KASLR seed
arm64: Implement archrandom.h for ARMv8.5-RNG
arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
arm64: entry: Avoid empty alternatives entries
arm64: Kconfig: select HAVE_FUTEX_CMPXCHG
arm64: csum: Fix pathological zero-length calls
arm64: entry: cleanup sp_el0 manipulation
arm64: entry: cleanup el0 svc handler naming
arm64: entry: mark all entry code as notrace
arm64: assembler: remove smp_dmb macro
arm64: assembler: remove inherit_daif macro
ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use
arm64: Use macros instead of hard-coded constants for MAIR_EL1
arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list
arm64: kernel: avoid x18 in __cpu_soft_restart
arm64: kvm: stop treating register x18 as caller save
arm64/lib: copy_page: avoid x18 register in assembler code
...
|
|
* pm-cpufreq:
cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
cpufreq: s3c: fix unbalances of cpufreq policy refcount
cpufreq: imx-cpufreq-dt: Add i.MX8MP support
cpufreq: Use imx-cpufreq-dt for i.MX8MP's speed grading
cpufreq: tegra186: convert to devm_platform_ioremap_resource
cpufreq: kirkwood: convert to devm_platform_ioremap_resource
cpufreq: CPPC: put ACPI table after using it
cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched
* pm-sleep:
PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
PM: hibernate: Add more logging on hibernation failure
PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
PM: wakeup: Show statistics for deleted wakeup sources again
PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()
|
|
The stubbed version of alarmtimer_get_rtcdev() is not exported,
so this won't work if this function is used in a module when
CONFIG_RTC_CLASS=n.
Move the stub function to the header file and make it inline so that
callers don't have to worry about linking against this symbol.
rtcdev isn't used outside of this ifdef so it's not required to be
redefined to NULL. Drop that while touching this area.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200124055849.154411-4-swboyd@chromium.org
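The header-stub pattern this change describes looks roughly like the following. This is a stand-alone sketch rather than the actual header: `struct rtc_device` is left opaque, and CONFIG_RTC_CLASS is assumed to be undefined when the snippet is built on its own.

```c
#include <assert.h>
#include <stddef.h>

struct rtc_device;	/* opaque here; the real definition lives in the RTC core */

#ifdef CONFIG_RTC_CLASS
/* The real implementation is provided elsewhere and must be linked in. */
struct rtc_device *alarmtimer_get_rtcdev(void);
#else
/*
 * Inline stub: the body is compiled into every caller, so there is no
 * symbol to export and modules need nothing at link time.
 */
static inline struct rtc_device *alarmtimer_get_rtcdev(void)
{
	return NULL;
}
#endif
```

Callers can then test the return value for NULL without any #ifdef of their own.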
|
|
Use the wakeup source that can be associated with the 'alarmtimer'
platform device instead of registering another one by hand.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200124055849.154411-3-swboyd@chromium.org
|
|
The alarmtimer_suspend() function will fail if an RTC device is on a bus
such as SPI or i2c and that RTC device registers and probes after
alarmtimer_init() registers and probes the 'alarmtimer' platform device.
This is because system wide suspend suspends devices in the reverse order
of their probe. When alarmtimer_suspend() attempts to program the RTC for a
wakeup it will try to program an RTC device on a bus that has already been
suspended.
Move the alarmtimer device registration to happen when the RTC which is
used for wakeup is registered. Register the 'alarmtimer' platform device as
a child of the RTC device too, so that it can be guaranteed that the RTC
device won't be suspended when alarmtimer_suspend() is called.
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200124055849.154411-2-swboyd@chromium.org
|
|
This function doesn't do anything like this comment says when an RTC device
hasn't been chosen. It looks like we used to do something like that before
commit 8bc0dafb5cf3 ("alarmtimers: Rework RTC device selection using class
interface") but that's long gone now. Remove this sentence to avoid
confusing the reader.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200124055849.154411-5-swboyd@chromium.org
|
|
The allocation mask is no longer used by on_each_cpu_cond() and
on_each_cpu_cond_mask() and can be removed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de
|
|
on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated
mask is a subset of the provided mask based on the conditional function.
This memory allocation can be avoided by extending smp_call_function_many()
with the conditional function and performing the remote function call based
on the mask and the conditional function.
Rename smp_call_function_many() to smp_call_function_many_cond() and add
the smp_cond_func_t argument. If smp_cond_func_t is provided then it is
used before invoking the function. Provide smp_call_function_many() with
cond_func set to NULL. Let on_each_cpu_cond_mask() use
smp_call_function_many_cond().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200117090137.1205765-3-bigeasy@linutronix.de
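The idea of filtering during the walk instead of building a second mask can be modelled in user space. In this sketch a 64-bit word stands in for the CPU mask and the callback runs locally rather than on a remote CPU. The two typedefs follow the shapes named in the message; the function names, the return value and the example callbacks are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*smp_call_func_t)(void *info);
typedef bool (*smp_cond_func_t)(int cpu, void *info);

/* Model: bit n of mask represents CPU n. Returns the number of calls made. */
static int call_function_many_cond(uint64_t mask, smp_call_func_t func,
				   void *info, smp_cond_func_t cond_func)
{
	int cpu, called = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		/* Filter in place; no temporary mask is allocated. */
		if (cond_func && !cond_func(cpu, info))
			continue;
		func(info);
		called++;
	}
	return called;
}

/* Unconditional variant: the same walk with cond_func set to NULL. */
static int call_function_many(uint64_t mask, smp_call_func_t func, void *info)
{
	return call_function_many_cond(mask, func, info, NULL);
}

/* Hypothetical callbacks used only to exercise the model. */
static bool cpu_is_even(int cpu, void *info)
{
	(void)info;
	return (cpu % 2) == 0;
}

static void count_call(void *info)
{
	++*(int *)info;
}
```

For a mask covering CPUs 0 to 3, the conditional walk with `cpu_is_even` invokes the callback for CPUs 0 and 2 only, while the unconditional variant invokes it for all four.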
|
|
Use a typedef for the conditional function instead of defining it each time
in the function prototype.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200117090137.1205765-2-bigeasy@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Conversion of the SiFive PLIC to hierarchical domains
- New SiFive GPIO irqchip driver
- New Aspeed SCI irqchip driver
- New NXP INTMUX irqchip driver
- Additional support for the Meson A1 GPIO irqchip
- First part of the GICv4.1 support
- Assorted fixes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- Fix a function comparison warning for a xen trace event macro
- Fix a double perf_event linking to a trace_uprobe_filter for
multiple events
- Fix suspicious RCU warnings in trace event code for using
list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
- Fix a bug in the histogram code when using the same variable
- Fix a NULL pointer dereference when tracefs lockdown enabled and
calling trace_set_default_clock()
- A fix to a bug found with the double perf_event linking patch"
* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
tracing: Do not set trace clock if tracefs lockdown is in effect
tracing: Fix histogram code when expression has same var as value
tracing: trigger: Replace unneeded RCU-list traversals
tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
tracing: xen: Ordered comparison of function pointers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Prevent the kernel from crashing during resume from hibernation if
free pages contain leftover data from the restore kernel and
init_on_free is set (Alexander Potapenko)"
* tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: fix crashes with init_on_free=1
|
|
The affinity of managed interrupts is completely handled in the kernel and
cannot be changed via the /proc/irq/* interfaces from user space. As the
kernel tries to spread out interrupts evenly across CPUs on x86 to prevent
vector exhaustion, it can happen that a managed interrupt whose affinity
mask contains both isolated and housekeeping CPUs is routed to an isolated
CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
on the isolated CPU.
Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
logic in the interrupt affinity selection code.
The subparameter indicates to the interrupt affinity selection logic that
it should try to avoid the above scenario.
This isolation is best effort and only effective if the automatically
assigned interrupt mask of a device queue contains isolated and
housekeeping CPUs. If housekeeping CPUs are online then such interrupts are
directed to the housekeeping CPU so that IO submitted on the housekeeping
CPU cannot disturb the isolated CPU.
If a queue's affinity mask contains only isolated CPUs then this parameter
has no effect on the interrupt routing decision, though interrupts are only
happening when tasks running on those isolated CPUs submit IO. IO submitted
on housekeeping CPUs has no influence on those queues.
If the affinity mask contains both housekeeping and isolated CPUs, but none
of the contained housekeeping CPUs is online, then the interrupt is also
routed to an isolated CPU. Interrupts are only delivered when one of the
isolated CPUs in the affinity mask submits IO. If one of the contained
housekeeping CPUs comes online, the CPU hotplug logic migrates the
interrupt automatically back to the upcoming housekeeping CPU. Depending on
the type of interrupt controller, this can require that at least one
interrupt is delivered to the isolated CPU in order to complete the
migration.
[ tglx: Removed unused parameter, added and edited comments/documentation
and rephrased the changelog so it contains more details. ]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com
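The routing rule described above reduces to a preference between two mask intersections. The sketch below models it with plain bitmasks, where bit n is CPU n. `pick_managed_irq_cpus()` is a hypothetical name, and the model ignores hotplug migration and interrupt-controller specifics.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Choose the CPUs a managed interrupt may be routed to.
 * Prefer online housekeeping CPUs inside the affinity mask; fall back
 * to any online CPU in the mask so the device queue keeps working.
 */
static uint64_t pick_managed_irq_cpus(uint64_t affinity,
				      uint64_t housekeeping,
				      uint64_t online)
{
	uint64_t hk_online = affinity & housekeeping & online;

	if (hk_online)
		return hk_online;

	/* Only isolated CPUs are available in this mask. */
	return affinity & online;
}
```

The fallback branch covers both cases from the text: a mask that contains only isolated CPUs, and a mixed mask whose housekeeping CPUs are all offline.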
|
|
Sparse reports a warning at __run_hrtimer()
|warning: context imbalance in __run_hrtimer() - unexpected unlock
Add the missing must_hold() annotation.
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200120224347.51843-1-jbi.octave@gmail.com
|
|
Commit 99c9a923e97a ("tracing/uprobe: Fix double perf_event
linking on multiprobe uprobe") moved trace_uprobe_filter on
trace_probe_event. However, since it introduced a flexible
data structure with char array and type casting, the
alignment of trace_uprobe_filter can be broken.
This changes the type of the array to the trace_uprobe_filter
data structure to fix it.
Link: http://lore.kernel.org/r/20200120124022.GA14897@hirez.programming.kicks-ass.net
Link: http://lkml.kernel.org/r/157966340499.5107.10978352478952144902.stgit@devnote2
Fixes: 99c9a923e97a ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When the trace_clock option is not set and an unstable clock is detected,
tracing_set_default_clock() sets trace_clock (the ThinkPad A285 is one
such case). In that case, if lockdown is in effect, a NULL pointer
dereference happens in ring_buffer_set_clock().
Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com
Cc: stable@vger.kernel.org
Fixes: 17911ff38aa58 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488
Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
While working on a tool to convert SQL syntax into the histogram language of
the kernel, I discovered the following bug:
# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
# echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
Would not display any histograms in the sched_switch histogram side.
But if I were to swap the location of
"delta=common_timestamp-$start" with "start2=$start"
Such that the last line had:
# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
The histogram works as expected.
What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:
delta=common_timestamp-$start
The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.
When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.
From Tom Zanussi:
"Just to clarify some more about what the problem was is that without
your patch, we would have two separate references to the same variable,
and during resolve_var_refs(), they'd both want to be resolved
separately, so in this case, since the first reference to start wasn't
part of an expression, it wouldn't get the read-once flag set, so would
be read normally, and then the second reference would do the read-once
read and also be read but using read-once. So everything worked and
you didn't see a problem:
from: start2=$start,delta=common_timestamp-$start
In the second case, when you switched them around, the first reference
would be resolved by doing the read-once, and following that the second
reference would try to resolve and see that the variable had already
been read, so failed as unset, which caused it to short-circuit out and
not do the trigger action to generate the synthetic event:
to: delta=common_timestamp-$start,start2=$start
With your patch, we only have the single resolution which happens
correctly the one time it's resolved, so this can't happen."
Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix a memory leak reported by kmemleak:
unreferenced object 0xffff000bc6f50e80 (size 128):
comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s)
hex dump (first 32 bytes):
00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00 ....A...........
00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff ........@Q......
backtrace:
[<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320
[<00000000279143c9>] irq_domain_push_irq+0x7c/0x188
[<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438
[<00000000fd09ec22>] pci_device_probe+0xe4/0x198
[<00000000d43eca75>] really_probe+0xdc/0x320
[<00000000d3ebab09>] driver_probe_device+0x5c/0xf0
[<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0
[<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8
[<0000000079d4db41>] __device_attach+0xe4/0x140
[<00000000883bbda9>] device_initial_probe+0x18/0x20
[<000000003be59ef6>] bus_probe_device+0x98/0xa0
[<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8
[<00000000870934ce>] process_one_work+0x1c8/0x470
[<00000000e3cce570>] worker_thread+0x1f8/0x428
[<000000005d64975e>] kthread+0xfc/0x128
[<00000000f0eaa764>] ret_from_fork+0x10/0x18
Fixes: 495c38d3001f ("irqdomain: Add irq_domain_{push,pop}_irq() functions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com
|
|
Add a new function irq_domain_translate_onecell() that is to be used as
the translate function in struct irq_domain_ops.
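As a rough illustration of what a one-cell translate helper does, here is
a userspace model. All names (model_fwspec, model_translate_onecell,
model_plic_ops, MODEL_IRQ_TYPE_NONE) are hypothetical, and the behaviour
is an assumption: the single specifier cell is taken as the hardware
interrupt number and no trigger type is reported.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_IRQ_TYPE_NONE 0u	/* hypothetical: no trigger type specified */

/* Hypothetical stand-in for a firmware interrupt specifier. */
struct model_fwspec {
	int param_count;
	unsigned int param[4];
};

/* Hypothetical stand-in for the translate hook signature. */
typedef int (*model_translate_fn)(const struct model_fwspec *fwspec,
				  unsigned long *out_hwirq,
				  unsigned int *out_type);

/* One-cell translation: cell 0 is the hardware irq, the type is left as none. */
int model_translate_onecell(const struct model_fwspec *fwspec,
			    unsigned long *out_hwirq, unsigned int *out_type)
{
	if (fwspec->param_count < 1)
		return -EINVAL;
	*out_hwirq = fwspec->param[0];
	*out_type = MODEL_IRQ_TYPE_NONE;
	return 0;
}

/* Hypothetical ops table: only the translate hook is modelled. */
struct model_domain_ops {
	model_translate_fn translate;
};

/* A driver points at the shared helper instead of carrying a private copy. */
const struct model_domain_ops model_plic_ops = {
	.translate = model_translate_onecell,
};
```

The point of a shared helper is the last initializer: each irq chip
driver with a one-cell specifier can reference the same function rather
than duplicating it.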
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1575976274-13487-2-git-send-email-yash.shah@sifive.com
|
|
Pull networking fixes from David Miller:
1) Fix non-blocking connect() in x25, from Martin Schiller.
2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.
3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.
4) Limit size of TSO packets properly in lan78xx driver, from Eric
Dumazet.
5) r8152 probe needs an endpoint sanity check, from Johan Hovold.
6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
John Fastabend.
7) hns3 needs short frames padded on transmit, from Yunsheng Lin.
8) Fix netfilter ICMP header corruption, from Eyal Birger.
9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.
10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.
11) Fix memory leak in act_ctinfo, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
cxgb4: reject overlapped queues in TC-MQPRIO offload
cxgb4: fix Tx multi channel port rate limit
net: sched: act_ctinfo: fix memory leak
bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
bnxt_en: Fix ipv6 RFS filter matching logic.
bnxt_en: Fix NTUPLE firmware command failures.
net: systemport: Fixed queue mapping in internal ring map
net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
net: hns: fix soft lockup when there is not enough memory
net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
net/sched: act_ife: initalize ife->metalist earlier
netfilter: nat: fix ICMP header corruption on ICMP errors
net: wan: lapbether.c: Use built-in RCU list checking
netfilter: nf_tables: fix flowtable list del corruption
netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
netfilter: nft_tunnel: ERSPAN_VERSION must not be null
netfilter: nft_tunnel: fix null-attribute check
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Three fixes: fix link failure on Alpha, fix a Sparse warning and
annotate/robustify a lockless access in the NOHZ code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/sched: Annotate lockless access to last_jiffies_update
lib/vdso: Make __cvdso_clock_getres() static
time/posix-stubs: Provide compat itimer supoprt for alpha
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu/SMT fix from Ingo Molnar:
"Fix a build bug on CONFIG_HOTPLUG_SMT=y && !CONFIG_SYSFS kernels"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/SMT: Fix x86 link error without CONFIG_SYSFS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Tooling fixes, three Intel uncore driver fixes, plus an AUX events fix
uncovered by the perf fuzzer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Remove PCIe3 unit for SNR
perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 Family
perf: Correctly handle failed perf_get_aux_event()
perf hists: Fix variable name's inconsistency in hists__for_each() macro
perf map: Set kmap->kmaps backpointer for main kernel map chunks
perf report: Fix incorrectly added dimensions as switch perf data file
tools lib traceevent: Fix memory leakage in filter_event
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Three fixes:
- Fix an rwsem spin-on-owner crash, introduced in v5.4
- Fix a lockdep bug when running out of stack_trace entries,
introduced in v5.4
- Docbook fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
futex: Fix kernel-doc notation warning
locking/lockdep: Fix buffer overrun problem in stack_trace[]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
"Two rseq bugfixes:
- CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end
up corrupting the TLS of the parent. Technically a change in the
ABI but the previous behavior couldn't reasonably have been relied
on by applications so this looks like a valid exception to the ABI
rule.
- Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
handling of other flags. This is not thought to impact any
applications either"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Unregister rseq for clone CLONE_VM
rseq: Reject unknown flags on rseq unregister
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fixes from Christian Brauner:
"Here is an urgent fix for ptrace_may_access() permission checking.
Commit 69f594a38967 ("ptrace: do not audit capability check when
outputing /proc/pid/stat") introduced the ability to opt out of audit
messages for accesses to various proc files since they are not
violations of policy.
While doing so it switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking
the subjective credentials (ktask->cred) of the task to using the
objective credentials (ktask->real_cred). This appears to be wrong.
ptrace_has_cap() is currently only used in ptrace_may_access() and is
used to check whether the calling task (subject) has the
CAP_SYS_PTRACE capability in the provided user namespace to operate on
the target task (object). According to the cred.h comments this means
the subjective credentials of the calling task need to be used.
With this fix we switch ptrace_has_cap() to use security_capable() and
thus back to using the subjective credentials.
As one example where this might be particularly problematic, Jann
pointed out that in combination with the upcoming IORING_OP_OPENAT{2}
feature, this bug might allow unprivileged users to bypass the
capability checks while asynchronously opening files like /proc/*/mem,
because the capability checks for this would be performed against
kernel credentials.
To illustrate the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials
of the caller. Later on, when it starts to do work it creates a kernel
thread and registers a callback. The callback runs with kernel creds
for ktask->real_cred and ktask->cred.
To prevent this from becoming a full-blown 0-day, io_uring will call
override_creds() and override ktask->cred with the subjective
credentials of the creator of the io_uring instance. With
ptrace_has_cap() currently looking at ktask->real_cred this override
will be ineffective and the caller will be able to open arbitrary proc
files as mentioned above.
Luckily, this is currently not exploitable but would be so once
IORING_OP_OPENAT{2} lands in v5.6. Let's fix it now.
To minimize potential regressions I successfully ran the criu
testsuite. criu makes heavy use of ptrace() and extensively hits
ptrace_may_access() codepaths and has a good chance of detecting any
regressions.
Additionally, I successfully ran the ptrace and seccomp kernel tests"
* tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
|
|
Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to various
proc files since they are not violations of policy. While doing so it
somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. This
is wrong, since ptrace_has_cap() is currently only used in
ptrace_may_access() and is used to check whether the calling task (subject)
has the CAP_SYS_PTRACE capability in the provided user namespace to operate
on the target task (object). According to the cred.h comments this would
mean the subjective credentials of the calling task need to be used.
This switches ptrace_has_cap() to use security_capable(). Because we only
call ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling task's creds under rcu_read_lock() there's
no need to go through another series of dereferences and rcu locking done
in ns_capable{_noaudit}().
As one example where this might be particularly problematic, Jann pointed
out that in combination with the upcoming IORING_OP_OPENAT feature, this
bug might allow unprivileged users to bypass the capability checks while
asynchronously opening files like /proc/*/mem, because the capability
checks for this would be performed against kernel credentials.
To illustrate the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials of the
caller. Later on, when it starts to do work it creates a kernel thread and
registers a callback. The callback runs with kernel creds for
ktask->real_cred and ktask->cred. To prevent this from becoming a
full-blown 0-day, io_uring will call override_creds() and override
ktask->cred with the subjective credentials of the creator of the io_uring
instance. With ptrace_has_cap() currently looking at ktask->real_cred this
override will be ineffective and the caller will be able to open arbitrary
proc files as mentioned above.
Luckily, this is currently not exploitable but will turn into a 0-day once
IORING_OP_OPENAT{2} lands in v5.6. Fix it now!
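The difference between the two checks can be shown with a deliberately
tiny userspace model. The types and functions below (model_cred,
model_task, model_override_creds, has_cap_objective, has_cap_subjective)
are hypothetical and reduce a credential set to a single capability bit.
They only sketch why a check against real_cred ignores an override of
cred.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified credential set. */
struct model_cred {
	bool cap_sys_ptrace;
};

/* Hypothetical task: real_cred is objective, cred is subjective. */
struct model_task {
	const struct model_cred *real_cred;
	const struct model_cred *cred;
};

/* Act with someone else's subjective credentials; returns the old ones. */
const struct model_cred *model_override_creds(struct model_task *t,
					      const struct model_cred *new_cred)
{
	const struct model_cred *old = t->cred;

	t->cred = new_cred;
	return old;
}

/* Buggy check: consults the objective credentials. */
bool has_cap_objective(const struct model_task *t)
{
	return t->real_cred->cap_sys_ptrace;
}

/* Fixed check: consults the subjective credentials. */
bool has_cap_subjective(const struct model_task *t)
{
	return t->cred->cap_sys_ptrace;
}
```

For a worker whose real_cred and cred both start out as kernel
credentials, overriding cred with an unprivileged user's credentials
makes has_cap_subjective() return false while has_cap_objective() still
returns true, which is the bypass described above.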
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Jann Horn <jannh@google.com>
Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
The low resolution parts of the VDSO, i.e.:
clock_gettime(CLOCK_*_COARSE), clock_getres(), time()
can be used even if there is no VDSO capable clocksource.
But if an architecture opts out of the VDSO data update then this
information becomes stale. This affects ARM when there is no architected
timer available. The lack of update causes userspace to use stale data
forever.
Make the update of the low resolution parts unconditional and only skip
the update of the high resolution parts if the architecture requests it.
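A simplified model of the change in control flow, assuming a data page
with one low resolution and one high resolution field. The struct and
function names (model_vdso_data, update_old, update_new) and the
arch_skips flag are hypothetical and do not mirror the real VDSO data
layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the data shared with userspace. */
struct model_vdso_data {
	uint64_t coarse_sec;	/* low resolution: CLOCK_*_COARSE, time() */
	uint64_t cycle_last;	/* high resolution: needs a capable clocksource */
};

/* Old flow: an architecture opt-out skipped the whole update. */
void update_old(struct model_vdso_data *vd, bool arch_skips,
		uint64_t now_sec, uint64_t cycles)
{
	if (arch_skips)
		return;
	vd->coarse_sec = now_sec;
	vd->cycle_last = cycles;
}

/* New flow: low resolution is always updated, high resolution is optional. */
void update_new(struct model_vdso_data *vd, bool arch_skips,
		uint64_t now_sec, uint64_t cycles)
{
	vd->coarse_sec = now_sec;
	if (!arch_skips)
		vd->cycle_last = cycles;
}
```

With arch_skips set, update_old() leaves coarse_sec at its stale value
forever, whereas update_new() keeps it current and only leaves the high
resolution field untouched.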
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200114185946.765577901@linutronix.de
|
|
The function name suggests that this is a boolean checking whether the
architecture asks for an update of the VDSO data, but it works the other
way round. To spare further confusion invert the logic.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de
|
|
Vince reports a worrying issue:
| so I was tracking down some odd behavior in the perf_fuzzer which turns
| out to be because perf_event_open() sometimes returns 0 (indicating a file
| descriptor of 0) even though as far as I can tell stdin is still open.
... and further the cause:
| error is triggered if aux_sample_size has non-zero value.
|
| seems to be this line in kernel/events/core.c:
|
| if (perf_need_aux_event(event) && !perf_get_aux_event(event, group_leader))
| goto err_locked;
|
| (note, err is never set)
This seems to be a thinko in commit:
ab43762ef010967e ("perf: Allow normal events to output AUX data")
... and we should probably return -EINVAL here, as this should only
happen when the new event is mis-configured or does not have a
compatible aux_event group leader.
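The failure mode is the classic "goto error label without setting err"
pattern. The sketch below is a userspace model with hypothetical names
(open_event_buggy, open_event_fixed, the aux_ok flag and the descriptor
value 3). It is not the code in kernel/events/core.c.

```c
#include <errno.h>
#include <stdbool.h>

/* Model of an open path that returns a new descriptor or -errno. */
int open_event_buggy(bool aux_ok)
{
	int err = 0;	/* earlier steps succeeded and left err at 0 */
	int fd = 3;	/* hypothetical descriptor handed out on success */

	if (!aux_ok)
		goto err_locked;	/* bug: err is never set on this path */
	return fd;

err_locked:
	return err;	/* 0 looks like "file descriptor 0" to the caller */
}

/* Same path with the error code assigned before jumping to the label. */
int open_event_fixed(bool aux_ok)
{
	int err = 0;
	int fd = 3;

	if (!aux_ok) {
		err = -EINVAL;
		goto err_locked;
	}
	return fd;

err_locked:
	return err;
}
```

In the buggy variant a failed AUX setup is indistinguishable from a
successful open that happened to return descriptor 0. The fixed variant
reports -EINVAL instead.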
Fixes: ab43762ef010967e ("perf: Allow normal events to output AUX data")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
|
|
Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.
The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.
Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.
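The arithmetic can be checked with a short userspace model. It assumes
that "divided by ~1e9" means a right shift by 30 bits (2^30 ns as a cheap
stand-in for one second). The function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Nanoseconds to watchdog "seconds": 2^30 ns stands in for 1e9 ns. */
unsigned long watchdog_seconds(uint64_t sched_clock_ns)
{
	return (unsigned long)(sched_clock_ns >> 30);
}

/* Old reset indicator: collides with every timestamp in the first second. */
bool is_reset_old(unsigned long ts)
{
	return ts == 0;
}

/* New reset indicator: only collides once the counter is exhausted. */
bool is_reset_new(unsigned long ts)
{
	return ts == ULONG_MAX;
}

/* Uptime in years at which a 32 bit seconds counter reaches its maximum. */
double years_until_32bit_max(void)
{
	return (double)UINT32_MAX / (365.25 * 24.0 * 60.0 * 60.0);
}
```

Half a second after boot watchdog_seconds() yields 0, which the old
check mistakes for a reset while the new check does not.
years_until_32bit_max() evaluates to a little over 136, matching the
figure quoted above.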
Reported-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de
|
|
The commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer
optimistically spin on owner") will allow a recently woken up waiting
writer to spin on the owner. Unfortunately, if the owner happens to be
RWSEM_OWNER_UNKNOWN, the code will incorrectly spin on it leading to a
kernel crash. This is fixed by passing the proper non-spinnable bits
to rwsem_spin_on_owner() so that RWSEM_OWNER_UNKNOWN will be treated
as a non-spinnable target.
Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200115154336.8679-1-longman@redhat.com
|
|
Upon resuming from hibernation, free pages may contain stale data from
the kernel that initiated the resume. This breaks the invariant
imposed by init_on_free=1 that freed pages must be zeroed.
To deal with this problem, make clear_free_pages() also clear the free
pages when init_on_free is enabled.
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The sysfs attribute `/sys/power/sync_on_suspend` controls whether or not
filesystems are synced by the kernel before system suspend.
Congruously, the behaviour of build-time switch CONFIG_SUSPEND_SKIP_SYNC
is slightly changed: It now defines the run-time default for the new sysfs
attribute `/sys/power/sync_on_suspend`.
The run-time attribute is added because the existing corresponding
build-time Kconfig flag (`CONFIG_SUSPEND_SKIP_SYNC`) is not flexible
enough. E.g. Linux distributions that provide pre-compiled kernels
usually want to stick with the default (sync filesystems before suspend)
but under special conditions this needs to be changed.
One example for such a special condition is user-space handling of
suspending block devices (e.g. using `cryptsetup luksSuspend` or `dmsetup
suspend`) before system suspend. The kernel trying to sync filesystems
after the underlying block device already got suspended obviously leads
to deadlocks. Be aware that you have to take care of the filesystem sync
yourself before suspending the system in those scenarios.
Signed-off-by: Jonas Meurer <jonas@freesources.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
commit 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads
with cpu_stop_work") ensures that the watchdog is reliably touched during
a task switch.
As a result the check for an unnoticed task switch is no longer needed.
Remove the relevant code, which effectively reverts commit b1a8de1f5343
("softlockup: make detector be aware of task switch of processes hogging
cpu").
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20191024114928.15377-2-pmladek@suse.com
|
|
After commit 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u"
threads with cpu_stop_work"), the percpu soft_lockup_hrtimer_cnt is
not used any more, so remove it and related code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191218131720.4146aea2@xhacker.debian
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-01-15
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 13 files changed, 95 insertions(+), 43 deletions(-).
The main changes are:
1) Fix refcount leak for TCP time wait and request sockets for socket lookup
related BPF helpers, from Lorenz Bauer.
2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.
3) Batch of several sockmap and related TLS fixes found while operating
more complex BPF programs with Cilium and OpenSSL, from John Fastabend.
4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
to avoid purging data upon teardown, from Lingpeng Chen.
5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
dump a BPF map's value with BTF, from Martin KaFai Lau.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Anatoly has been fuzzing with kBdysch harness and reported a hang in one
of the outcomes:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (85) call bpf_get_socket_cookie#46
1: R0_w=invP(id=0) R10=fp0
1: (57) r0 &= 808464432
2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
2: (14) w0 -= 810299440
3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
3: (c4) w0 s>>= 1
4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
4: (76) if w0 s>= 0x30303030 goto pc+216
221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
221: (95) exit
processed 6 insns (limit 1000000) [...]
Taking a closer look, the program was xlated as follows:
# ./bpftool p d x i 12
0: (85) call bpf_get_socket_cookie#7800896
1: (bf) r6 = r0
2: (57) r6 &= 808464432
3: (14) w6 -= 810299440
4: (c4) w6 s>>= 1
5: (76) if w6 s>= 0x30303030 goto pc+216
6: (05) goto pc-1
7: (05) goto pc-1
8: (05) goto pc-1
[...]
220: (05) goto pc-1
221: (05) goto pc-1
222: (95) exit
Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix
precision tracking for unbounded scalars"), that is, the fall-through
branch in the instruction 5 is considered to be never taken given the
conclusion from the min/max bounds tracking in w6, and therefore the
dead-code sanitation rewrites it as goto pc-1. However, real-life input
disagrees with verification analysis since a soft-lockup was observed.
The bug sits in the analysis of the ARSH. The definition is that we shift
the target register value right by K bits through shifting in copies of
its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the
register into 32 bit mode, same happens after simulating the operation.
However, for the case of simulating the actual ARSH, we don't take the
mode into account and act as if it's always 64 bit, but location of sign
bit is different:
dst_reg->smin_value >>= umin_val;
dst_reg->smax_value >>= umin_val;
dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);
Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
for example return 0xffff. With the above ARSH simulation, we'd see the
following results:
[...]
1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
1: (85) call bpf_get_socket_cookie#46
2: R0_w=invP(id=0) R10=fp0
2: (57) r0 &= 808464432
-> R0_runtime = 0x3030
3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
3: (14) w0 -= 810299440
-> R0_runtime = 0xcfb40000
4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
(0xffffffff)
4: (c4) w0 s>>= 1
-> R0_runtime = 0xe7da0000
5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
(0x67c00000) (0x7ffbfff8)
[...]
In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
0100 0000 0000 0000 0000'. The result after the shift is 0xe7da0000, that
is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
retained in 32 bit mode. In insn 4, the umax was 0xffffffff, and changed into
0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000',
which means that the simulation didn't retain the sign bit. With the above
logic, the updates happen on the 64 bit min/max bounds and given we coerced
the register, the sign bits of the bounds are cleared as well, meaning, we
need to force the simulation into s32 space for 32 bit alu mode.
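The mismatch in sign bit position can be reproduced in plain userspace C.
The helpers below (arsh64, arsh32, arsh32_simulated_as_64) are
hypothetical names for this sketch and are not verifier code. They model
an arithmetic right shift on the bit pattern directly, so the result does
not depend on how the compiler shifts negative values.

```c
#include <stdint.h>

/* Arithmetic shift right of a 64 bit pattern, valid for k in 0..63. */
uint64_t arsh64(uint64_t v, unsigned int k)
{
	uint64_t shifted = v >> k;

	if (k && (v >> 63))
		shifted |= ~UINT64_C(0) << (64 - k);	/* copies of bit 63 */
	return shifted;
}

/* Arithmetic shift right in 32 bit mode, valid for k in 0..31. */
uint32_t arsh32(uint32_t v, unsigned int k)
{
	uint32_t shifted = v >> k;

	if (k && (v >> 31))
		shifted |= ~UINT32_C(0) << (32 - k);	/* copies of bit 31 */
	return shifted;
}

/* The buggy simulation: shift the zero-extended value as if it were 64 bit. */
uint32_t arsh32_simulated_as_64(uint32_t v, unsigned int k)
{
	return (uint32_t)arsh64((uint64_t)v, k);
}
```

For the runtime value from the trace, arsh32(0xcfb40000, 1) gives
0xe7da0000, while arsh32_simulated_as_64(0xcfb40000, 1) gives 0x67da0000
because bit 63 of the zero-extended value is clear. The same effect makes
an upper bound of 0xffffffff lose its top bit when shifted in 64 bit
space.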
Verification after the fix below. We're first analyzing the fall-through branch
on 32 bit signed >= test eventually leading to rejection of the program in this
specific case:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r2 = 808464432
1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
1: (85) call bpf_get_socket_cookie#46
2: R0_w=invP(id=0) R10=fp0
2: (bf) r6 = r0
3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
3: (57) r6 &= 808464432
4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
4: (14) w6 -= 810299440
5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
5: (c4) w6 s>>= 1
6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
(0x67c00000) (0xfffbfff8)
6: (76) if w6 s>= 0x30303030 goto pc+216
7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
7: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 8 insns (limit 1000000) [...]
Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
|
|
Suspend to IDLE invokes tick_unfreeze() on resume. tick_unfreeze() on the
first resuming CPU resumes timekeeping, which also has the side effect of
resetting the softlockup watchdog on this CPU.
But on the secondary CPUs the watchdog is not reset in the resume /
unfreeze() path, which can result in false softlockup warnings on those
CPUs depending on the time spent in suspend.
Prevent this by clearing the softlockup watchdog in the unfreeze path also
on the secondary resuming CPUs.
[ tglx: Massaged changelog ]
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200110083902.27276-1-chunyan.zhang@unisoc.com
|
|
The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:
> [ 597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [ 597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160
Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.
When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.
The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
the css of a recently disabled subsystem repeatedly.
The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
While at it, the locations of WARN_ON_ONCE calls are moved so that
the warning is triggered only when we are about to misuse the dying css.
Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|