| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-14 20:27:07 +0300 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-14 20:27:07 +0300 |
| commit | c1fe867b5bf9c57ab7856486d342720e2b205eed (patch) | |
| tree | 3c9a31afe6b81498b821304dfa107dad7ee2da60 /include/linux/clocksource.h | |
| parent | 1d5e40351e7d521d7d143447d57315b6eb1e1160 (diff) | |
| parent | ff1c0c5d07028a84837950b619d30da623f8ddb2 (diff) | |
| download | linux-c1fe867b5bf9c57ab7856486d342720e2b205eed.tar.xz | |
Merge tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- A rework of the hrtimer subsystem to reduce the overhead for
  frequently armed timers, especially the hrtick scheduler timer:
    - Better timer locality decisions
    - Simplification of the evaluation of the first expiry time by
      tracking the neighbor timers in the RB-tree, using an RB-tree
      variant that provides neighbor links. That avoids walking the
      RB-tree on removal to find the next expiry time and, more
      importantly, allows a quick check whether a rearmed timer's
      modified expiry time changes its position in the RB-tree at all.
      If it does not, the dequeue/enqueue sequence, both steps of
      which can end up in rebalancing, is avoided entirely (see the
      first sketch below).
    - Deferred reprogramming of the underlying clock event device.
      This optimizes for the situation where a hrtimer callback sets
      the need-resched bit. In that case the code attempts to defer
      the reprogramming of the clock event device up to the point
      where the scheduler has picked the next task and has armed the
      next hrtick timer. If there is no immediate reschedule, or if
      soft interrupts have to be handled before the reschedule point
      in the interrupt entry code is reached, the clock event is
      reprogrammed in one of those code paths to prevent the timer
      from becoming stale (see the second sketch below).
    - Support for clocksource-coupled clockevents
      The TSC deadline timer is coupled to the TSC: the next event is
      programmed in TSC time. Currently this is done by converting
      the CLOCK_MONOTONIC based expiry value into a relative timeout,
      converting that into TSC ticks, reading the TSC, adding the
      delta ticks and writing the deadline MSR.
      As the timekeeping core already has the conversion factors for
      the TSC, this whole back-and-forth conversion can be avoided
      completely. The timekeeping core calculates the reverse
      conversion factors from nanoseconds to TSC ticks and utilizes
      the base timestamps of the TSC and of CLOCK_MONOTONIC, which
      are updated once per tick. This allows a direct conversion into
      the TSC deadline value without reading the time (see the third
      sketch below) and, as a bonus, keeps the deadline conversion in
      sync with the TSC conversion factors, which are updated by
      adjtimex() on systems with NTP/PTP enabled.
    - Allow inlining of the clocksource read and clockevent write
      functions when they are tiny enough, e.g. RDTSC and WRMSR on
      x86 (see the fourth sketch below).
  With all those enhancements in place, a hrtick-enabled scheduler
  provides the same performance as one without hrtick. Other hrtimer
  users obviously benefit from these optimizations as well.
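
A minimal sketch of the neighbor-link idea from the first item above;
the node layout and all names are hypothetical, not the actual kernel
implementation:

#include <linux/rbtree.h>
#include <linux/types.h>

/* Hypothetical node: an RB-tree entry augmented with neighbor links */
struct sketch_timer {
    struct rb_node node;            /* ordered by @expires */
    struct sketch_timer *prev;      /* neighbor expiring earlier */
    struct sketch_timer *next;      /* neighbor expiring later */
    u64 expires;                    /* absolute expiry time in ns */
};

/*
 * A rearmed timer has to be dequeued and re-enqueued only when its new
 * expiry falls outside the window spanned by its neighbors. Otherwise
 * the tree order is unchanged and the dequeue/enqueue sequence, which
 * can trigger rebalancing twice, is skipped.
 */
static bool sketch_timer_keeps_position(struct sketch_timer *t, u64 new_expires)
{
    if (t->prev && new_expires < t->prev->expires)
        return false;
    if (t->next && new_expires > t->next->expires)
        return false;
    t->expires = new_expires;
    return true;
}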
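
The deferred reprogramming from the second item reduces to a decision
like the one below. This is a sketch under assumed names (struct
sketch_cevt, sketch_program_device()); only need_resched() is the real
kernel helper:

#include <linux/sched.h>
#include <linux/types.h>

/* Hypothetical per-CPU clock event programming state */
struct sketch_cevt {
    u64 deferred_expiry;
    bool reprogram_deferred;
};

/* Hypothetical low-level write to the clock event device */
void sketch_program_device(struct sketch_cevt *cevt, u64 expiry);

static void sketch_cevt_update(struct sketch_cevt *cevt, u64 next_expiry)
{
    if (need_resched()) {
        /*
         * The scheduler is about to pick the next task and will arm
         * the next hrtick timer anyway. Record the deadline and let
         * the reschedule point (or the interrupt exit / softirq path
         * if no reschedule happens) program the device once, instead
         * of writing it here and again shortly after.
         */
        cevt->deferred_expiry = next_expiry;
        cevt->reprogram_deferred = true;
        return;
    }
    sketch_program_device(cevt, next_expiry);
}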
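
The direct deadline conversion from the third item can be sketched as
follows. The struct and field names are assumptions; the real base
timestamps and reverse conversion factors live in the timekeeping core
and are refreshed once per tick and by adjtimex():

#include <linux/types.h>

/* Hypothetical per-tick snapshot published by the timekeeping core */
struct sketch_cs_base {
    u64 base_mono_ns;   /* CLOCK_MONOTONIC base timestamp */
    u64 base_tsc;       /* TSC readout taken at the same instant */
    u32 mult_ns2cyc;    /* reverse conversion factor: ns -> TSC ticks */
    u32 shift;
};

/*
 * Convert an absolute CLOCK_MONOTONIC expiry straight into a TSC
 * deadline without reading the TSC: scale the delta against the base
 * timestamp and add it to the TSC base. Overflow handling is omitted
 * for brevity.
 */
static u64 sketch_mono_to_tsc_deadline(const struct sketch_cs_base *b,
                                       u64 expires_ns)
{
    u64 delta_ns = expires_ns - b->base_mono_ns;

    return b->base_tsc + ((delta_ns * b->mult_ns2cyc) >> b->shift);
}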
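
For the read-side inlining from the fourth item, the net effect is to
bypass the indirect cs->read(cs) call when the read function is a
single instruction. A sketch, assuming the CLOCK_SOURCE_CAN_INLINE_READ
flag visible in the diff below; rdtsc_ordered() is the existing x86
helper:

#include <linux/clocksource.h>
#include <asm/tsc.h>

static __always_inline u64 sketch_cs_read(struct clocksource *cs)
{
    /* On x86 this compiles down to RDTSC instead of an indirect call */
    if (IS_ENABLED(CONFIG_X86_TSC) &&
        (cs->flags & CLOCK_SOURCE_CAN_INLINE_READ))
        return rdtsc_ordered();

    return cs->read(cs);
}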
- Robustness improvements and cleanups of historical sins in the
  hrtimer and timekeeping code.
- Rewrite of the clocksource watchdog.
  The clocksource watchdog code has over time reached the state of an
  impenetrable maze of duct tape and staples. The original design,
  made in the context of systems far smaller than today's, is based
  on the assumption that the clocksource to be monitored (TSC) can be
  trivially compared against a known-stable reference clocksource
  (HPET/ACPI-PM timer).
  Over the years this rather naive approach turned out to have major
  flaws. Long delays between watchdog invocations can cause
  wraparounds of the reference clocksource. Access to the reference
  clocksource degrades on large multi-socket systems due to
  interconnect congestion. This has been addressed with various
  heuristics, which degraded the accuracy of the watchdog to the
  point that it fails to detect actual TSC problems on older hardware
  which exposes slow inter-CPU drift caused by firmware manipulating
  the TSC to hide SMI time.
  The rewrite addresses this by:
    - Restricting the validation against the reference clocksource to
      the boot CPU, which is usually closest to the legacy block that
      contains the reference clocksource (HPET/ACPI-PM)
    - Doing a round-robin validation between the boot CPU and the
      other CPUs based only on the TSC, with an algorithm similar to
      the TSC synchronization code run during CPU hotplug (see the
      sketch below)
    - Being more lenient about remote timeouts
- The usual tiny fixes, cleanups and enhancements all over the place
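
A rough sketch of the round-robin cross-CPU check from the watchdog
rewrite above. Everything except clocksource_mark_unstable() and
abs_diff() is a hypothetical name, and the real algorithm, like the
hotplug-time TSC sync test, works on multiple interleaved readouts
rather than a single pair:

#include <linux/clocksource.h>
#include <linux/math.h>

/* Hypothetical helpers */
u64 sketch_read_tsc_on(int cpu);        /* TSC readout on a remote CPU */
u64 sketch_max_skew_cycles(struct clocksource *cs);

/*
 * Validate one CPU against the boot CPU using only the TSC. The CPU
 * under test advances round robin across the online CPUs, one per
 * watchdog period.
 */
static void sketch_wd_check_cpu(struct clocksource *cs, int cpu)
{
    u64 boot_tsc = cs->read(cs);
    u64 remote_tsc = sketch_read_tsc_on(cpu);

    if (abs_diff(boot_tsc, remote_tsc) > sketch_max_skew_cycles(cs))
        clocksource_mark_unstable(cs);
}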
* tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
alarmtimer: Access timerqueue node under lock in suspend
hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check
posix-timers: Fix stale function name in comment
timers: Get this_cpu once while clearing the idle state
clocksource: Rewrite watchdog code completely
clocksource: Don't use non-continuous clocksources as watchdog
x86/tsc: Handle CLOCK_SOURCE_VALID_FOR_HRES correctly
MIPS: Don't select CLOCKSOURCE_WATCHDOG
parisc: Remove unused clocksource flags
hrtimer: Add a helper to retrieve a hrtimer from its timerqueue node
hrtimer: Remove trailing comma after HRTIMER_MAX_CLOCK_BASES
hrtimer: Mark index and clockid of clock base as const
hrtimer: Drop unnecessary pointer indirection in hrtimer_expire_entry event
hrtimer: Drop spurious space in 'enum hrtimer_base_type'
hrtimer: Don't zero-initialize ret in hrtimer_nanosleep()
hrtimer: Remove hrtimer_get_expires_ns()
timekeeping: Mark offsets array as const
timekeeping/auxclock: Consistently use raw timekeeper for tk_setup_internals()
timer_list: Print offset as signed integer
tracing: Use explicit array size instead of sentinel elements in symbol printing
...
Diffstat (limited to 'include/linux/clocksource.h')
| -rw-r--r-- | include/linux/clocksource.h | 27 |
1 file changed, 8 insertions, 19 deletions
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 65b7c41471c3..ccf5c0ca26b7 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -44,8 +44,6 @@ struct module;
  * @shift:		Cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	Maximum idle time permitted by the clocksource (nsecs)
  * @maxadj:		Maximum adjustment value to mult (~11%)
- * @uncertainty_margin:	Maximum uncertainty in nanoseconds per half second.
- *			Zero says to use default WATCHDOG_THRESHOLD.
  * @archdata:		Optional arch-specific data
  * @max_cycles:		Maximum safe cycle value which won't overflow on
  *			multiplication
@@ -105,7 +103,6 @@ struct clocksource {
 	u32			shift;
 	u64			max_idle_ns;
 	u32			maxadj;
-	u32			uncertainty_margin;
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
@@ -133,6 +130,7 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	unsigned int		wd_cpu;
 #endif
 	struct module		*owner;
 };
@@ -142,13 +140,19 @@ struct clocksource {
  */
 #define CLOCK_SOURCE_IS_CONTINUOUS		0x01
 #define CLOCK_SOURCE_MUST_VERIFY		0x02
+#define CLOCK_SOURCE_CALIBRATED			0x04
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
+#define CLOCK_SOURCE_CAN_INLINE_READ		0x200
+#define CLOCK_SOURCE_HAS_COUPLED_CLOCK_EVENT	0x400
+
+#define CLOCK_SOURCE_WDTEST			0x800
+#define CLOCK_SOURCE_WDTEST_PERCPU		0x1000
+
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
@@ -298,21 +302,6 @@ static inline void timer_probe(void) {}
 #define TIMER_ACPI_DECLARE(name, table_id, fn)		\
 	ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
 
-static inline unsigned int clocksource_get_max_watchdog_retry(void)
-{
-	/*
-	 * When system is in the boot phase or under heavy workload, there
-	 * can be random big latencies during the clocksource/watchdog
-	 * read, so allow retries to filter the noise latency. As the
-	 * latency's frequency and maximum value goes up with the number of
-	 * CPUs, scale the number of retries with the number of online
-	 * CPUs.
-	 */
-	return (ilog2(num_online_cpus()) / 2) + 1;
-}
-
-void clocksource_verify_percpu(struct clocksource *cs);
-
 /**
  * struct clocksource_base - hardware abstraction for clock on which a clocksource
  * is based
