Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- new core infrastructure to allow better management of multi-queue
devices (interrupt spreading, node aware descriptor allocation ...)
- a new interrupt flow handler to support the new fangled Intel VMD
devices.
- yet another new interrupt controller driver.
- a series of fixes which addresses sparse warnings, missing
includes, missing static declarations etc from Ben Dooks.
- a fix for the error handling in the hierarchical domain allocation
code.
- the usual pile of small updates to core and driver code"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
genirq: Fix missing irq allocation affinity hint
irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
irq/Documentation: Correct result of echnoing 5 to smp_affinity
MAINTAINERS: Remove Jiang Liu from irq domains
genirq/msi: Fix broken debug output
genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
genirq/msi: Make use of affinity aware allocations
genirq: Use affinity hint in irqdesc allocation
genirq: Add affinity hint to irq allocation
genirq: Introduce IRQD_AFFINITY_MANAGED flag
genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
irqchip/s3c24xx: Fixup IO accessors for big endian
irqchip/exynos-combiner: Fix usage of __raw IO
irqdomain: Fix disposal of mappings for interrupt hierarchies
irqchip/aspeed-vic: Add irq controller for Aspeed
doc/devicetree: Add Aspeed VIC bindings
x86/PCI/VMD: Use untracked irq handler
genirq: Add untracked irq handler
irqchip/mips-gic: Populate irq_domain names
irqchip/gicv3-its: Implement two-level(indirect) device table support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"This update provides the following changes:
- The rework of the timer wheel which addresses the shortcomings of
the current wheel (cascading, slow search for next expiring timer,
etc). That's the first major change of the wheel in almost 20
years since Finn implemted it.
- A large overhaul of the clocksource drivers init functions to
consolidate the Device Tree initialization
- Some more Y2038 updates
- A capability fix for timerfd
- Yet another clock chip driver
- The usual pile of updates, comment improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
tick/nohz: Optimize nohz idle enter
clockevents: Make clockevents_subsys static
clocksource/drivers/time-armada-370-xp: Fix return value check
timers: Implement optimization for same expiry time in mod_timer()
timers: Split out index calculation
timers: Only wake softirq if necessary
timers: Forward the wheel clock whenever possible
timers/nohz: Remove pointless tick_nohz_kick_tick() function
timers: Optimize collect_expired_timers() for NOHZ
timers: Move __run_timers() function
timers: Remove set_timer_slack() leftovers
timers: Switch to a non-cascading wheel
timers: Reduce the CPU index space to 256k
timers: Give a few structs and members proper names
hlist: Add hlist_is_singular_node() helper
signals: Use hrtimer for sigtimedwait()
timers: Remove the deprecated mod_timer_pinned() API
timers, net/ipv4/inet: Initialize connection request timers as pinned
timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
timers, drivers/tty/metag_da: Initialize the poll timer as pinned
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- Intel-SoC enhancements (Andy Shevchenko)
- Intel CPU symbolic model definition rework (Dave Hansen)
- ... other misc changes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/sfi: Enable enumeration of SD devices
x86/pci: Use MRFLD abbreviation for Merrifield
x86/platform/intel-mid: Make vertical indentation consistent
x86/platform/intel-mid: Mark regulators explicitly defined
x86/platform/intel-mid: Rename mrfl.c to mrfld.c
x86/platform/intel-mid: Enable spidev on Intel Edison boards
x86/platform/intel-mid: Extend PWRMU to support Penwell
x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
x86/platform/intel-mid: Add pinctrl for Intel Merrifield
x86/platform/intel-mid: Enable GPIO expanders on Edison
x86/platform/intel-mid: Add Power Management Unit driver
x86/platform/atom/punit: Enable support for Merrifield
x86/platform/intel_mid_pci: Rework IRQ0 workaround
x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
x86, mmc: Use Intel family name macros for mmc driver
x86/intel_telemetry: Use Intel family name macros for telemetry driver
x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
x86/platform: Use new Intel model number macros
x86/intel_idle: Use Intel family macros for intel_idle
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stackdump update from Ingo Molnar:
"A number of stackdump enhancements"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Add show_stack_regs() and use it
printk: Make the printk*once() variants return a value
x86/dumpstack: Honor supplied @regs arg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
- fix system/idle cputime leaked on cputime accounting (all nohz
configs) (Rik van Riel)
- remove the messy, ad-hoc irqtime account on nohz-full and make it
compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)
- cleanups (Frederic Weisbecker)
- remove unecessary irq disablement in the irqtime code (Rik van Riel)
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
sched/cputime: Reorganize vtime native irqtime accounting headers
sched/cputime: Clean up the old vtime gen irqtime accounting completely
sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
sched/cputime: Count actually elapsed irq & softirq time
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- introduce and use task_rcu_dereference()/try_get_task_struct() to fix
and generalize task_struct handling (Oleg Nesterov)
- do various per entity load tracking (PELT) fixes and optimizations
(Peter Zijlstra)
- cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)
- introduce consolidated cputime output file cpuacct.usage_all and
related refactorings (Zhao Lei)
- ... plus misc fixes and enhancements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
sched/fair: Rework throttle_count sync
sched/core: Fix sched_getaffinity() return value kerneldoc comment
sched/fair: Reorder cgroup creation code
sched/fair: Apply more PELT fixes
sched/fair: Fix PELT integrity for new tasks
sched/cgroup: Fix cpu_cgroup_fork() handling
sched/fair: Fix PELT integrity for new groups
sched/fair: Fix and optimize the fork() path
sched/cputime: Add steal time support to full dynticks CPU time accounting
sched/cputime: Fix prev steal time accouting during CPU hotplug
KVM: Fix steal clock warp during guest CPU hotplug
sched/debug: Always show 'nr_migrations'
sched/fair: Use task_rcu_dereference()
sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
sched/idle: Optimize the generic idle loop
sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"With over 300 commits it's been a busy cycle - with most of the work
concentrated on the tooling side (as it should).
The main kernel side enhancements were:
- Add per event callchain limit: Recently we introduced a sysctl to
tune the max-stack for all events for which callchains were
requested:
$ sysctl kernel.perf_event_max_stack
kernel.perf_event_max_stack = 127
Now this patch introduces a way to configure this per event, i.e.
this becomes possible:
$ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
allowing finer tuning of how much buffer space callchains use.
This uses an u16 from the reserved space at the end, leaving
another u16 for future use.
There has been interest in even finer tuning, namely to control the
max stack for kernel and userspace callchains separately. Further
discussion is needed, we may for instance use the remaining u16 for
that and when it is present, assume that the sample_max_stack
introduced in this patch applies for the kernel, and the u16 left
is used for limiting the userspace callchain (Arnaldo Carvalho de
Melo)
- Optimize AUX event (hardware assisted side-band event) delivery
(Kan Liang)
- Rework Intel family name macro usage (this is partially x86 arch
work) (Dave Hansen)
- Refine and fix Intel LBR support (David Carrillo-Cisneros)
- Add support for Intel 'TopDown' events (Andi Kleen)
- Intel uncore PMU driver fixes and enhancements (Kan Liang)
- ... other misc changes.
Here's an incomplete list of the tooling enhancements (but there's
much more, see the shortlog and the git log for details):
- Support cross unwinding, i.e. collecting '--call-graph dwarf'
perf.data files in one machine and then doing analysis in another
machine of a different hardware architecture. This enables, for
instance, to do:
$ perf record -a --call-graph dwarf
on a x86-32 or aarch64 system and then do 'perf report' on it on a
x86_64 workstation (He Kuang)
- Allow reading from a backward ring buffer (one setup via
sys_perf_event_open() with perf_event_attr.write_backward = 1)
(Wang Nan)
- Finish merging initial SDT (Statically Defined Traces) support, see
cset comments for details about how it all works (Masami Hiramatsu)
- Support attaching eBPF programs to tracepoints (Wang Nan)
- Add demangling of symbols in programs written in the Rust language
(David Tolnay)
- Add support for tracepoints in the python binding, including an
example, that sets up and parses sched:sched_switch events,
tools/perf/python/tracepoint.py (Jiri Olsa)
- Introduce --stdio-color to set up the color output mode selection
in 'annotate' and 'report', allowing emit color escape sequences
when redirecting the output of these tools (Arnaldo Carvalho de
Melo)
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter,
Andi Kleen)
- Allow dumping the object files generated by llvm when processing
eBPF scriptlet events (Wang Nan)
- Add stackcollapse.py script to help generating flame graphs (Paolo
Bonzini)
- Add --ldlat option to 'perf mem' to specify load latency for loads
event (e.g. cpu/mem-loads/ ) (Jiri Olsa)
- Tooling support for Intel TopDown counters, recently added to the
kernel (Andi Kleen)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
perf tests: Add is_printable_array test
perf tools: Make is_printable_array global
perf script python: Fix string vs byte array resolving
perf probe: Warn unmatched function filter correctly
perf cpu_map: Add more helpers
perf stat: Balance opening and reading events
tools: Copy linux/{hash,poison}.h and check for drift
perf tools: Remove include/linux/list.h from perf's MANIFEST
tools: Copy the bitops files accessed from the kernel and check for drift
Remove: kernel unistd*h files from perf's MANIFEST, not used
perf tools: Remove tools/perf/util/include/linux/const.h
perf tools: Remove tools/perf/util/include/asm/byteorder.h
perf tools: Add missing linux/compiler.h include to perf-sys.h
perf jit: Remove some no-op error handling
perf jit: Add missing curly braces
objtool: Initialize variable to silence old compiler
objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
perf record: Add --tail-synthesize option
perf session: Don't warn about out of order event if write_backward is used
perf tools: Enable overwrite settings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The biggest change in this cycle were SGI/UV related changes that
clean up and fix UV boot quirks and problems.
There's also various smaller cleanups and refinements"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Reorganize the GUID table to make it easier to read
x86/efi: Remove the unused efi_get_time() function
x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
efi: Convert efi_call_virt() to efi_call_virt_pointer()
x86/efi: Remove unused variable 'efi'
efi: Document #define FOO_PROTOCOL_GUID layout
efibc: Report more information in the error messages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- documentation updates
- miscellaneous fixes
- minor reorganization of code
- torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
rcu: Correctly handle sparse possible cpus
rcu: sysctl: Panic on RCU Stall
rcu: Fix a typo in a comment
rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
rcu: Disable TASKS_RCU for usermode Linux
rcu: No ordering for rcu_assign_pointer() of NULL
rcutorture: Fix error return code in rcu_perf_init()
torture: Inflict default jitter
rcuperf: Don't treat gp_exp mis-setting as a WARN
rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
rcutorture: Don't specify the cpu type of QEMU on PPC
rcutorture: Make -soundhw a x86 specific option
rcutorture: Use vmlinux as the fallback kernel image
rcutorture/doc: Create initrd using dracut
torture: Stop onoff task if there is only one cpu
torture: Add starvation events to error summary
torture: Break online and offline functions out of torture_onoff()
torture: Forgive lengthy trace dumps and preemption
torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- New drivers for FTS BMC "Teutates", TI INA3221, and Sensirion SHT3x.
- Added support for Microchip MCP9808 and TI TMP461.
- Cleanup and minor fixes in various drivers.
* tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (37 commits)
Documentation: dtb: xgene: Add hwmon dts binding documentation
hwmon: (ftsteutates) Remove unused including <linux/version.h>
hwmon: (adt7411) set bit 3 in CFG1 register
hwmon: Add driver for FTS BMC chip "Teutates"
hwmon: (sht3x) add humidity heater element control
hwmon: (jc42) Add support for generic JC-42.4 devicetree binding
dt/bindings: Add bindings for JC-42.4 compatible temperature sensors
hwmon: (tmp102) Convert to use regmap, and drop local cache
hwmon: (tmp102) Rework chip configuration
hwmon: (tmp102) Improve handling of initial read delay
hwmon: (lm90) Drop unnecessary else statements
hwmon: (lm90) Use bool for valid flag
hwmon: (lm90) Read limit registers only once
hwmon: (lm90) Simplify read functions
hwmon: (lm90) Use devm_hwmon_device_register_with_groups
hwmon: (lm90) Use devm_add_action for cleanup
hwmon: (lm75) Convert to use regmap
hwmon: (lm75) Add update_interval attribute
hwmon: (lm75) Drop lm75_read_value and lm75_write_value
hwmon: (lm75) Handle cleanup with devm_add_action
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB updates from Greg KH:
"Here's the big USB driver update for 4.8-rc1. Lots of the normal
stuff in here, musb, gadget, xhci, and other updates and fixes. All
of the details are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
cdc-acm: beautify probe()
cdc-wdm: use the common CDC parser
cdc-acm: cleanup error handling
cdc-acm: use the common parser
usbnet: move the CDC parser into USB core
usb: musb: sunxi: Simplify dr_mode handling
usb: musb: sunxi: make unexported symbols static
usb: musb: cppi41: add dma channel tracepoints
usb: musb: cppi41: move struct cppi41_dma_channel to header
usb: musb: cleanup cppi_dma header
usb: musb: gadget: add usb-request tracepoints
usb: musb: host: add urb tracepoints
usb: musb: add tracepoints to dump interrupt events
usb: musb: add tracepoints for register access
usb: musb: dsps: use musb register read/write wrappers instead
usb: musb: switch dev_dbg to tracepoints
usb: musb: add tracepoints support for debugging
usb: quirks: Add no-lpm quirk for Elan
phy: rcar-gen3-usb2: fix mutex_lock calling in interrupt
phy: rockhip-usb: use devm_add_action_or_reset()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty and serial driver update for 4.8-rc1.
Lots of good cleanups from Jiri on a number of vt and other tty
related things, and the normal driver updates. Full details are in
the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits)
tty/serial: atmel: enforce tasklet init and termination sequences
serial: sh-sci: Stop transfers in sci_shutdown()
serial: 8250_ingenic: drop #if conditional surrounding earlycon code
serial: 8250_mtk: drop !defined(MODULE) conditional
serial: 8250_uniphier: drop !defined(MODULE) conditional
earlycon: mark earlycon code as __used iif the caller is built-in
tty/serial/8250: use mctrl_gpio helpers
serial: mctrl_gpio: enable API usage only for initialized mctrl_gpios struct
serial: mctrl_gpio: add modem control read routine
tty/serial/8250: make UART_MCR register access consistent
serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV
serial: 8250_dma: Export serial8250_rx_dma_flush()
dmaengine: hsu: Export hsu_dma_get_status()
tty: serial: 8250: add CON_CONSDEV to flags
tty: serial: samsung: add byte-order aware bit functions
tty: serial: samsung: fixup accessors for endian
serial: sirf: make fifo functions static
serial: mps2-uart: make driver explicitly non-modular
serial: mvebu-uart: free the IRQ in ->shutdown()
serial/bcm63xx_uart: use correct alias naming
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here is the big Staging and IIO driver update for 4.8-rc1.
We ended up adding more code than removing, again, but it's not all
that bad. Lots of cleanups all over the staging tree, and new IIO
drivers, full details in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits)
drivers:iio:accel:mma8452: removed unwanted return statements
drivers:iio:accel:mma8452: added cleanup provision in case of failure.
iio: Add iio.git tree to MAINTAINERS
iio:st_pressure: clean useless static channel initializers
iio:st_pressure:lps22hb: temperature support
iio:st_pressure:lps22hb: open drain support
iio:st_pressure: temperature triggered buffering
iio:st_pressure: document sampling gains
iio:st_pressure: align storagebits on power of 2
iio:st_sensors: align on storagebits boundaries
staging:iio:lis3l02dq drop separate driver
iio: accel: st_accel: Add lis3l02dq support
iio: adc: add missing of_node references to iio_dev
iio: adc: ti-ads1015: add indio_dev->dev.of_node reference
iio: potentiometer: Fix typo in Kconfig
iio: potentiometer: mcp4531: Add device tree binding
iio: potentiometer: mcp4531: Add device tree binding documentation
iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x
iio:imu:mpu6050: icm20608 initial support
iio: adc: max1363: Add device tree binding
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver update for 4.8-rc1.
Not a lot of stuff, but it's all over the place, full details are in
the shortlog. All of these have been in linux-next with no reported
issues for a while"
* tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits)
lkdtm: silence warnings about function declarations
lkdtm: hide unused functions
intel_th: pci: Add Kaby Lake PCH-H support
intel_th: Fix a deadlock in modprobing
dsp56k: prevent a harmless underflow
chardev: add missing line break in pr_warn
lkdtm: use struct arrays instead of enums
lkdtm: move jprobe entry points to start of source
lkdtm: reorganize module paramaters
lkdtm: rename globals for clarity
lkdtm: rename "count" to "crash_count"
lkdtm: remove intentional off-by-one array access
lkdtm: split remaining logic bug tests to separate file
lkdtm: split heap corruption tests to separate file
lkdtm: split memory permissions tests to separate file
lkdtm: split usercopy tests to separate file
lkdtm: drop "alloc_size" parameter
lkdtm: add usercopy test for blocking kernel text
extcon: adc-jack: add suspend/resume support
extcon: add missing of_node_put after calling of_parse_phandle
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've got ten patches this time, half of which are related to a
plethora of nasty outcomes when inodes are transitioned from the
unlinked state to the free state. Small file systems are particularly
vulnerable to these problems, and it can manifest as mainly hangs, but
also file system corruption. The patches have been tested for
literally many weeks, with a very gruelling test, so I have a high
level of confidence.
- Andreas Gruenbacher wrote a series of five patches for various
lockups during the transition of inodes from unlinked to free.
The main patch is titled "Fix gfs2_lookup_by_inum lock inversion"
and the other four are support and cleanup patches related to that.
- Ben Marzinski contributed two patches with regard to a recreatable
problem when gfs2 tries to write a page to a file that is being
truncated, resulting in a BUG() in gfs2_remove_from_journal.
Note that Ben had to export vfs function __block_write_full_page to
get this to work properly. It's been posted a long time and he
talked to various VFS people about it, and nobody seemed to mind.
- I contributed 3 patches:
o The first one fixes a memory corruptor: a race in which one
process can overwrite the gl_object pointer set by another
process, causing kernel panic and other symptoms.
o The second patch fixes another race that resulted in a
false-positive BUG_ON. This occurred when resource group
reservations were freed by one process while another process
was trying to grab a new reservation in the same resource
group.
o The third patch fixes a problem with doing journal replay when
the journals are not all the same size"
* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
GFS2: Check rs_free with rd_rsspin protection
gfs2: writeout truncated pages
fs: export __block_write_full_page
gfs2: Lock holder cleanup
gfs2: Large-filesystem fix for 32-bit systems
gfs2: Get rid of gfs2_ilookup
gfs2: Fix gfs2_lookup_by_inum lock inversion
gfs2: Initialize iopen glock holder for new inodes
GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
|
|
Pull networking fixes from David Miller:
1) Fix memory leak in nftables, from Liping Zhang.
2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
risk NULL skb derefs, from Sven Eckelmann.
3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
Greenman.
4) Handle properly when we have ppp_unregister_channel() happening in
parallel with ppp_connect_channel(), from WANG Cong.
5) Fix DCCP deadlock, from Eric Dumazet.
6) Bail out properly in UDP if sk_filter() truncates the packet to be
smaller than even the space that the protocol headers need. From
Michal Kubecek.
7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.
8) Make TCP challenge ACKs less predictable, from Eric Dumazet.
9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
packet: propagate sock_cmsg_send() error
net/mlx5e: Fix del vxlan port command buffer memset
packet: fix second argument of sock_tx_timestamp()
net: switchdev: change ageing_time type to clock_t
Update maintainer for EHEA driver.
net/mlx4_en: Add resilience in low memory systems
net/mlx4_en: Move filters cleanup to a proper location
sctp: load transport header after sk_filter
net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
net: nb8800: Fix SKB leak in nb8800_receive()
et131x: Fix logical vs bitwise check in et131x_tx_timeout()
vlan: use a valid default mtu value for vlan over macsec
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
mlxsw: spectrum: Prevent invalid ingress buffer mapping
mlxsw: spectrum: Prevent overwrite of DCB capability fields
mlxsw: spectrum: Don't emit errors when PFC is disabled
mlxsw: spectrum: Indicate support for autonegotiation
mlxsw: spectrum: Force link training according to admin state
r8152: add MODULE_VERSION
...
|
|
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:
RIP: radix_tree_next_slot include/linux/radix-tree.h:473
find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
....
Call Trace:
pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
do_writepages+0x97/0x100 mm/page-writeback.c:2364
__filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
vfs_fsync_range+0x10a/0x250 fs/sync.c:195
vfs_fsync fs/sync.c:209
do_fsync+0x42/0x70 fs/sync.c:219
SYSC_fdatasync fs/sync.c:232
SyS_fdatasync+0x19/0x20 fs/sync.c:230
entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
tools/testing/selftests/x86/Makefile
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
worth: the syzkaller fuzzer hit it again. It's still wrong for some THP
cases, because linear_page_index() was never intended to apply to
addresses before the start of a vma.
That's easily fixed with a signed long cast inside linear_page_index();
and Dmitry has tested such a patch, to verify the false positive. But
why extend linear_page_index() just for this case? when the avoidance in
page_move_anon_rmap() has already grown ugly, and there's no reason for
the check at all (nothing else there is using address or index).
Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
remove CONFIG_DEBUG_VM PageTransHuge adjustment.
And one more thing: should the compound_head(page) be done inside or
outside page_move_anon_rmap()? It's usually pushed down to the lowest
level nowadays (and mm/memory.c shows no other explicit use of it), so I
think it's better done in page_move_anon_rmap() than by caller.
Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I found a race condition triggering VM_BUG_ON() in freeze_page(), when
running a testcase with 3 processes:
- process 1: keep writing thp,
- process 2: keep clearing soft-dirty bits from virtual address of process 1
- process 3: call migratepages for process 1,
The kernel message is like this:
kernel BUG at /src/linux-dev/mm/huge_memory.c:3096!
invalid opcode: 0000 [#1] SMP
Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio
CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000
RIP: 0010:[<ffffffff811f8e06>] [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590
RSP: 0018:ffff88007cdd3b70 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000
RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000
R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000
R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080
FS: 00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0
Call Trace:
? list_del+0xd/0x30
queue_pages_pte_range+0x4d1/0x590
__walk_page_range+0x204/0x4e0
walk_page_range+0x71/0xf0
queue_pages_range+0x75/0x90
? queue_pages_hugetlb+0x190/0x190
? new_node_page+0xc0/0xc0
? change_prot_numa+0x40/0x40
migrate_to_node+0x71/0xd0
do_migrate_pages+0x1c3/0x210
SyS_migrate_pages+0x261/0x290
entry_SYSCALL_64_fastpath+0x1a/0xa4
Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff
RIP split_huge_page_to_list+0x496/0x590
I'm not sure of the full scenario of the reproduction, but my debug
showed that split_huge_pmd_address(freeze=true) returned without running
main code of pmd splitting because pmd_present(*pmd) in precheck somehow
returned 0. If this happens, the subsequent try_to_unmap() fails and
returns non-zero (because page_mapcount() still > 0), and finally
VM_BUG_ON() fires. This patch tries to fix it by prechecking pmd state
inside ptl.
Link: http://lkml.kernel.org/r/1466990929-7452-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The vtime irqtime accounting headers are very scattered and convoluted
right now. Reorganize them such that it is obvious that only
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE does use it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1468421405-20056-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Vtime generic irqtime accounting has been removed but there are a few
remnants to clean up:
* The vtime_accounting_cpu_enabled() check in irq entry was only used
by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it.
* Without the vtime_accounting_cpu_enabled(), we no longer need to
have a vtime_common_account_irq_enter() indirect function.
* Move vtime_account_irq_enter() implementation under
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user.
* The vtime_account_user() call was only used on irq entry for
CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
appear to currently work right.
On CPUs without nohz_full=, only tick based irq time sampling is
done, which breaks down when dealing with a nohz_idle CPU.
On firewalls and similar systems, no ticks may happen on a CPU for a
while, and the irq time spent may never get accounted properly. This
can cause issues with capacity planning and power saving, which use
the CPU statistics as inputs in decision making.
Remove the VTIME_GEN vtime irq time code, and replace it with the
IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:
Third set of IIO new device support, features and cleanups for the 4.8 cycle.
New core features
- Selection of the clock source for IIO timestamps. This is done per device
as it makes little sense to have events in one timebase and data timestamped
on another. Biggest reason for this is that we currently use a clock
source which is non monotonic which can result in 'interesting' data sets.
(Includes export for get_monotonic_corse64 which Thomas Gleixner didn't mind
in an earlier version.)
- MAINTAINERS add the git tree to the list for IIO.
New device support + a kind of indirect staging graduation.
* Broadcom iproc-static-adc
- new driver
* mcp4531
- support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers
* mpu6050
- support the IC20608 6 axis motion tracking device
* st-sensors
- support the lis3l02dq + drop the lis3l02dq driver from staging.
The general purpose driver is missing event support, but good to get
rid of this driver which was rather long in the tooth.
New driver features
* ak8975
- Add vid regulator support and refactor handling in general.
- Allow a delay after enabling regulators.
- Runtime and system PM.
* bmg160
- filter frequency control support.
* bmp280
- SPI device support.
- EOC interrupt support for the BMP085
- power management support.
- supply regulator support.
- reset gpio support
- dt bindings for reset gpio and regulators.
- of table to support device tree registration
* max1363
- Device tree bindings.
* mcp4531
- Device tree bindings.
* st-pressure
- temperature channels as part of triggered buffer (previously not due
probably to alignment issues - see below).
- lps22hb open drain interrupt support.
- lps22hb temperature channel support
Cleanups and reworkings.
* numerous ADC drivers
- ensure the iio_dev->dev.of_node is set to the parent dev.of_node so
as to allow client bindings to find the device.
* ak8975
- Fix incorrect handling of missing regulator
- make sure power is down and remove.
* bmp280
- read the calibration data only once as it doesn't change.
* isl29125
- Use a few macros to make code a touch more readable.
* mma8452
- fix a memory leak on error.
- drop an unecessary bit of return value handling.
* potentiometer kconfig
- typo fix.
* st-pressure
- drop some uninformative default assignments of elements of the channel
array structure (aids readability).
* st-sensors
- Harden interrupt handling considerably. These are actually all using
level interrupts, but at least two known boards have them wired to
edge only interrupt chips. Hence a slightly interesting bit of handling
is needed in which we first allow for the easy option (level triggered) and
secondly check the status registers before reenabling edge interrupts and
fall back to a tight loop in the thread until we successfully clear the
interrupt. No harm is done if we never succeed in doing so. It's an odd
patch that has been through a lot of revisions to reach a consensus on how
to handle what is basically broken hardware (which the previous defaults
allowed to kind of work).
- Fix alignment to defined storagebytes boundaries.
- Ensure alignment of power of 2 byte boundaries. This has always in theory
been part of the ABI of IIO, but we missed a few that snuck in that need
fixing. The effect was minor as they were only followed by timestamp
channels which were correctly aligned,
- Add some docs to explain the gain calculations.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-testing
Kishon writes:
phy: for 4.8 -rc1
*) Add a new phy_ops for setting the phy mode
*) Add a new phy driver for DA8xx SoC USB PHY
*) Minor fixes and cleanups
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-testing
Chanwoo writes:
Update
extcot://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon.git
tags/extcon-next-for-4.8
n for 4.8
Detailed description for patchset:
1. Update the extcon-gpio.c driver
- Use PM wakeirq APIs and support to check the state of external connector
when wake-up from suspend state if the interrupt of external connector is
not used as wakeup source.
- Support for ACPI gpio interface
2. Remove deprecated extcon APIs using the legacy cable name
- The extcon framework handle the external connector only by unique id
instead of legacy cable name to prevent the problem.
- Removed functions
: extcon_get_cable_state()
: extcon_set_cable_state()
: extcon_register_interest()
: extcon_unregister_interest()
- It has the dependency on the axp288_charger.c driver.
So, this pull request includes the 'ib-extcon-powersupply-4.8'
immutable branch to protect the merge conflict.
3. Support the resource-managed function for extcon_register_notifier
- Add the devm_extcon_register/unregister_notifier() funticon to handle
the resource automatically by resource managed functions and split out
the resource-managed function from extcon core to seprate file(devres.c).
4. Supprot the suspend/resume for extcon-adc-jack.c driver
- Add the support the suspend/resume function to use extcon-adc-jack.c
as wakeup source.
5. Fix the minor issue
- Check the return value of find_cable_index_by_id()
- Move the struct extcon_cable to extcon core from header file
because it should be only handled on extcon core.
- Add the missing of_node_put() after calling of_parse_phandle()
to decrement the reference count.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This contains three commits to fix memory corruption bugs with certain
Apple AirPort cards, plus a fix for a X86_BUG() ID definitions collision
bug in asm/cpufeatures.h"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/quirks: Add early quirk to reset Apple AirPort card
x86/quirks: Reintroduce scanning of secondary buses
x86/quirks: Apply nvidia_bugs quirk only on root bus
x86/cpu: Fix duplicated X86_BUG(9) macro
|
|
Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.
Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.
Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SFI specification v0.8.2 defines type of devices which are connected to
SD bus. In particularly WiFi dongle is a such.
Add a callback to enumerate the devices connected to SD bus.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468322192-62080-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
posix_acl: de-union a_refcount and a_rcu
nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
Use the right predicate in ->atomic_open() instances
|
|
Currently the two are unioned together, but I don't think that's safe.
It looks like get_cached_acl could race with the last put in
posix_acl_release. get_cached_acl calls atomic_inc_not_zero on
a_refcount, but that field could have already been clobbered by
call_rcu, and may no longer be zero. Fix this by de-unioning the two
fields.
Fixes: b8a7a3a66747 (posix_acl: Inode acl caching fixes)
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a.
Stephane reported the following regression:
> Since Andi added:
>
> commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a
> Author: Andi Kleen <ak@linux.intel.com>
> Date: Thu Jun 9 06:14:38 2016 -0700
>
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
>
> $ perf stat -e ref-cycles ls
> <not counted> ....
>
> fails systematically because the ref-cycles is now used by the
> watchdog and given this is a system-wide pinned event, it monopolizes
> the fixed counter 2 which is the only counter able to measure this event.
Since the next merge window is near, fix the regression for now
by reverting the commit.
Reported-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Thanks to all the work that was done by Andy Lutomirski and others,
enter_from_user_mode() and prepare_exit_to_usermode() are now called only with
interrupts disabled. Let's provide them a version of user_enter()/user_exit()
that skips saving and restoring the interrupt flag.
On an AMD-based machine I tested this patch on, with force-enabled
context tracking, the speed-up in system calls was 90 clock cycles or 6%,
measured with the following simple benchmark:
#include <sys/signal.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>
unsigned long rdtsc()
{
unsigned long result;
asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n"
"or %%rdx, %%rax" : "=a" (result) : : "rdx");
return result;
}
int main()
{
unsigned long tsc1, tsc2;
int pid = getpid();
int i;
tsc1 = rdtsc();
for (i = 0; i < 100000000; i++)
kill(pid, SIGWINCH);
tsc2 = rdtsc();
printf("%ld\n", tsc2 - tsc1);
}
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Re-organize the GUID table so that every GUID takes a single line.
This makes each line super long, but if you have a large enough terminal
(or zoom out of a small terminal) then you can see the structure at
a glance - which is more readable than it was the case with the
multi-line layout.
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20160627104920.GA9099@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add possibility for 32-bit user-space applications to move
the vDSO mapping.
Previously, when a user-space app called mremap() for the vDSO
address, in the syscall return path it would land on the previous
address of the vDSOpage, resulting in segmentation violation.
Now it lands fine and returns to userspace with a remapped vDSO.
This will also fix the context.vdso pointer for 64-bit, which does
not affect the user of vDSO after mremap() currently, but this
may change in the future.
As suggested by Andy, return -EINVAL for mremap() that would
split the vDSO image: that operation cannot possibly result in
a working system so reject it.
Renamed and moved the text_mapping structure declaration inside
map_vdso(), as it used only there and now it complements the
vvar_mapping variable.
There is still a problem for remapping the vDSO in glibc
applications: the linker relocates addresses for syscalls
on the vDSO page, so you need to relink with the new
addresses.
Without that the next syscall through glibc may fail:
Program received signal SIGSEGV, Segmentation fault.
#0 0xf7fd9b80 in __kernel_vsyscall ()
#1 0xf7ec8238 in _exit () from /usr/lib32/libc.so.6
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Have printk*once() return a bool which denotes whether the string was
printed or not so that calling code can react accordingly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467671487-10344-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
http://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull the clockevents/clocksource tree from Daniel Lezcano:
- Convert the clocksource-probe init functions to return a value in order to
prepare the consolidation of the drivers using the DT. It is a big patchset
but went through 01.org (kbuild bot), linux next and kernel-ci (continuous
integration) (Daniel Lezcano)
- Fix a bad error handling by returning the right value for cadence_ttc
(Christophe Jaillet)
- Fix typo in the Kconfig for the Samsung pwm (Alexandre Belloni)
- Change functions to static for armada-370-xp and digicolor (Ben Dooks)
- Add support for the rk3399 SoC timer by adding bindings and a slight
change in the base address. Take the opportunity to add the DYNIRQ flag
(Huang Tao)
- Fix endian accessors for the Samsung pwm timer (Matthew Leach)
- Add Oxford Semiconductor RPS Dual Timer driver (Neil Armstrong)
- Add a kernel parameter to swich on/off the event stream feature of the arch
arm timer (Will Deacon)
|
|
|
|
We now have implicit batching in the timer wheel. The slack API is no longer
used, so remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: Pali Rohár <pali.rohar@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The current timer wheel has some drawbacks:
1) Cascading:
Cascading can be an unbound operation and is completely pointless in most
cases because the vast majority of the timer wheel timers are canceled or
rearmed before expiration. (They are used as timeout safeguards, not as
real timers to measure time.)
2) No fast lookup of the next expiring timer:
In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
must fast forward the base time to the current value of jiffies. As we
have no way to find the next expiring timer fast, the code loops linearly
and increments the base time one by one and checks for expired timers
in each step. This causes unbound overhead spikes exactly in the moment
when we should wake up as fast as possible.
After a thorough analysis of real world data gathered on laptops,
workstations, webservers and other machines (thanks Chris!) I came to the
conclusion that the current 'classic' timer wheel implementation can be
modified to address the above issues.
The vast majority of timer wheel timers is canceled or rearmed before
expiry. Most of them are timeouts for networking and other I/O tasks. The
nature of timeouts is to catch the exception from normal operation (TCP ack
timed out, disk does not respond, etc.). For these kinds of timeouts the
accuracy of the timeout is not really a concern. Timeouts are very often
approximate worst-case values and in case the timeout fires, we already
waited for a long time and performance is down the drain already.
The few timers which actually expire can be split into two categories:
1) Short expiry times which expect halfways accurate expiry
2) Long term expiry times are inaccurate today already due to the
batching which is done for NOHZ automatically and also via the
set_timer_slack() API.
So for long term expiry timers we can avoid the cascading property and just
leave them in the less granular outer wheels until expiry or
cancelation. Timers which are armed with a timeout larger than the wheel
capacity are no longer cascaded. We expire them with the longest possible
timeout (6+ days). We have not observed such timeouts in our data collection,
but at least we handle them, applying the rule of the least surprise.
To avoid extending the wheel levels for HZ=1000 so we can accomodate the
longest observed timeouts (5 days in the network conntrack code) we reduce the
first level granularity on HZ=1000 to 4ms, which effectively is the same as
the HZ=250 behaviour. From our data analysis there is nothing which relies on
that 1ms granularity and as a side effect we get better batching and timer
locality for the networking code as well.
Contrary to the classic wheel the granularity of the next wheel is not the
capacity of the first wheel. The granularities of the wheels are in the
currently chosen setting 8 times the granularity of the previous wheel.
So for HZ=250 we end up with the following granularity levels:
Level Offset Granularity Range
0 0 4 ms 0 ms - 252 ms
1 64 32 ms 256 ms - 2044 ms (256ms - ~2s)
2 128 256 ms 2048 ms - 16380 ms (~2s - ~16s)
3 192 2048 ms (~2s) 16384 ms - 131068 ms (~16s - ~2m)
4 256 16384 ms (~16s) 131072 ms - 1048572 ms (~2m - ~17m)
5 320 131072 ms (~2m) 1048576 ms - 8388604 ms (~17m - ~2h)
6 384 1048576 ms (~17m) 8388608 ms - 67108863 ms (~2h - ~18h)
7 448 8388608 ms (~2h) 67108864 ms - 536870911 ms (~18h - ~6d)
That's a worst case inaccuracy of 12.5% for the timers which are queued at the
beginning of a level.
So the new wheel concept addresses the old issues:
1) Cascading is avoided completely
2) By keeping the timers in the bucket until expiry/cancelation we can track
the buckets which have timers enqueued in a bucket bitmap and therefore can
look up the next expiring timer very fast and O(1).
A further benefit of the concept is that the slack calculation which is done
on every timer start is no longer necessary because the granularity levels
provide natural batching already.
Our extensive testing with various loads did not show any performance
degradation vs. the current wheel implementation.
This patch does not address the 'fast lookup' issue as we wanted to make sure
that there is no regression introduced by the wheel redesign. The
optimizations are in follow up patches.
This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We want to store the array index in the flags space. 256k CPUs should be
enough for a while.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.030144293@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Required to figure out whether the entry is the only one in the hlist.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.867631372@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We switched all users to initialize the timers as pinned and call
mod_timer(). Remove the now unused timer API function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|