summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-12-18Merge branches 'pm-opp', 'pm-cpufreq' and 'pm-tools'Rafael J. Wysocki5-38/+97
* pm-opp: PM / OPP: do error handling at the bottom of dev_pm_opp_add_dynamic() PM / OPP: handle allocation of device_opp in a separate routine PM / OPP: reuse find_device_opp() instead of duplicating code PM / OPP: Staticize __dev_pm_opp_remove() PM / OPP: replace kfree with kfree_rcu while freeing 'struct device_opp' * pm-cpufreq: MAINTAINERS: add entry for intel_pstate intel_pstate: Add a few comments intel_pstate: add kernel parameter to force loading * pm-tools: Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()"
2014-12-18Merge branch 'pm-runtime'Rafael J. Wysocki53-97/+68
* pm-runtime: power / PM: Eliminate CONFIG_PM_RUNTIME NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c PM: Remove the SET_PM_RUNTIME_PM_OPS() macro mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro PM / Kconfig: Replace PM_RUNTIME with PM in dependencies ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
2014-12-18Merge branches 'acpi-fan', 'acpi-video' and 'acpi-ec'Rafael J. Wysocki3-3/+17
* acpi-fan: ACPI / Fan: Use bus id as the name for non PNP0C0B (Fan) devices * acpi-video: ACPI / video: update the skip case for acpi_video_device_in_dod() * acpi-ec: ACPI / EC: Fix unexpected ec_remove_handlers() invocations
2014-12-18Merge branches 'acpi-scan', 'acpi-utils' and 'acpi-pm'Rafael J. Wysocki4-15/+14
* acpi-scan: ACPI / scan: Change the level of _DEP-related messages to KERN_DEBUG * acpi-utils: ACPI / utils: Drop error messages from acpi_evaluate_reference() * acpi-pm: ACPI / PM: Do not disable wakeup GPEs that have not been enabled
2014-12-16MAINTAINERS: add entry for intel_pstateKristen Carlson Accardi1-0/+6
Add entry for intel_pstate. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15ACPI / video: update the skip case for acpi_video_device_in_dod()Aaron Lu1-2/+8
If the firmware has declared more than 8 video output devices, and the one that control the internal panel's backlight is listed after the first 8 output devices, the _DOD will not include it due to the current i915 operation region implementation. As a result, we will not create a backlight device for it while we should. Solve this problem by special case the firmware that has 8+ output devices in that if we see such a firmware, we do not test if the device is in _DOD list. The creation of the backlight device will also enable the firmware to emit events on backlight hotkey press when the acpi_osi= cmdline option is specified on those affected ASUS laptops. Link: https://bugzilla.kernel.org/show_bug.cgi?id=70241 Reported-and-tested-by: Oleksij Rempel <linux@rempel-privat.de> Reported-and-tested-by: Dmitry Tunin <hanipouspilot@gmail.com> Reported-and-tested-by: Jimbo <jaime.91@hotmail.es> Cc: 3.18+ <stable@vger.kernel.org> # 3.18+ Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15power / PM: Eliminate CONFIG_PM_RUNTIMERafael J. Wysocki1-4/+0
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME within #ifdef blocks depending on CONFIG_PM may be dropped now. Do that in drivers/power/pm2301_charger.c. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki1-1/+1
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM in drivers/nfc/trf7970a.c. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki5-29/+12
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under drivers/scsi/ and in include/scsi/scsi_device.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Christoph Hellwig <hch@lst.de>
2014-12-15ACPI / EC: Fix unexpected ec_remove_handlers() invocationsLv Zheng1-0/+2
The ec_remove_handlers() is invoked without checking EC_FLAGS_HANDLERS_INSTALLED, this patch enhances this check to avoid issues that acpi_disable_gpe() is invoked unexpectedly to reduce the GPE runtime count. This may happen when the EC handler installation failed on some platforms. Reported-by: Venkat Raghavulu <venkat.raghavulu@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-15Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()"Prarit Bhargava1-4/+4
This reverts commit 16b7c275c055cc36218404b5d147be7f76575087. My previous commit 16b7c275c055 ("tools: cpupower: fix return checks for sysfs_get_idlestate_count()") was not correct. After looking at the changelog for cpupower I noticed that Thomas had changed the return of sysfs_get_idlestate_count() to an unsigned int to simplify the code. The problem is really that both he (in his original change) and I (in my new change) missed the obvious that sysfs_get_idlestate_count() can't return -ENODEV. It should just return 0 for "no c-states". Fixes: 16b7c275c055 (tools: cpupower: fix return checks for ...) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki1-1/+1
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so files that are build conditionally if CONFIG_PM_RUNTIME is set may now be build if CONFIG_PM is set. Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile for this reason. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org.
2014-12-13x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.cRafael J. Wysocki1-1/+1
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13PM: Remove the SET_PM_RUNTIME_PM_OPS() macroUlf Hansson1-2/+0
There're now no users left of the SET_PM_RUNTIME_PM_OPS() macro, since all have converted to use the SET_RUNTIME_PM_OPS() macro instead, so let's remove it. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macroLudovic Desroches1-1/+1
The currently used SET_PM_RUNTIME_PM_OPS() macro is defined to the SET_RUNTIME_PM_OPS() macro. Convert to the later, since that's the proper one to use. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13PM / Kconfig: Replace PM_RUNTIME with PM in dependenciesRafael J. Wysocki9-10/+10
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so Kconfig options depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace PM_RUNTIME with PM in Kconfig dependencies throughout the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Felipe Balbi <balbi@ti.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org>
2014-12-13ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki6-7/+7
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere in the code under arch/arm/ (the defconfig files will be modified later). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Nishanth Menon <nm@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
2014-12-13sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki12-18/+15
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under sound/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Brian Austin <brian.austin@cirrus.com> Acked-by: Mark Brown <broonie@kernel.org>
2014-12-13phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki2-2/+2
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under drivers/phy/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
2014-12-13video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki2-3/+3
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. The alternative of CONFIG_PM_SLEEP and CONFIG_PM_RUNTIME may be replaced with CONFIG_PM too. Make these changes in 2 files under drivers/video/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Jingoo Han <jg1.han@samsung.com>
2014-12-13tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki5-9/+6
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under drivers/tty/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-13spi: Replace CONFIG_PM_RUNTIME with CONFIG_PMRafael J. Wysocki6-9/+9
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under drivers/spi/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Mark Brown <broonie@kernel.org>
2014-12-13ACPI / PM: Do not disable wakeup GPEs that have not been enabledRafael J. Wysocki2-2/+11
In some cases acpi_device_wakeup() may be called to ensure wakeup power to be off for a given device even though that device's wakeup GPE has not been enabled so far. It calls acpi_disable_gpe() on a GPE that's not enabled and this causes ACPICA to return the AE_LIMIT status code from that call which then is reported as an error by the ACPICA's debug facilities (if enabled). This may lead to a fair amount of confusion, so introduce a new ACPI device wakeup flag to store the wakeup GPE status and avoid disabling wakeup GPEs that have not been enabled. Reported-and-tested-by: Venkat Raghavulu <venkat.raghavulu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13ACPI / utils: Drop error messages from acpi_evaluate_reference()Rafael J. Wysocki1-11/+1
Some of the error messages printed by acpi_evaluate_reference() with the KERN_ERR priority should really be debug messages, but then they would be redundant, because acpi_util_eval_error() is called too at the same spots (except for one). Drop the kernel messages from there entirely and leave the acpi_util_eval_error() to handle the debug printing. In one case, replace the kernel message with a call to acpi_util_eval_error(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13ACPI / scan: Change the level of _DEP-related messages to KERN_DEBUGRafael J. Wysocki1-2/+2
Two _DEP-related failure messages are printed as dev_err() which is unnecessary and annoying. Use dev_dbg() to print them. While at it, one of the messages should actually say it is related to _DEP, so modify it to that effect. Fixes: 40e7fcb19293 (ACPI: Add _DEP support to fix battery issue on Asus T100TA) Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-11Merge branch 'fixes'Rafael J. Wysocki2-3/+3
* fixes: PM / OPP: remove double calls to find_device_opp() PM / OPP: set new_opp->dev_opp to a valid dev_opp leds: leds-gpio: Fix the "default-state" property check
2014-12-11Merge tag 'pm+acpi-3.19-rc1' of ↵Linus Torvalds220-1526/+4877
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "This time we have some more new material than we used to have during the last couple of development cycles. The most important part of it to me is the introduction of a unified interface for accessing device properties provided by platform firmware. It works with Device Trees and ACPI in a uniform way and drivers using it need not worry about where the properties come from as long as the platform firmware (either DT or ACPI) makes them available. It covers both devices and "bare" device node objects without struct device representation as that turns out to be necessary in some cases. This has been in the works for quite a few months (and development cycles) and has been approved by all of the relevant maintainers. On top of that, some drivers are switched over to the new interface (at25, leds-gpio, gpio_keys_polled) and some additional changes are made to the core GPIO subsystem to allow device drivers to manipulate GPIOs in the "canonical" way on platforms that provide GPIO information in their ACPI tables, but don't assign names to GPIO lines (in which case the driver needs to do that on the basis of what it knows about the device in question). That also has been approved by the GPIO core maintainers and the rfkill driver is now going to use it. Second is support for hardware P-states in the intel_pstate driver. It uses CPUID to detect whether or not the feature is supported by the processor in which case it will be enabled by default. However, it can be disabled entirely from the kernel command line if necessary. Next is support for a platform firmware interface based on ACPI operation regions used by the PMIC (Power Management Integrated Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms. That interface is used for manipulating power resources and for thermal management: sensor temperature reporting, trip point setting and so on. Also the ACPI core is now going to support the _DEP configuration information in a limited way. Basically, _DEP it supposed to reflect off-the-hierarchy dependencies between devices which may be very indirect, like when AML for one device accesses locations in an operation region handled by another device's driver (usually, the device depended on this way is a serial bus or GPIO controller). The support added this time is sufficient to make the ACPI battery driver work on Asus T100A, but it is general enough to be able to cover some other use cases in the future. Finally, we have a new cpufreq driver for the Loongson1B processor. In addition to the above, there are fixes and cleanups all over the place as usual and a traditional ACPICA update to a recent upstream release. As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for Intel platforms should be able to handle power management of the DMA engine correctly, the cpufreq-dt driver should interact with the thermal subsystem in a better way and the ACPI backlight driver should handle some more corner cases, among other things. On top of the ACPICA update there are fixes for race conditions in the ACPICA's interrupt handling code which might lead to some random and strange looking failures on some systems. In the cleanups department the most visible part is the series of commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration option. That was triggered by a discussion regarding the generic power domains code during which we realized that trying to support certain combinations of PM config options was painful and not really worth it, because nobody would use them in production anyway. For this reason, we decided to make CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the conclusion that the latter became redundant and CONFIG_PM could be used instead of it. The material here makes that replacement in a major part of the tree, but there will be at least one more batch of that in the second part of the merge window. Specifics: - Support for retrieving device properties information from ACPI _DSD device configuration objects and a unified device properties interface for device drivers (and subsystems) on top of that. As stated above, this works with Device Trees and ACPI and allows device drivers to be written in a platform firmware (DT or ACPI) agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are now going to use this new interface and the GPIO subsystem is additionally modified to allow device drivers to assign names to GPIO resources returned by ACPI _CRS objects (in case _DSD is not present or does not provide the expected data). The changes in this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam, Geert Uytterhoeven). - Support for Hardware Managed Performance States (HWP) as described in Volume 3, section 14.4, of the Intel SDM in the intel_pstate driver. CPUID is used to detect whether or not the feature is supported by the processor. If supported, it will be enabled automatically unless the intel_pstate=no_hwp switch is present in the kernel command line. From Dirk Brandewie. - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie). - Support for firmware interface based on ACPI operation regions used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR platforms for power resource control and thermal management (Aaron Lu). - Limited support for retrieving off-the-hierarchy dependencies between devices from ACPI _DEP device configuration objects and deferred probing support for the ACPI battery driver based on the _DEP information to make that driver work on Asus T100A (Lan Tianyu). - New cpufreq driver for the Loongson1B processor (Kelvin Cheung). - ACPICA update to upstream revision 20141107 which only affects tools (Bob Moore). - Fixes for race conditions in the ACPICA's interrupt handling code and in the ACPI code related to system suspend and resume (Lv Zheng and Rafael J Wysocki). - ACPI core fix for an RCU-related issue in the ioremap() regions management code that slowed down significantly after CPUs had been allowed to enter idle states even if they'd had RCU callbakcs queued and triggered some problems in certain proprietary graphics driver (and elsewhere). The fix replaces synchronize_rcu() in that code with synchronize_rcu_expedited() which makes the issue go away. From Konstantin Khlebnikov. - ACPI LPSS (Low-Power Subsystem) driver fix to handle power management of the DMA engine included into the LPSS correctly. The problem is that the DMA engine doesn't have ACPI PM support of its own and it simply is turned off when the last LPSS device having ACPI PM support goes into D3cold. To work around that, the PM domain used by the ACPI LPSS driver is redesigned so at least one device with ACPI PM support will be on as long as the DMA engine is in use. From Andy Shevchenko. - ACPI backlight driver fix to avoid using it on "Win8-compatible" systems where it doesn't work and where it was used by default by mistake (Aaron Lu). - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki, Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin Chaugule (mostly related to the upcoming ARM64 support). - Intel RAPL (Running Average Power Limit) power capping driver fixes and improvements including new processor IDs (Jacob Pan). - Generic power domains modification to power up domains after attaching devices to them to meet the expectations of device drivers and bus types assuming devices to be accessible at probe time (Ulf Hansson). - Preliminary support for controlling device clocks from the generic power domains core code and modifications of the ARM/shmobile platform to use that feature (Ulf Hansson). - Assorted minor fixes and cleanups of the generic power domains core code (Ulf Hansson, Geert Uytterhoeven). - Assorted minor fixes and cleanups of the device clocks control code in the PM core (Geert Uytterhoeven, Grygorii Strashko). - Consolidation of device power management Kconfig options by making CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter which is now redundant (Rafael J Wysocki and Kevin Hilman). That is the first batch of the changes needed for this purpose. - Core device runtime power management support code cleanup related to the execution of callbacks (Andrzej Hajda). - cpuidle ARM support improvements (Lorenzo Pieralisi). - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and Bartlomiej Zolnierkiewicz). - New cpufreq driver callback (->ready) to be executed when the cpufreq core is ready to use a given policy object and cpufreq-dt driver modification to use that callback for cooling device registration (Viresh Kumar). - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James Geboski, Tomeu Vizoso). - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate, cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao, Stefan Wahren, Petr Cvek). - OPP (Operating Performance Points) framework modification to allow OPPs to be removed too and update of a few cpufreq drivers (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added during initialization) on driver removal (Viresh Kumar). - Hibernation core fixes and cleanups (Tina Ruchandani and Markus Elfring). - PM Kconfig fix related to CPU power management (Pankaj Dubey). - cpupower tool fix (Prarit Bhargava)" * tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits) i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tools: cpupower: fix return checks for sysfs_get_idlestate_count() drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM leds: leds-gpio: Fix multiple instances registration without 'label' property iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hwrandom / exynos / PM: Use CONFIG_PM in #ifdef block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM USB / PM: Drop CONFIG_PM_RUNTIME from the USB core PM: Merge the SET*_RUNTIME_PM_OPS() macros ...
2014-12-11Merge tag 'pci-v3.19-changes' of ↵Linus Torvalds26-211/+396
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Here are the PCI changes intended for v3.19. I don't think there's anything very exciting here, but there was a lot of MSI-related stuff coming via Thomas. Details: NUMA - Allow numa_node override via sysfs (Prarit Bhargava) Resource management - Restore detection of read-only BARs (Myron Stowe) - Shrink decoding-disabled window while sizing BARs (Myron Stowe) - Add informational printk for invalid BARs (Myron Stowe) - Remove fixed parameter in pci_iov_resource_bar() (Myron Stowe) MSI - Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask Bits (Yijing Wang) - Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" (Yijing Wang) - s390/MSI: Use __msi_mask_irq() instead of default_msi_mask_irq() (Yijing Wang) Virtualization - xen: Process failure for pcifront_(re)scan_root() (Chen Gang) - Make FLR and AF FLR reset warning messages different (Gavin Shan) Generic host bridge driver - Allocate config space windows after limiting bus number range (Lorenzo Pieralisi) - Convert to DT resource parsing API (Lorenzo Pieralisi) Freescale Layerscape - Add Freescale Layerscape PCIe driver (Minghuan Lian) NVIDIA Tegra - Do not build on 64-bit ARM (Thierry Reding) - Add Kconfig help text (Thierry Reding) Renesas R-Car - Make rcar_pci static (Jingoo Han) Samsung Exynos - Add exynos prefix to add_pcie_port(), pcie_init() (Jingoo Han) ST Microelectronics SPEAr13xx - Add spear prefix to add_pcie_port(), pcie_init() (Jingoo Han) - Make spear13xx_add_pcie_port() __init (Jingoo Han) - Remove unnecessary OOM message (Jingoo Han) TI DRA7xx - Add dra7xx prefix to add_pcie_port() (Jingoo Han) - Make dra7xx_add_pcie_port() __init (Jingoo Han) TI Keystone - Make ks_dw_pcie_msi_domain_ops static (Jingoo Han) - Remove unnecessary OOM message (Jingoo Han) Miscellaneous - Delete unnecessary NULL pointer checks (Markus Elfring) - Remove unused to_hotplug_slot() (Gavin Shan) - Whitespace cleanup (Jingoo Han) - Simplify if-return sequences (Quentin Lambert)" * tag 'pci-v3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (28 commits) PCI: Remove fixed parameter in pci_iov_resource_bar() PCI: Add informational printk for invalid BARs PCI: tegra: Add Kconfig help text PCI: tegra: Do not build on 64-bit ARM PCI: spear: Remove unnecessary OOM message PCI: mvebu: Add a blank line after declarations PCI: designware: Add a blank line after declarations PCI: exynos: Remove unnecessary return statement PCI: imx6: Use tabs for indentation PCI: keystone: Remove unnecessary OOM message PCI: Remove unused and broken to_hotplug_slot() PCI: Make FLR and AF FLR reset warning messages different PCI: dra7xx: Add __init annotation to dra7xx_add_pcie_port() PCI: spear: Add __init annotation to spear13xx_add_pcie_port() PCI: spear: Rename add_pcie_port(), pcie_init() to spear13xx_add_pcie_port(), etc. PCI: dra7xx: Rename add_pcie_port() to dra7xx_add_pcie_port() PCI: layerscape: Add Freescale Layerscape PCIe driver PCI: Simplify if-return sequences PCI: Delete unnecessary NULL pointer checks PCI: Shrink decoding-disabled window while sizing BARs ...
2014-12-11Merge tag 'ktest-v3.19' of ↵Linus Torvalds1-11/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest changes from Steven Rostedt: "The following ktest updates were done: - Fix handling the make kernelrelease change - Fix make_min_config that was broken by new bisect_config changes - Allow tests to undefine default options (not just being able to override them) - Print name of test (if defined) to start of test output" * tag 'ktest-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest: Add back "tail -1" to kernelrelease make ktest: Add name to running title ktest: Allow tests to undefine default options ktest: Fix make_min_config to handle new assign_configs call ktest: Use make -s kernelrelease
2014-12-11Merge tag 'trace-seq-buf-3.19' of ↵Linus Torvalds12-144/+783
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull nmi-safe seq_buf printk update from Steven Rostedt: "This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: - Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. - The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. - The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. - The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. - Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context One added bonus is that this code also makes the NMI dump stack work on PREEMPT_RT kernels. As printk() includes sleeping locks on PREEMPT_RT, printk() only writes to console if the console does not use any rt_mutex converted spin locks. Which a lot do" * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/nmi: Fix use of unallocated cpumask_var_t printk/percpu: Define printk_func when printk is not defined x86/nmi: Perform a safe NMI stack trace on all CPUs printk: Add per_cpu printk func to allow printk to be diverted seq_buf: Move the seq_buf code to lib/ seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions tracing: Have seq_buf use full buffer seq_buf: Add seq_buf_can_fit() helper function tracing: Add paranoid size check in trace_printk_seq() tracing: Use trace_seq_used() and seq_buf_used() instead of len tracing: Clean up tracing_fill_pipe_page() seq_buf: Create seq_buf_used() to find out how much was written tracing: Add a seq_buf_clear() helper and clear len and readpos in init tracing: Convert seq_buf fields to be like seq_file fields tracing: Convert seq_buf_path() to be like seq_path() tracing: Create seq_buf layer in trace_seq
2014-12-11Merge tag 'ftracetest-3.19' of ↵Linus Torvalds14-13/+484
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace self-test updates from Steven Rostedt: "Updates for the ftrace self tests: - Added kprobes on ftrace testcase - Sort test cases - Add file to hold helper functions - Use logfile name supported by busybox's mktemp - Clear trace buffer after running kprobe test - Fix show descriptions when run on dash shell - Add --verbose option for showing echo output" * tag 'ftracetest-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftracetest: Add --verbose option for showing echo output ftracetest: Fix to show descriptions on dash ftracetest: Add basic event tracing test cases ftracetest: Clear trace buffer after running kprobe testcases ftracetest: Use logfile name supported by busybox's mktemp ftracetest: Add a couple of ftrace test cases ftracetest: Add functions file that holds helper functions ftracetest: Sort testcases ftracetest: Add kprobes on ftrace testcase
2014-12-11Merge tag 'trace-3.19' of ↵Linus Torvalds38-1312/+1680
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_*() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: - Added some "!" (NOT) logic to the tracing filter. - Fixed the frame pointer logic to the x86_64 mcount trampolines - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them" * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits) tracing: Truncated output is better than nothing tracing: Add additional marks to signal very large time deltas Documentation: describe trace_buf_size parameter more accurately tracing: Allow NOT to filter AND and OR clauses tracing: Add NOT to filtering logic ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter ftrace/x86: Get rid of ftrace_caller_setup ftrace/x86: Have save_mcount_regs macro also save stack frames if needed ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs ftrace/x86: Simplify save_mcount_regs on getting RIP ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file ftrace/x86: Have static tracing also use ftrace_caller_setup ftrace/x86: Have static function tracing always test for function graph kprobes: Add IPMODIFY flag to kprobe_ftrace_ops ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict kprobes/ftrace: Recover original IP if pre_handler doesn't change it tracing/trivial: Fix typos and make an int into a bool tracing: Deletion of an unnecessary check before iput() ...
2014-12-11Merge branch 'akpm' (patchbomb from Andrew)Linus Torvalds142-3940/+3655
Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...
2014-12-11exit: pidns: fix/update the comments in zap_pid_ns_processes()Oleg Nesterov1-4/+24
The comments in zap_pid_ns_processes() are not clear, we need to explain how this code actually works. 1. "Ignore SIGCHLD" looks like optimization but it is not, we also need this for correctness. 2. The comment above sys_wait4() could tell more. EXIT_ZOMBIE child is only possible if it has exited before we ignored SIGCHLD. Or if it is traced from the parent namespace, but in this case it will be reaped by debugger after detach, sys_wait4() acts as a synchronization point. 3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is outdated. Contrary to what it says we do not need to make sure they all go away after 0a01f2cc390e "pidns: Make the pidns proc mount/umount logic obvious". At the same time, we do need to wait for nr_hashed==init_pids, but the reasons are quite different and not obvious: setns(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exitingOleg Nesterov1-0/+2
alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: exit_notify: re-use "dead" list to autoreap currentOleg Nesterov1-4/+2
After the previous change we can add just the exiting EXIT_DEAD task to the "dead" list and remove another release_task(tsk). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: call forget_original_parent() under tasklist_lockOleg Nesterov1-24/+23
Shift "release dead children" loop from forget_original_parent() to its caller, exit_notify(). It is safe to reap them even if our parent reaps us right after we drop tasklist_lock, those children no longer have any connection to the exiting task. And this allows us to avoid write_lock_irq(tasklist_lock) right after it was released by forget_original_parent(), we can simply call it with tasklist_lock held. While at it, move the comment about forget_original_parent() up to this function. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: avoid find_new_reaper() if no childrenOleg Nesterov1-0/+3
Now that pid_ns logic was isolated we can change forget_original_parent() to return right after find_child_reaper() when father->children is empty, there is nothing to reparent in this case. In particular this avoids find_alive_thread() and this can help if the whole process exits and it has a lot of PF_EXITING threads at the start of the thread list, this can easily lead to O(nr_threads ** 2) iterations. Trivial test case (tested under KVM, 2 CPUs): static void *tfunc(void *arg) { pause(); return NULL; } static int child(unsigned int nt) { pthread_t pt; while (nt--) assert(pthread_create(&pt, NULL, tfunc, NULL) == 0); pthread_kill(pt, SIGTRAP); pause(); return 0; } int main(int argc, const char *argv[]) { int stat; unsigned int nf = atoi(argv[1]); unsigned int nt = atoi(argv[2]); while (nf--) { if (!fork()) return child(nt); wait(&stat); assert(stat == SIGTRAP); } return 0; } $ time ./test 16 16536 shows: real user sys - 5m37.628s 0m4.437s 8m5.560s + 0m50.032s 0m7.130s 1m4.927s Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: introduce find_alive_thread()Oleg Nesterov1-13/+19
Add the new simple helper to factor out the for_each_thread() code in find_child_reaper() and find_new_reaper(). It can also simplify the potential PF_EXITING -> exit_state change, plus perhaps we can change this code to take SIGNAL_GROUP_EXIT into account. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: introduce find_child_reaper()Oleg Nesterov1-21/+35
find_new_reaper() does 2 completely different things. Not only it finds a reaper, it also updates pid_ns->child_reaper or kills the whole namespace if the caller is ->child_reaper. Now that has_child_subreaper logic doesn't depend on child_reaper check we can move that pid_ns code into a separate helper. IMHO this makes the code more clean, and this allows the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: document the ->has_child_subreaper checksOleg Nesterov1-8/+6
Swap the "init_task" and same_thread_group() checks. This way it is more simple to document these checks and we can remove the link to the previous discussion on lkml. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()Oleg Nesterov1-5/+3
Change find_new_reaper() to use for_each_thread() instead of deprecated while_each_thread(). We do not bother to check "thread != father" in the 1st loop, we can rely on PF_EXITING check. Note: this means the minor behavioural change: for_each_thread() starts from the group leader. But this should be fine, nobody should make any assumption about do_wait(__WNOTHREAD) when it comes to reparented tasks. And this can avoid the pointless reparenting to a short-living thread While zombie leaders are not that common. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparentingOleg Nesterov1-2/+4
find_new_reaper() assumes that "has_child_subreaper" logic is safe as long as we are not the exiting ->child_reaper and this is doubly wrong: 1. In fact it is safe if "pid_ns->child_reaper == father"; there must be no children after zap_pid_ns_processes() returns, so it doesn't matter what we return in this case and even pid_ns->child_reaper is wrong otherwise: we can't reparent to ->child_reaper == current. This is not a bug, but this is confusing. 2. It is not safe if we are not pid_ns->child_reaper but from the same thread group. We drop tasklist_lock before zap_pid_ns_processes(), so another thread can lock it and choose the new reaper from the upper namespace if has_child_subreaper == T, and this is obviously wrong. This is not that bad, zap_pid_ns_processes() won't return until the the new reaper reaps all zombies, but this should be fixed anyway. We could change for_each_thread() loop to use ->exit_state instead of PF_EXITING which we had to use until 8aac62706ada, or we could change copy_signal() to check CLONE_NEWPID before setting has_child_subreaper, but lets change this code so that it is clear we can't look outside of our namespace, otherwise same_thread_group(reaper, child_reaper) check will look wrong and confusing anyway. We can simply start from "father" and fix the problem. We can't wrongly return a thread from the same thread group if ->is_child_subreaper == T, we know that all threads have PF_EXITING set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparentingOleg Nesterov1-1/+1
The ->has_child_subreaper code in find_new_reaper() finds alive "thread" but returns another "reaper" thread which can be dead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: proc: don't try to flush /proc/tgid/task/tgidOleg Nesterov1-0/+3
proc_flush_task_mnt() always tries to flush task/pid, but this is pointless if we reap the leader. d_invalidate() is recursive, and if nothing else the next d_hash_and_lookup(tgid) should fail anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: release_task: fix the comment about group leader accountingOleg Nesterov1-7/+4
Contrary to what the comment in __exit_signal() says we do account the group leader. Fix this and explain why. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: wait: drop tasklist_lock before psig->c* accountingOleg Nesterov1-7/+5
wait_task_zombie() no longer needs tasklist_lock to accumulate the psig->c* counters, we can drop it right after cmpxchg(exit_state). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: wait: don't use zombie->real_parentOleg Nesterov1-12/+11
1. wait_task_zombie() uses p->real_parent to get psig/siglock. This is correct but needs tasklist_lock, ->real_parent can exit. We can use "current" instead. This is our natural child, its parent must be our sub-thread. 2. Read psig/sig outside of ->siglock, ->signal is no longer protected by this lock. 3. Fix the outdated comments about tasklist_lock. We can not race with __exit_signal(), the whole thread group is dead, nobody but us can call it. Also clarify the usage of ->stats_lock and ->siglock. Note: thread_group_cputime_adjusted() is sub-optimal in this case, we probably want to export cputime_adjust() to avoid thread_group_cputime(). The comment says "all threads" but there are no other threads. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11exit: wait: cleanup the ptrace_reparented() checksOleg Nesterov1-8/+6
Now that EXIT_DEAD is the terminal state we can kill "int traced" variable and check "state == EXIT_DEAD" instead to cleanup the code. In particular, this way it is clear that the check obviously doesn't need tasklist_lock. Also fix the type of "unsigned long state", "long" was always wrong although this doesn't matter because cmpxchg/xchg uses typeof(*ptr). [akpm@linux-foundation.org: don't make me google the C Operator Precedence table] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11usermodehelper: kill the kmod_thread_locker logicOleg Nesterov1-30/+3
Now that we do not call kernel_thread(CLONE_VFORK) from the worker thread we can not deadlock if do_execve() in turn triggers another call_usermodehelper(), we can remove the kmod_thread_locker code. Note: we should probably kill khelper_wq and simply use one of the global workqueues, say, system_unbound_wq, this special wq for umh buys nothing nowadays. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>