summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2013-10-12USB: add a private-data pointer to struct usb_ttAlan Stern1-0/+1
For improved scheduling of transfers through a Transaction Translator, ehci-hcd will need to store a bunch of information associated with the FS/LS bus on the downstream side of the TT. This patch adds a pointer for such HCD-private data to the usb_tt structure. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-12USB: NS_TO_US should round upAlan Stern1-3/+2
Host controller drivers use the NS_TO_US macro to convert transaction times, which are computed in nanoseconds, to microseconds for scheduling. Periodic scheduling requires worst-case estimates, but the macro does its conversion using round-to-nearest. This patch changes it to use round-up, giving a correct worst-case value. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-12usb-anchor: Delay usb_wait_anchor_empty_timeout wake up till completion is doneHans de Goede1-0/+3
usb_wait_anchor_empty_timeout() should wait till the completion handler has run. Both the zd1211rw driver and the uas driver (in its task mgmt) depend on the completion handler having completed when usb_wait_anchor_empty_timeout() returns, as they read state set by the completion handler after an usb_wait_anchor_empty_timeout() call. But __usb_hcd_giveback_urb() calls usb_unanchor_urb before calling the completion handler. This is necessary as the completion handler may re-submit and re-anchor the urb. But this introduces a race where the state these drivers want to read has not been set yet by the completion handler (this race is easily triggered with the uas task mgmt code). I've considered adding an anchor_count to struct urb, which would be incremented on anchor and decremented on unanchor, and then only actually do the anchor / unanchor on 0 -> 1 and 1 -> 0 transtions, combined with moving the unanchor call in hcd_giveback_urb to after calling the completion handler. But this will only work if urb's are only re-anchored to the same anchor as they were anchored to before the completion handler ran. And at least one driver re-anchors to another anchor from the completion handler (rtlwifi). So I have come up with this patch instead, which adds the ability to suspend wakeups of usb_wait_anchor_empty_timeout() waiters to the usb_anchor functionality, and uses this in __usb_hcd_giveback_urb() to delay wake-ups until the completion handler has run. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-12usb-anchor: Ensure poisened gets initialized to 0Hans de Goede1-0/+1
And do so in a way which ensures that any fields added in the future will also get properly zero-ed. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-11Merge remote-tracking branch 'spi/topic/s3c64xx' into spi-loopMark Brown1-0/+2
2013-10-11spi: Provide common spi_message processing loopMark Brown1-2/+19
The loops which SPI controller drivers use to process the list of transfers in a spi_message are typically very similar and have some error prone areas such as the handling of /CS. Help simplify drivers by factoring this code out into the core - if drivers provide a transfer_one() function instead of a transfer_one_message() function the core will handle processing at the message level. /CS can be controlled by either setting cs_gpio or providing a set_cs function. If this is not possible for hardware reasons then both can be omitted and the driver should continue to implement manual /CS handling. This is a first step in refactoring and it is expected that there will be further enhancements, for example factoring out of the mapping of transfers for DMA and the initiation and completion of interrupt driven transfers. Signed-off-by: Mark Brown <broonie@linaro.org>
2013-10-11spi: Provide per-message prepare and unprepare operationsMark Brown1-0/+11
Many SPI drivers perform setup and tear down on every message, usually doing things like DMA mapping the message. Provide hooks for them to use to provide such operations. This is of limited value for drivers that implement transfer_one_message() but will be of much greater utility with future factoring out of standard implementations of that function. Signed-off-by: Mark Brown <broonie@linaro.org>
2013-10-11Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-7/+0
Pull drm fixes from Dave Airlie: "All over the map.. - nouveau: disable MSI, needs more work, will try again next merge window - radeon: audio + uvd regression fixes, dpm fixes, reset fixes - i915: the dpms fix might fix your haswell And one pain in the ass revert, so we have VGA arbitration that when implemented 4-5 years ago really hoped that GPUs could remove themselves from arbitration completely once they had a kernel driver. It seems Intel hw designers decided that was too nice a facility to allow us to have so they removed it when they went on-die (so since Ironlake at least). Now Alex Williamson added support for VGA arbitration for newer GPUs however this now exposes itself to userspace as requireing arbitration of GPU VGA regions and the X server gets involved and disables things that it can't handle when VGA access is possibly required around every operation. So in order to not break userspace we just reverted things back to the old known broken status so maybe we can try and design out way out. Ville also had a patch to use stop machine for the two times Intel needs to access VGA space, that might be acceptable with some rework, but for now myself and Daniel agreed to just go back" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (23 commits) Revert "i915: Update VGA arbiter support for newer devices" Revert "drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done" drm/radeon: re-enable sw ACR support on pre-DCE4 drm/radeon/dpm: disable bapm on TN asics drm/radeon: improve soft reset on CIK drm/radeon: improve soft reset on SI drm/radeon/dpm: off by one in si_set_mc_special_registers() drm/radeon/dpm/btc: off by one in btc_set_mc_special_registers() drm/radeon: forever loop on error in radeon_do_test_moves() drm/radeon: fix hw contexts for SUMO2 asics drm/radeon: fix typo in CP DMA register headers drm/radeon/dpm: disable multiple UVD states drm/radeon: use hw generated CTS/N values for audio drm/radeon: fix N/CTS clock matching for audio drm/radeon: use 64-bit math to calculate CTS values for audio (v2) drm/edid: catch kmalloc failure in drm_edid_to_speaker_allocation Revert "drm/fb-helper: don't sleep for screen unblank when an oops is in progress" drm/gma500: fix things after get/put page helpers drm/nouveau/mc: disable msi support by default, it's busted in tons of places drm/i915: Only apply DPMS to the encoder if enabled ...
2013-10-11regulator: Add REGULATOR_LINEAR_RANGE macroAxel Lin1-0/+9
Add REGULATOR_LINEAR_RANGE macro and convert regulator drivers to use it. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
2013-10-11regulator: Remove max_uV from struct regulator_linear_rangeAxel Lin1-2/+0
linear ranges means each range has linear voltage settings. So we can calculate max_uV for each linear range in regulator core rather than set the max_uV field in drivers. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
2013-10-11powerpc: Prepare to support kernel handling of IOMMU map/unmapAlexey Kardashevskiy2-1/+17
The current VFIO-on-POWER implementation supports only user mode driven mapping, i.e. QEMU is sending requests to map/unmap pages. However this approach is really slow, so we want to move that to KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters where each packet needs to be mapped/unmapped) we chose to implement that as a "fast" hypercall directly in "real mode" (processor still in the guest context but MMU off). To be able to do that, we need to provide some facilities to access the struct page count within that real mode environment as things like the sparsemem vmemmap mappings aren't accessible. This adds an API function realmode_pfn_to_page() to get page struct when MMU is off. This adds to MM a new function put_page_unless_one() which drops a page if counter is bigger than 1. It is going to be used when MMU is off (for example, real mode on PPC64) and we want to make sure that page release will not happen in real mode as it may crash the kernel in a horrible way. CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported. Cc: linux-mm@kvack.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11hashtable: add hash_for_each_possible_rcu_notrace()Alexey Kardashevskiy1-0/+15
This adds hash_for_each_possible_rcu_notrace() which is basically a notrace clone of hash_for_each_possible_rcu() which cannot be used in real mode due to its tracing/debugging capability. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11Merge branch 'core/urgent' into sched/coreIngo Molnar5-1/+54
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11compiler/gcc4: Add quirk for 'asm goto' miscompilation bugIngo Molnar1-0/+15
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11Revert "i915: Update VGA arbiter support for newer devices"Dave Airlie1-7/+0
This reverts commit 81b5c7bc8de3e6f63419139c2fc91bf81dea8a7d. Adding drm/i915 into the vga arbiter chain means that X (in a piece of well-meant paranoia) will do a get/put on the vga decoding around _every_ accel call down into the ddx. Which results in some nice performance disasters [1]. This really breaks userspace, by disabling DRI for everyone, and stops OpenGL from working, this isn't limited to just the i915 but both the integrated and discrete GPUs on multi-gpu systems, in other words this causes untold worlds of pain, Ville tried to come up with a Great Hack to fiddle the required VGA I/O ops behind everyone's back using stop_machine, but that didn't really work out [2]. Given that we're fairly late in the -rc stage for such games let's just revert this all. One thing we might want to keep is to delay the disabling of the vga decoding until the fbdev emulation and the fbcon screen is set up. If we kill vga mem decoding beforehand fbcon can end up with a white square in the top-left corner it tried to save from the vga memory for a seamless transition. And we have bug reports on older platforms which seem to match these symptoms. But again that's something to play around with in -next. References: [1] http://lists.x.org/archives/xorg-devel/2013-September/037763.html References: [2] http://www.spinics.net/lists/intel-gfx/msg34062.html Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-11pinctrl: single: Add support for auxdataTony Lindgren1-0/+12
For omaps, we still have dependencies to the legacy code for handling the PRM (Power Reset Management) interrupts, and also for reconfiguring the io wake-up chain after changes. Let's pass the PRM interrupt and the rearm functions via auxdata. Then when at some point we have a proper PRM driver, we can get the interrupt via device tree and set up the rearm function as exported function in the PRM driver. By using auxdata we can remove a dependency to the wake-up events for converting omap3 to be device tree only. Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Prakash Manjunathappa <prakash.pm@ti.com> Cc: Roger Quadros <rogerq@ti.com> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: linux-kernel@vger.kernel.org Reviewed-by: Kevin Hilman <khilman@linaro.org> Tested-by: Kevin Hilman <khilman@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Tony Lindgren <tony@atomide.com>
2013-10-10Merge tag 'random_for_linus' of ↵Linus Torvalds2-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull /dev/random changes from Ted Ts'o: "These patches are designed to enable improvements to /dev/random for non-x86 platforms, in particular MIPS and ARM" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: allow architectures to optionally define random_get_entropy() random: run random_int_secret_init() run after all late_initcalls
2013-10-10random: allow architectures to optionally define random_get_entropy()Theodore Ts'o1-0/+14
Allow architectures which have a disabled get_cycles() function to provide a random_get_entropy() function which provides a fine-grained, rapidly changing counter that can be used by the /dev/random driver. For example, an architecture might have a rapidly changing register used to control random TLB cache eviction, or DRAM refresh that doesn't meet the requirements of get_cycles(), but which is good enough for the needs of the random driver. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-10-10Merge branch 'for-john' of ↵John W. Linville1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2013-10-10usb: phy: Add RCAR Gen2 USB phyValentine Barshak1-0/+22
This adds RCAR Gen2 USB phy support. The driver configures USB channels 0/2 which are shared between PCI USB hosts and USBHS/USBSS devices. It also controls internal USBHS phy. Signed-off-by: Valentine Barshak <valentine.barshak@cogentembedded.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2013-10-10IB/mlx5: Fix eq names to display nicely in /proc/interruptsSagi Grimberg1-1/+1
It's helpful for a driver to put the pci slot name in its interrupt names, so /proc/interrupts will show the pci slot of the device. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10mlx5: Fix layout of struct mlx5_init_segEli Cohen1-1/+1
The layout of struct health_buffer was not according to firmware specification. Fix it to comply. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10mlx5: Remove checksum on command interface commandsEli Cohen2-4/+2
Checksum calculations consume CPU resources and can be significant to the rate of resource creation/destruction. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10Merge branch 'fortglx/3.13/time' of ↵Ingo Molnar1-2/+0
git://git.linaro.org/people/jstultz/linux into timers/core Pull more timekeeping items for v3.13 from John Stultz: * Small cleanup in the clocksource code. * Fix for rtc-pl031 to let it work with alarmtimers. * Move arm64 to using the generic sched_clock framework & resulting cleanup in the generic sched_clock code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10inet: includes a sock_common in request_sockEric Dumazet1-24/+2
TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10of: only include prom.h on sparcRob Herring1-0/+2
The dependency on prom.h by the core DT code is now removed and only sparc needs to include prom.h for the core code. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-10of: move of_translate_dma_address to of_address.hRob Herring1-0/+4
of_translate_dma_address is implemented in common code, so move the declaration there too. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org
2013-10-10of: move of_address_to_resource and of_iomap declarations from sparcRob Herring1-13/+17
Move of_address_to_resource and of_iomap declarations to common code. These only differ on sparc, but the declarations are the same and don't need to be in arch header. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org
2013-10-10of: implement of_node_to_nid as a weak functionRob Herring1-7/+4
Implement of_node_to_nid as weak function to remove the dependency on asm/prom.h. This is in preparation to make prom.h optional. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-10of: implement pci_address_to_pio as weak functionRob Herring1-4/+1
Implement pci_address_to_pio as weak function to remove the dependency on asm/prom.h. This is in preparation to make prom.h optional. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Grant Likely <grant.likely@linaro.org>
2013-10-10of: introduce common FDT machine related functionsRob Herring1-0/+5
Introduce common of_flat_dt_match_machine and of_flat_dt_get_machine_name functions to unify architectures' handling of machine level model and compatible properties. Several architectures match the root compatible string with an arch specific list of machine descriptors duplicating the same search algorithm. Create a common implementation with a simple architecture specific hook to iterate over each machine's match table. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-10sched_clock: Remove sched_clock_func() hookStephen Boyd1-2/+0
Nobody is using sched_clock_func() anymore now that sched_clock supports up to 64 bits. Remove the hook so that new code only uses sched_clock_register(). Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-09of: remove early_init_dt_setup_initrd_archRob Herring1-10/+0
All arches do essentially the same thing now for early_init_dt_setup_initrd_arch, so it can now be removed. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-09of: Introduce common early_init_dt_scanRob Herring1-0/+2
Most architectures scan the all the same items early in the FDT and none are really architecture specific. Create a common early_init_dt_scan to unify the early scan of root, memory, and chosen nodes in the flattened DT. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-09of: create unflatten_and_copy_device_treeRob Herring1-0/+2
Several architectures using DT support built-in dtb's in the init section. These platforms need to copy the dtb from init since the strings are referenced after unflattening. Every arch has their own copying routine which do the same thing. Create a common function, unflatten_and_copy_device_tree, to copy the dtb when unflattening the dtb. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-09regmap: Provide asynchronous write and update bits operationsMark Brown1-0/+31
Make it easier for drivers to include single register writes in asynchronous sequences by providing async versions of the write and update bits operations. The update bits operations are only likely to be effective when used with devices that have caches but this is common enough to be useful. Signed-off-by: Mark Brown <broonie@linaro.org>
2013-10-09sched/numa: Skip some page migrations after a shared faultRik van Riel1-1/+4
Shared faults can lead to lots of unnecessary page migrations, slowing down the system, and causing private faults to hit the per-pgdat migration ratelimit. This patch adds sysctl numa_balancing_migrate_deferred, which specifies how many shared page migrations to skip unconditionally, after each page migration that is skipped because it is a shared fault. This reduces the number of page migrations back and forth in shared fault situations. It also gives a strong preference to the tasks that are already running where most of the memory is, and to moving the other tasks to near the memory. Testing this with a much higher scan rate than the default still seems to result in fewer page migrations than before. Memory seems to be somewhat better consolidated than previously, with multi-instance specjbb runs on a 4 node system. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09mm: numa: Revert temporarily disabling of NUMA migrationRik van Riel1-1/+0
With the scan rate code working (at least for multi-instance specjbb), the large hammer that is "sched: Do not migrate memory immediately after switching node" can be replaced with something smarter. Revert temporarily migration disabling and all traces of numa_migrate_seq. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-61-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Remove the numa_balancing_scan_period_reset sysctlMel Gorman2-4/+0
With scan rate adaptions based on whether the workload has properly converged or not there should be no need for the scan period reset hammer. Get rid of it. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Adjust scan rate in task_numa_placementRik van Riel1-0/+9
Adjust numa_scan_period in task_numa_placement, depending on how much useful work the numa code can do. The more local faults there are in a given scan window the longer the period (and hence the slower the scan rate) during the next window. If there are excessive shared faults then the scan period will decrease with the amount of scaling depending on whether the ratio of shared/private faults. If the preferred node changes then the scan rate is reset to recheck if the task is properly placed. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-59-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Be more careful about joining numa groupsRik van Riel1-0/+1
Due to the way the pid is truncated, and tasks are moved between CPUs by the scheduler, it is possible for the current task_numa_fault to group together tasks that do not actually share memory together. This patch adds a few easy sanity checks to task_numa_fault, joining tasks together if they share the same tsk->mm, or if the fault was on a page with an elevated mapcount, in a shared VMA. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-57-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Add debuggingIngo Molnar1-0/+6
Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1381141781-10992-53-git-send-email-mgorman@suse.de
2013-10-09sched/numa: Call task_numa_free() from do_execve()Rik van Riel1-0/+4
It is possible for a task in a numa group to call exec, and have the new (unrelated) executable inherit the numa group association from its former self. This has the potential to break numa grouping, and is trivial to fix. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Use group fault statistics in numa placementMel Gorman1-0/+1
This patch uses the fraction of faults on a particular node for both task and group, to figure out the best node to place a task. If the task and group statistics disagree on what the preferred node should be then a full rescan will select the node with the best combined weight. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-50-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Stay on the same node if CLONE_VMRik van Riel1-1/+1
A newly spawned thread inside a process should stay on the same NUMA node as its parent. This prevents processes from being "torn" across multiple NUMA nodes every time they spawn a new thread. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-49-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09mm: numa: Do not group on RO pagesPeter Zijlstra1-2/+5
And here's a little something to make sure not the whole world ends up in a single group. As while we don't migrate shared executable pages, we do scan/fault on them. And since everybody links to libc, everybody ends up in the same group. Suggested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1381141781-10992-47-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Report a NUMA task group IDMel Gorman1-0/+5
It is desirable to model from userspace how the scheduler groups tasks over time. This patch adds an ID to the numa_group and reports it via /proc/PID/status. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Use {cpu, pid} to create task groups for shared faultsPeter Zijlstra2-0/+14
While parallel applications tend to align their data on the cache boundary, they tend not to align on the page or THP boundary. Consequently tasks that partition their data can still "false-share" pages presenting a problem for optimal NUMA placement. This patch uses NUMA hinting faults to chain tasks together into numa_groups. As well as storing the NID a task was running on when accessing a page a truncated representation of the faulting PID is stored. If subsequent faults are from different PIDs it is reasonable to assume that those two tasks share a page and are candidates for being grouped together. Note that this patch makes no scheduling decisions based on the grouping information. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1381141781-10992-44-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09mm: numa: Change page last {nid,pid} into {cpu,pid}Peter Zijlstra3-53/+63
Change the per page last fault tracking to use cpu,pid instead of nid,pid. This will allow us to try and lookup the alternate task more easily. Note that even though it is the cpu that is store in the page flags that the mpol_misplaced decision is still based on the node. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1381141781-10992-43-git-send-email-mgorman@suse.de [ Fixed build failure on 32-bit systems. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09sched/numa: Introduce migrate_swap()Peter Zijlstra1-0/+2
Use the new stop_two_cpus() to implement migrate_swap(), a function that flips two tasks between their respective cpus. I'm fairly sure there's a less crude way than employing the stop_two_cpus() method, but everything I tried either got horribly fragile and/or complex. So keep it simple for now. The notable detail is how we 'migrate' tasks that aren't runnable anymore. We'll make it appear like we migrated them before they went to sleep. The sole difference is the previous cpu in the wakeup path, so we override this. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/1381141781-10992-39-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>