Age | Commit message (Collapse) | Author | Files | Lines |
|
The recent rework of the TSC calibration code introduced a regression on UV
systems as it added a call to tsc_early_init() which initializes the TSC
ADJUST values before acpi_boot_table_init(). In the case of UV systems,
that is a necessary step that calls uv_system_init(). This informs
tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC
resets as documented in commit 341102c3ef29 ("x86/tsc: Add option that TSC
on Socket 0 being non-zero is valid")
Fix it by skipping the early tsc initialization on UV systems and let TSC
init tests take place later in tsc_init().
Fixes: cf7a63ef4e02 ("x86/tsc: Calibrate tsc only once")
Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
|
|
Introduce is_early_uv_system() which uses efi.uv_systab to decide early
in the boot process whether the kernel runs on a UV system.
This is needed to skip other early setup/init code that might break
the UV platform if done too early such as before necessary ACPI tables
parsing takes place.
Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.801700401@stormcage.americas.sgi.com
|
|
When FW floods the driver with control messages try to exit the cmsg
processing loop every now and then to avoid soft lockups. Cmsg
processing is generally very lightweight so 512 seems like a reasonable
budget, which should not be exceeded under normal conditions.
Fixes: 77ece8d5f196 ("nfp: add control vNIC datapath")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.20
First set of new features for 4.20. mt76 driver is going through major
refactoring and that's why there are so many mt76 patches. iwlwifi is
also under heavy development and smaller changes to other drivers.
Also wireless-drivers was merged to fix a conflict between the two trees.
Major changes:
ath10k
* limit available channels via DT ieee80211-freq-limit
wil6210
* add 802.11r Fast Roaming support for AP and station modes
* add support for channel 4
iwlwifi
* new FW API handling
* some improvements in the PCI recovery mechanism
* enable a new scanning feature;
* continued work on HE (mostly radiotap)
* TKIP implementation in new devices
* work continues for new 22560 hardware
mt76
* add support for Alfa AWUS036ACM
* lots of refactoring to make it easier to add new hardware support
* prepare for adding mt76x0e (pci-e variant) support
* add CONFIG_MT76x0E kconfig symbol
brcmfmac
* add support CYW89342 mini-PCIe device
* add 4-way handshake offload detection for FT-802.1X
* enable NL80211_EXT_FEATURE_CQM_RSSI_LIST
* fix for proper support of 160MHz bandwidth
rtl8xxxu
* add rtl8188ctv support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2018-10-02
This series contains updates to ice driver only.
Anirudh expands the use of VSI handles across the rest of the driver,
which includes refactoring the code to correctly use VSI handles. After
a reset, ensure that all configurations for a VSI get re-applied before
moving on to rebuilding the next VSI.
Dave fixed the driver to check the current link state after reset to
ensure that the correct link state of a port is reported. Fixed an
issue where if the driver is unloaded when traffic is in progress,
errors are generated.
Preethi breaks up the IRQ tracker into a software and hardware IRQ
tracker, where the software IRQ tracker tracks only the PF's IRQ
requests and does not play any role in the VF initialization. The
hardware IRQ tracker represents the device's interrupt space and will be
looked up to see if the device has run our of interrupts when a
interrupt has to be allocated in the device for either PF or VF.
Md Fahad adds support for enabling/disabling RSS via ethtool.
Brett aligns the ice_reset_req enum values to the values that the
hardware understands. Also added initial support for dynamic interrupt
moderation in the ice driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
continuation lines") regression with the `declance' driver, which caused
the adapter identification message to be split between two lines, e.g.:
declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA
, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.
Address that properly, by printing identification with a single call,
making the messages now look like:
declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sudarsana Reddy Kalluru says:
====================
qed*: Driver support for 20G link speed.
The patch series adds driver support for configuring/reading the 20G link
speed.
Please consider applying this to "net-next".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add driver support for reading/configuring the 20G link speed via ethtool.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add driver support for configuring/reading the 20G link speed.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During certain heavy network loads TX could time out
with TX ring dump.
TX is sometimes never restarted after reaching
"tx_stop_threshold" because function "fec_enet_tx_queue"
only tests the first queue.
In addition the TX timeout callback function failed to
recover because it also operated only on the first queue.
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace "fallthru" with a proper "fall through" annotation.
This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This helper is unused since commit 988cf74deb45 ("inet:
Stop generating UFO packets.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This list is a leftover from fakelb driver which had always a full mesh
topology. Idea was to remember all phy's which are currently used by the
subsystem and deliver everything out. The hwsim driver works differently
each phy has a list of other phy's to deliver frames which allows a
own mesh topology.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
If the driver is unloaded when traffic is in progress, errors are
generated. Fix this by releasing qvectors and NAPI handler on remove.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently there is no support for dynamic interrupt moderation. This
patch adds some initial code to support this. The following changes
were made:
1. Currently we are using multiple members to store the interrupt
granularity (itr_gran_25/50/100/200). This is not necessary because
we can query the device to determine what the interrupt granularity
should be set to, done by a new function ice_get_itr_intrl_gran.
2. Added intrl to ice_q_vector structure to support interrupt rate
limiting.
3. Added the function ice_intrl_usecs_to_reg for converting to a value
in usecs that the device understands.
4. Added call to write to the GLINT_RATE register. Disable intrl by
default for now.
5. Changed rx/tx_itr_setting to itr_setting because having both seems
redundant because a ring is either Tx or Rx.
6. Initialize itr_setting for both Tx/Rx rings in ice_vsi_alloc_rings()
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently the ice_reset_req enum values have to be translated into
a different set of values that the hardware understands for the same
reset types. Avoid this translation by aligning ice_reset_req enum
values to the ones that the hardware understands.
Also add and else if block to check for ICE_RESET_EMPR and put a dev_dbg
message in the else case.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch implements ethtool hook for enabling/disabling
RSS. While disabling RSS, the LUT should be cleared. And
the LUT should be reconfigured while enabling RSS.
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
For the PF driver, when mapping interrupts to queues, we need to request
IRQs from the kernel and we also have to allocate interrupts from
the device.
Similarly, when the VF driver (iavf.ko) initializes, it requests the kernel
IRQs that it needs but it can't directly allocate interrupts in the device.
Instead, it sends a mailbox message to the ice driver, which then allocates
interrupts in the device on the VF driver's behalf.
Currently both these cases end up having to reserve entries in
pf->irq_tracker but irq_tracker itself is sized based on how many vectors
the PF driver needs. Under the right circumstances, the VF driver can fail
to get entries in irq_tracker, which will result in the VF driver failing
probe.
To fix this, sw_irq_tracker and hw_irq_tracker are introduced. The
sw_irq_tracker tracks only the PF's IRQ request and doesn't play any
role in VF init. hw_irq_tracker represents the device's interrupt space.
When interrupts have to be allocated in the device for either PF or VF,
hw_irq_tracker will be looked up to see if the device has run out of
interrupts.
Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
We are currently replaying the link state of a port after a reset, but
it is possible that the link state of a port can change during the reset
process. So check for the current link state of a port during the rebuild
process of a reset.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently, switch filters get replayed after reset. In addition to
filters, other VSI attributes (like RSS configuration, Tx scheduler
configuration, etc.) also need to be replayed after reset.
Thus, instead of replaying based on functional blocks (i.e. replay
all filters for all VSIs, followed by RSS configuration replay for
all VSIs, and so on), it makes more sense to have the replay centered
around a VSI. In other words, replay all configurations for a VSI before
moving on to rebuilding the next VSI.
To that effect, this patch introduces a VSI replay framework in a new
function ice_vsi_replay_all. Currently it only replays switch filters,
but it will be expanded in the future to replay additional VSI attributes.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch is a continuation of the previous patch where VSI
handles are used instead of VSI numbers.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
A VSI handle is just a number the driver maintains to uniquely identify
a VSI. A VSI handle is backed by a VSI number in the hardware. When
interacting when the hardware, VSI handles are converted into VSI numbers.
In commit 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and
rebuild flow"), VSI handles were introduced but it was used only
when creating and deleting VSIs. This patch is part one of two patches
that expands the use of VSI handles across the rest of the driver. Also
in this patch, certain parts of the code had to be refactored to correctly
use VSI handles.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Explicitly forbid creating cgroup local storage maps with zero value
size, as it makes no sense and might even cause a panic.
Reported-by: syzbot+18628320d3b14a5c459c@syzkaller.appspotmail.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Bartlomiej writes:
"fbdev fixes for v4.19-rc7:
- fix OMAPFB_MEMORY_READ ioctl to not leak kernel memory in omapfb driver
(Tomi Valkeinen)
- add missing prepare/unprepare clock operations in pxa168fb driver
(Lubomir Rintel)
- add nobgrt option in efifb driver to disable ACPI BGRT logo restore
(Hans de Goede)
- fix spelling mistake in fall-through annotation in stifb driver
(Gustavo A. R. Silva)
- fix URL for uvesafb repository in the documentation (Adam Jackson)"
* tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux:
video/fbdev/stifb: Fix spelling mistake in fall-through annotation
uvesafb: Fix URLs in the documentation
efifb: BGRT: Add nobgrt option
fbdev/omapfb: fix omapfb_memory_read infoleak
pxa168fb: prepare the clock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Ulf writes:
"MMC core:
- Fixup conversion of debounce time to/from ms/us
MMC host:
- sdhi: Fixup whitelisting for Gen3 types"
* tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: slot-gpio: Fix debounce time to use miliseconds again
mmc: core: Fix debounce time to use microseconds
mmc: sdhi: sys_dmac: check for all Gen3 types when whitelisting
|
|
Sergey Suloev reported a crash happening in drm_client_dev_hotplug()
when fbdev had failed to register.
[ 9.124598] vc4_hdmi 3f902000.hdmi: ASoC: Failed to create component debugfs directory
[ 9.147667] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok
[ 9.155184] vc4_hdmi 3f902000.hdmi: ASoC: no DMI vendor name!
[ 9.166544] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[ 9.173840] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
[ 9.181029] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4])
[ 9.188519] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[ 9.195690] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 9.203523] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 9.215032] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 9.274785] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[ 9.290246] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[ 9.297464] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.304600] [drm] Driver supports precise vblank timestamp query.
[ 9.382856] vc4-drm soc:gpu: [drm:drm_fb_helper_fbdev_setup [drm_kms_helper]] *ERROR* Failed to set fbdev configuration
[ 10.404937] Unable to handle kernel paging request at virtual address 00330a656369768a
[ 10.441620] [00330a656369768a] address between user and kernel address ranges
[ 10.449087] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 10.454762] Modules linked in: brcmfmac vc4 drm_kms_helper cfg80211 drm rfkill smsc95xx brcmutil usbnet drm_panel_orientation_quirks raspberrypi_hwmon bcm2835_dma crc32_ce pwm_bcm2835 bcm2835_rng virt_dma rng_core i2c_bcm2835 ip_tables x_tables ipv6
[ 10.477296] CPU: 2 PID: 45 Comm: kworker/2:1 Not tainted 4.19.0-rc5 #3
[ 10.483934] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[ 10.489966] Workqueue: events output_poll_execute [drm_kms_helper]
[ 10.596515] Process kworker/2:1 (pid: 45, stack limit = 0x000000007e8924dc)
[ 10.603590] Call trace:
[ 10.606259] drm_client_dev_hotplug+0x5c/0xb0 [drm]
[ 10.611303] drm_kms_helper_hotplug_event+0x30/0x40 [drm_kms_helper]
[ 10.617849] output_poll_execute+0xc4/0x1e0 [drm_kms_helper]
[ 10.623616] process_one_work+0x1c8/0x318
[ 10.627695] worker_thread+0x48/0x428
[ 10.631420] kthread+0xf8/0x128
[ 10.634615] ret_from_fork+0x10/0x18
[ 10.638255] Code: 54000220 f9401261 aa1303e0 b4000141 (f9400c21)
[ 10.644456] ---[ end trace c75b4a4b0e141908 ]---
The reason for this is that drm_fbdev_cma_init() removes the drm_client
when fbdev registration fails, but it doesn't remove the client from the
drm_device client list. So the client list now has a pointer that points
into the unknown and we have a 'use after free' situation.
Split drm_client_new() into drm_client_init() and drm_client_add() to fix
removal in the error path.
Fixes: 894a677f4b3e ("drm/cma-helper: Use the generic fbdev emulation")
Reported-by: Sergey Suloev <ssuloev@orpaltech.com>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001194536.57756-1-noralf@tronnes.org
|
|
Automatic NUMA Balancing uses a multi-stage pass to decide whether a page
should migrate to a local node. This filter avoids excessive ping-ponging
if a page is shared or used by threads that migrate cross-node frequently.
Threads inherit both page tables and the preferred node ID from the
parent. This means that threads can trigger hinting faults earlier than
a new task which delays scanning for a number of seconds. As it can be
load balanced very early in its lifetime there can be an unnecessary delay
before it starts migrating thread-local data. This patch migrates private
pages faster early in the lifetime of a thread using the sequence counter
as an identifier of new tasks.
With this patch applied, STREAM performance is the same as 4.17 even though
processes are not spread cross-node prematurely. Other workloads showed
a mix of minor gains and losses. This is somewhat expected most workloads
are not very sensitive to the starting conditions of a process.
4.19.0-rc5 4.19.0-rc5 4.17.0
numab-v1r1 fastmigrate-v1r1 vanilla
MB/sec copy 43298.52 ( 0.00%) 47335.46 ( 9.32%) 47219.24 ( 9.06%)
MB/sec scale 30115.06 ( 0.00%) 32568.12 ( 8.15%) 32527.56 ( 8.01%)
MB/sec add 32825.12 ( 0.00%) 36078.94 ( 9.91%) 35928.02 ( 9.45%)
MB/sec triad 32549.52 ( 0.00%) 35935.94 ( 10.40%) 35969.88 ( 10.51%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001100525.29789-3-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rate limiting of page migrations due to automatic NUMA balancing was
introduced to mitigate the worst-case scenario of migrating at high
frequency due to false sharing or slowly ping-ponging between nodes.
Since then, a lot of effort was spent on correctly identifying these
pages and avoiding unnecessary migrations and the safety net may no longer
be required.
Jirka Hladky reported a regression in 4.17 due to a scheduler patch that
avoids spreading STREAM tasks wide prematurely. However, once the task
was properly placed, it delayed migrating the memory due to rate limiting.
Increasing the limit fixed the problem for him.
Currently, the limit is hard-coded and does not account for the real
capabilities of the hardware. Even if an estimate was attempted, it would
not properly account for the number of memory controllers and it could
not account for the amount of bandwidth used for normal accesses. Rather
than fudging, this patch simply eliminates the rate limiting.
However, Jirka reports that a STREAM configuration using multiple
processes achieved similar performance to 4.16. In local tests, this patch
improved performance of STREAM relative to the baseline but it is somewhat
machine-dependent. Most workloads show little or not performance difference
implying that there is not a heavily reliance on the throttling mechanism
and it is safe to remove.
STREAM on 2-socket machine
4.19.0-rc5 4.19.0-rc5
numab-v1r1 noratelimit-v1r1
MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%)
MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%)
MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%)
MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24%
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since 890658b7ab48 ("locking/mutex: Kill arch specific code"), there
are no mutex header files under arch/, so we can remove the redundant
entry from MAINTAINERS.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001142856.GC9716@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
fd_install() moves the reference given to it into the file descriptor table
of the current process. If the current process is multithreaded, then
immediately after fd_install(), another thread can close() the file
descriptor and cause the file's resources to be cleaned up.
Since the reference to "lessee" is held by the file, we must not access
"lessee" after the fd_install() call.
As far as I can tell, to reach this codepath, the caller must have an open
file descriptor to a DRI device in master mode. I'm not sure what the
requirements for that are.
Signed-off-by: Jann Horn <jannh@google.com>
Fixes: 62884cd386b8 ("drm: Add four ioctls for managing drm mode object leases [v7]")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001153117.216923-1-jannh@google.com
|
|
There were supposed to be two blocks - one for each direction
cfg80211 <-> driver, clean up the code to restore that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There isn't really any need for us to be sending this from
two different places - move cfg80211_init_wdev() later and
send the notification from there, removing it from the non-
netdev case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
We currently have two places that do similar things, depending
on whether it's a wdev with or without netdev.
Combine the code to avoid having to duplicate all new additions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add some error messages to nl80211_parse_chandef() to make
failures here - especially with disabled channels - easier
to diagnose.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no reason for drivers to be able to access the
cfg80211 internal cookie counter; move it out of the
wiphy into the rdev structure.
While at it, also make it never assign 0 as a cookie
(we consider that invalid in some places), and warn if
we manage to do that for some reason (wrapping is not
likely to happen with a u64.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since my change to split out the regulatory init to occur later,
any issues during earlier cfg80211_init() or errors during the
platform device allocation would lead to crashes later. Make this
more robust by checking that the earlier initialization succeeded.
Fixes: d7be102f2945 ("cfg80211: initialize regulatory keys/database later")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The variable j will be initialized at trailing step.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
nl80211_prepare_wdev_dump() is using the output skb to look up
the network namespace, but this isn't really necessary, it can
just as well use the input skb which is available as cb->skb,
the sk is the same anyway.
Therefore, remove the redundant argument.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Allow userspace to enable fine timing measurement responder
functionality with configurable lci/civic parameters in AP mode.
This can be done at AP start or changing beacon parameters.
A new EXT_FEATURE flag is introduced for drivers to advertise
the capability.
Also nl80211 API support for retrieving statistics is added.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
[remove unused cfg80211_ftm_responder_params, clarify docs,
move validation into policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A simple cleanup, reuse the event definition that we already have.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Use the NL80211_KEY_IDX attribute inside the NL80211_ATTR_KEY in
NL80211_CMD_GET_KEY responses to comply with nl80211_key_policy.
This is unlikely to affect existing userspace.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Fix to return a negative error code -ENOMEM from the kmemdup
error handling case instead of 0.
Fixes: 09b4a4faf9d0 ("mac80211: introduce capability flags for VHT EXT NSS support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's a bit of duplicated code to initialize a wdev, pull it out
into a separate function to call from both places.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Drivers that do not have the BUFF_MMPDU_TXQ flag set will not have a
TXQ for the special TID = 16.
In this case, the last member in the *struct ieee80211_sta* txq array
will be NULL.
We must check this in order not to get a NULL pointer dereference when
iterating the txq array.
Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The check for !scan_plan is redunant as this has been checked
in the proceeding statement and the code returns -ENOBUFS if
it is true. Remove the redundant check.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The iterator in list_for_each_entry_safe is never null, therefore, remove
the redundant null pointer check.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If NUMA improvement from the task migration is going to be very
minimal, then avoid task migration.
Specjbb2005 results (8 warehouses)
Higher bops are better
2 Socket - 2 Node Haswell - X86
JVMS Prev Current %Change
4 198512 205910 3.72673
1 313559 318491 1.57291
2 Socket - 4 Node Power8 - PowerNV
JVMS Prev Current %Change
8 74761.9 74935.9 0.232739
1 214874 226796 5.54837
2 Socket - 2 Node Power9 - PowerNV
JVMS Prev Current %Change
4 180536 189780 5.12031
1 210281 205695 -2.18089
4 Socket - 4 Node Power7 - PowerVM
JVMS Prev Current %Change
8 56511.4 60370 6.828
1 104899 108100 3.05151
1/7 cases is regressing, if we look at events migrate_pages seem
to vary the most especially in the regressing case. Also some
amount of variance is expected between different runs of
Specjbb2005.
Some events stats before and after applying the patch.
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 13,818,546 13,801,554
migrations 1,149,960 1,151,541
faults 385,583 433,246
cache-misses 55,259,546,768 55,168,691,835
sched:sched_move_numa 2,257 2,551
sched:sched_stick_numa 9 24
sched:sched_swap_numa 512 904
migrate:mm_migrate_pages 2,225 1,571
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 72692 113682
numa_hint_faults_local 62270 102163
numa_hit 238762 240181
numa_huge_pte_updates 48 36
numa_interleave 75 64
numa_local 238676 240103
numa_other 86 78
numa_pages_migrated 2225 1564
numa_pte_updates 98557 134080
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 3,173,490 3,079,150
migrations 36,966 31,455
faults 108,776 99,081
cache-misses 12,200,075,320 11,588,126,740
sched:sched_move_numa 1,264 1
sched:sched_stick_numa 0 0
sched:sched_swap_numa 0 0
migrate:mm_migrate_pages 899 36
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 21109 430
numa_hint_faults_local 17120 77
numa_hit 72934 71277
numa_huge_pte_updates 42 0
numa_interleave 33 22
numa_local 72866 71218
numa_other 68 59
numa_pages_migrated 915 23
numa_pte_updates 42326 0
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 8,312,022 8,707,565
migrations 231,705 171,342
faults 310,242 310,820
cache-misses 402,324,573 136,115,400
sched:sched_move_numa 193 215
sched:sched_stick_numa 0 6
sched:sched_swap_numa 3 24
migrate:mm_migrate_pages 93 162
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 11838 8985
numa_hint_faults_local 11216 8154
numa_hit 90689 93819
numa_huge_pte_updates 0 0
numa_interleave 1579 882
numa_local 89634 93496
numa_other 1055 323
numa_pages_migrated 92 169
numa_pte_updates 12109 9217
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 2,170,481 2,152,072
migrations 10,126 10,704
faults 160,962 164,376
cache-misses 10,834,845 3,818,437
sched:sched_move_numa 10 16
sched:sched_stick_numa 0 0
sched:sched_swap_numa 0 7
migrate:mm_migrate_pages 2 199
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 403 2248
numa_hint_faults_local 358 1666
numa_hit 25898 25704
numa_huge_pte_updates 0 0
numa_interleave 207 200
numa_local 25860 25679
numa_other 38 25
numa_pages_migrated 2 197
numa_pte_updates 400 2234
perf stats 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 110,339,633 93,330,595
migrations 4,139,812 4,122,061
faults 863,622 865,979
cache-misses 231,838,045,660 225,395,083,479
sched:sched_move_numa 2,196 2,372
sched:sched_stick_numa 33 24
sched:sched_swap_numa 544 769
migrate:mm_migrate_pages 2,469 1,677
vmstat 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 85748 91638
numa_hint_faults_local 66831 78096
numa_hit 242213 242225
numa_huge_pte_updates 0 0
numa_interleave 0 2
numa_local 242211 242219
numa_other 2 6
numa_pages_migrated 2376 1515
numa_pte_updates 86233 92274
perf stats 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 59,331,057 51,487,271
migrations 552,019 537,170
faults 266,586 256,921
cache-misses 73,796,312,990 70,073,831,187
sched:sched_move_numa 981 576
sched:sched_stick_numa 54 24
sched:sched_swap_numa 286 327
migrate:mm_migrate_pages 713 726
vmstat 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 14807 12000
numa_hint_faults_local 5738 5024
numa_hit 36230 36470
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 36228 36465
numa_other 2 5
numa_pages_migrated 703 726
numa_pte_updates 14742 11930
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537552141-27815-7-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since this spinlock will only serialize the migrate rate limiting,
convert the spin_lock() to a spin_trylock(). If another thread is updating, this
task can move on.
Specjbb2005 results (8 warehouses)
Higher bops are better
2 Socket - 2 Node Haswell - X86
JVMS Prev Current %Change
4 205332 198512 -3.32145
1 319785 313559 -1.94693
2 Socket - 4 Node Power8 - PowerNV
JVMS Prev Current %Change
8 74912 74761.9 -0.200368
1 206585 214874 4.01239
2 Socket - 2 Node Power9 - PowerNV
JVMS Prev Current %Change
4 189162 180536 -4.56011
1 213760 210281 -1.62753
4 Socket - 4 Node Power7 - PowerVM
JVMS Prev Current %Change
8 58736.8 56511.4 -3.78877
1 105419 104899 -0.49327
Avoiding stretching of window intervals may be the reason for the
regression. Also code now uses READ_ONCE/WRITE_ONCE. That may
also be hurting performance to some extent.
Some events stats before and after applying the patch.
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 14,285,708 13,818,546
migrations 1,180,621 1,149,960
faults 339,114 385,583
cache-misses 55,205,631,894 55,259,546,768
sched:sched_move_numa 843 2,257
sched:sched_stick_numa 6 9
sched:sched_swap_numa 219 512
migrate:mm_migrate_pages 365 2,225
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 26907 72692
numa_hint_faults_local 24279 62270
numa_hit 239771 238762
numa_huge_pte_updates 0 48
numa_interleave 68 75
numa_local 239688 238676
numa_other 83 86
numa_pages_migrated 363 2225
numa_pte_updates 27415 98557
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 3,202,779 3,173,490
migrations 37,186 36,966
faults 106,076 108,776
cache-misses 12,024,873,744 12,200,075,320
sched:sched_move_numa 931 1,264
sched:sched_stick_numa 0 0
sched:sched_swap_numa 1 0
migrate:mm_migrate_pages 637 899
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 17409 21109
numa_hint_faults_local 14367 17120
numa_hit 73953 72934
numa_huge_pte_updates 20 42
numa_interleave 25 33
numa_local 73892 72866
numa_other 61 68
numa_pages_migrated 668 915
numa_pte_updates 27276 42326
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 8,474,013 8,312,022
migrations 254,934 231,705
faults 320,506 310,242
cache-misses 110,580,458 402,324,573
sched:sched_move_numa 725 193
sched:sched_stick_numa 0 0
sched:sched_swap_numa 7 3
migrate:mm_migrate_pages 145 93
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 22797 11838
numa_hint_faults_local 21539 11216
numa_hit 89308 90689
numa_huge_pte_updates 0 0
numa_interleave 865 1579
numa_local 88955 89634
numa_other 353 1055
numa_pages_migrated 149 92
numa_pte_updates 22930 12109
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 2,195,628 2,170,481
migrations 11,179 10,126
faults 149,656 160,962
cache-misses 8,117,515 10,834,845
sched:sched_move_numa 49 10
sched:sched_stick_numa 0 0
sched:sched_swap_numa 0 0
migrate:mm_migrate_pages 5 2
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 3577 403
numa_hint_faults_local 3476 358
numa_hit 26142 25898
numa_huge_pte_updates 0 0
numa_interleave 358 207
numa_local 26042 25860
numa_other 100 38
numa_pages_migrated 5 2
numa_pte_updates 3587 400
perf stats 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 100,602,296 110,339,633
migrations 4,135,630 4,139,812
faults 789,256 863,622
cache-misses 226,160,621,058 231,838,045,660
sched:sched_move_numa 1,366 2,196
sched:sched_stick_numa 16 33
sched:sched_swap_numa 374 544
migrate:mm_migrate_pages 1,350 2,469
vmstat 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 47857 85748
numa_hint_faults_local 39768 66831
numa_hit 240165 242213
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 240165 242211
numa_other 0 2
numa_pages_migrated 1224 2376
numa_pte_updates 48354 86233
perf stats 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 58,515,496 59,331,057
migrations 564,845 552,019
faults 245,807 266,586
cache-misses 73,603,757,976 73,796,312,990
sched:sched_move_numa 996 981
sched:sched_stick_numa 10 54
sched:sched_swap_numa 193 286
migrate:mm_migrate_pages 646 713
vmstat 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 13422 14807
numa_hint_faults_local 5619 5738
numa_hit 36118 36230
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 36116 36228
numa_other 2 2
numa_pages_migrated 616 703
numa_pte_updates 13374 14742
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537552141-27815-6-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
migrate_task_rq_fair() resets the scan rate for NUMA balancing on every
cross-node migration. In the event of excessive load balancing due to
saturation, this may result in the scan rate being pegged at maximum and
further overloading the machine.
This patch only resets the scan if NUMA balancing is active, a preferred
node has been selected and the task is being migrated from the preferred
node as these are the most harmful. For example, a migration to the preferred
node does not justify a faster scan rate. Similarly, a migration between two
nodes that are not preferred is probably bouncing due to over-saturation of
the machine. In that case, scanning faster and trapping more NUMA faults
will further overload the machine.
Specjbb2005 results (8 warehouses)
Higher bops are better
2 Socket - 2 Node Haswell - X86
JVMS Prev Current %Change
4 203370 205332 0.964744
1 328431 319785 -2.63252
2 Socket - 4 Node Power8 - PowerNV
JVMS Prev Current %Change
1 206070 206585 0.249915
2 Socket - 2 Node Power9 - PowerNV
JVMS Prev Current %Change
4 188386 189162 0.41192
1 201566 213760 6.04963
4 Socket - 4 Node Power7 - PowerVM
JVMS Prev Current %Change
8 59157.4 58736.8 -0.710985
1 105495 105419 -0.0720413
Some events stats before and after applying the patch.
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 13,825,492 14,285,708
migrations 1,152,509 1,180,621
faults 371,948 339,114
cache-misses 55,654,206,041 55,205,631,894
sched:sched_move_numa 1,856 843
sched:sched_stick_numa 4 6
sched:sched_swap_numa 428 219
migrate:mm_migrate_pages 898 365
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 57146 26907
numa_hint_faults_local 51612 24279
numa_hit 238164 239771
numa_huge_pte_updates 16 0
numa_interleave 63 68
numa_local 238085 239688
numa_other 79 83
numa_pages_migrated 883 363
numa_pte_updates 67540 27415
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 3,288,525 3,202,779
migrations 38,652 37,186
faults 111,678 106,076
cache-misses 12,111,197,376 12,024,873,744
sched:sched_move_numa 900 931
sched:sched_stick_numa 0 0
sched:sched_swap_numa 5 1
migrate:mm_migrate_pages 714 637
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 18572 17409
numa_hint_faults_local 14850 14367
numa_hit 73197 73953
numa_huge_pte_updates 11 20
numa_interleave 25 25
numa_local 73138 73892
numa_other 59 61
numa_pages_migrated 712 668
numa_pte_updates 24021 27276
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 8,451,543 8,474,013
migrations 202,804 254,934
faults 310,024 320,506
cache-misses 253,522,507 110,580,458
sched:sched_move_numa 213 725
sched:sched_stick_numa 0 0
sched:sched_swap_numa 2 7
migrate:mm_migrate_pages 88 145
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 11830 22797
numa_hint_faults_local 11301 21539
numa_hit 90038 89308
numa_huge_pte_updates 0 0
numa_interleave 855 865
numa_local 89796 88955
numa_other 242 353
numa_pages_migrated 88 149
numa_pte_updates 12039 22930
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 2,049,153 2,195,628
migrations 11,405 11,179
faults 162,309 149,656
cache-misses 7,203,343 8,117,515
sched:sched_move_numa 22 49
sched:sched_stick_numa 0 0
sched:sched_swap_numa 0 0
migrate:mm_migrate_pages 1 5
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 1693 3577
numa_hint_faults_local 1669 3476
numa_hit 25177 26142
numa_huge_pte_updates 0 0
numa_interleave 194 358
numa_local 24993 26042
numa_other 184 100
numa_pages_migrated 1 5
numa_pte_updates 1577 3587
perf stats 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 94,515,937 100,602,296
migrations 4,203,554 4,135,630
faults 832,697 789,256
cache-misses 226,248,698,331 226,160,621,058
sched:sched_move_numa 1,730 1,366
sched:sched_stick_numa 14 16
sched:sched_swap_numa 432 374
migrate:mm_migrate_pages 1,398 1,350
vmstat 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 80079 47857
numa_hint_faults_local 68620 39768
numa_hit 241187 240165
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 241186 240165
numa_other 1 0
numa_pages_migrated 1347 1224
numa_pte_updates 80729 48354
perf stats 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 63,704,961 58,515,496
migrations 573,404 564,845
faults 230,878 245,807
cache-misses 76,568,222,781 73,603,757,976
sched:sched_move_numa 509 996
sched:sched_stick_numa 31 10
sched:sched_swap_numa 182 193
migrate:mm_migrate_pages 541 646
vmstat 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 8501 13422
numa_hint_faults_local 2960 5619
numa_hit 35526 36118
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 35526 36116
numa_other 0 2
numa_pages_migrated 539 616
numa_pte_updates 8433 13374
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537552141-27815-5-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently task scan rate is reset when NUMA balancer migrates the task
to a different node. If NUMA balancer initiates a swap, reset is only
applicable to the task that initiates the swap. Similarly no scan rate
reset is done if the task is migrated across nodes by traditional load
balancer.
Instead move the scan reset to the migrate_task_rq. This ensures the
task moved out of its preferred node, either gets back to its preferred
node quickly or finds a new preferred node. Doing so, would be fair to
all tasks migrating across nodes.
Specjbb2005 results (8 warehouses)
Higher bops are better
2 Socket - 2 Node Haswell - X86
JVMS Prev Current %Change
4 200668 203370 1.3465
1 321791 328431 2.06345
2 Socket - 4 Node Power8 - PowerNV
JVMS Prev Current %Change
1 204848 206070 0.59654
2 Socket - 2 Node Power9 - PowerNV
JVMS Prev Current %Change
4 188098 188386 0.153112
1 200351 201566 0.606436
4 Socket - 4 Node Power7 - PowerVM
JVMS Prev Current %Change
8 58145.9 59157.4 1.73959
1 103798 105495 1.63491
Some events stats before and after applying the patch.
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 13,912,183 13,825,492
migrations 1,155,931 1,152,509
faults 367,139 371,948
cache-misses 54,240,196,814 55,654,206,041
sched:sched_move_numa 1,571 1,856
sched:sched_stick_numa 9 4
sched:sched_swap_numa 463 428
migrate:mm_migrate_pages 703 898
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 50155 57146
numa_hint_faults_local 45264 51612
numa_hit 239652 238164
numa_huge_pte_updates 36 16
numa_interleave 68 63
numa_local 239576 238085
numa_other 76 79
numa_pages_migrated 680 883
numa_pte_updates 71146 67540
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
cs 3,156,720 3,288,525
migrations 30,354 38,652
faults 97,261 111,678
cache-misses 12,400,026,826 12,111,197,376
sched:sched_move_numa 4 900
sched:sched_stick_numa 0 0
sched:sched_swap_numa 1 5
migrate:mm_migrate_pages 20 714
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86
Event Before After
numa_hint_faults 272 18572
numa_hint_faults_local 186 14850
numa_hit 71362 73197
numa_huge_pte_updates 0 11
numa_interleave 23 25
numa_local 71299 73138
numa_other 63 59
numa_pages_migrated 2 712
numa_pte_updates 0 24021
perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 8,606,824 8,451,543
migrations 155,352 202,804
faults 301,409 310,024
cache-misses 157,759,224 253,522,507
sched:sched_move_numa 168 213
sched:sched_stick_numa 0 0
sched:sched_swap_numa 3 2
migrate:mm_migrate_pages 125 88
vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 4650 11830
numa_hint_faults_local 3946 11301
numa_hit 90489 90038
numa_huge_pte_updates 0 0
numa_interleave 892 855
numa_local 90034 89796
numa_other 455 242
numa_pages_migrated 124 88
numa_pte_updates 4818 12039
perf stats 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
cs 2,113,167 2,049,153
migrations 10,533 11,405
faults 142,727 162,309
cache-misses 5,594,192 7,203,343
sched:sched_move_numa 10 22
sched:sched_stick_numa 0 0
sched:sched_swap_numa 0 0
migrate:mm_migrate_pages 6 1
vmstat 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV
Event Before After
numa_hint_faults 744 1693
numa_hint_faults_local 584 1669
numa_hit 25551 25177
numa_huge_pte_updates 0 0
numa_interleave 263 194
numa_local 25302 24993
numa_other 249 184
numa_pages_migrated 6 1
numa_pte_updates 744 1577
perf stats 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 101,227,352 94,515,937
migrations 4,151,829 4,203,554
faults 745,233 832,697
cache-misses 224,669,561,766 226,248,698,331
sched:sched_move_numa 617 1,730
sched:sched_stick_numa 2 14
sched:sched_swap_numa 187 432
migrate:mm_migrate_pages 316 1,398
vmstat 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 24195 80079
numa_hint_faults_local 21639 68620
numa_hit 238331 241187
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 238331 241186
numa_other 0 1
numa_pages_migrated 204 1347
numa_pte_updates 24561 80729
perf stats 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
cs 62,738,978 63,704,961
migrations 562,702 573,404
faults 228,465 230,878
cache-misses 75,778,067,952 76,568,222,781
sched:sched_move_numa 648 509
sched:sched_stick_numa 13 31
sched:sched_swap_numa 137 182
migrate:mm_migrate_pages 733 541
vmstat 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM
Event Before After
numa_hint_faults 10281 8501
numa_hint_faults_local 3242 2960
numa_hit 36338 35526
numa_huge_pte_updates 0 0
numa_interleave 0 0
numa_local 36338 35526
numa_other 0 0
numa_pages_migrated 706 539
numa_pte_updates 10176 8433
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537552141-27815-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|