summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-11-17Merge tag 'xtensa-20161116' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds3-60/+37
Pull Xtensa fixes from Max Filippov: - fix register dumps, stack dumps and stack traces that got torn due to recent printk changes - wire up pkey_{mprotect,alloc,free} syscalls * tag 'xtensa-20161116' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: wire up new pkey_{mprotect,alloc,free} syscalls xtensa: clean up printk usage for boot/crash logging
2016-11-17ARM: Fix XIP kernelsRussell King1-0/+5
Commit 7619751f8c90 ("ARM: 8595/2: apply more __ro_after_init") caused a regression with XIP kernels by moving the __ro_after_init data into the read-only section. With XIP kernels, the read-only section is located in read-only memory from the very beginning. Work around this by moving the __ro_after_init data back into the .data section, which will be in RAM, and hence will be writable. It should be noted that in doing so, this remains writable after init. Fixes: 7619751f8c90 ("ARM: 8595/2: apply more __ro_after_init") Reported-by: Andrea Merello <andrea.merello@gmail.com> Tested-by: Andrea Merello <andrea.merello@gmail.com> [ XIP stm32 ] Tested-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2016-11-17Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie5-24/+28
into drm-fixes Just a few bug fixes for 4.9. The big one is Mario's prime fencing fix. * 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu:fix vpost_needed routine drm/amdgpu/powerplay: drop a redundant NULL check drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
2016-11-17Merge branch 'mediatek-drm-fixes-2016-11-11' of ↵Dave Airlie6-21/+51
https://github.com/ckhu-mediatek/linux.git-tags into drm-fixes This branch include one patch to fix a typo, two patches to disable vblank interrupt, and three patches to support HDMI 4K resolution. * 'mediatek-drm-fixes-2016-11-11' of https://github.com/ckhu-mediatek/linux.git-tags: drm/mediatek: modify the factor to make the pll_rate set in the 1G-2G range drm/mediatek: enhance the HDMI driving current drm/mediatek: do mtk_hdmi_send_infoframe after HDMI clock enable drm/mediatek: clear IRQ status before enable OVL interrupt drm/mediatek: set vblank_disable_allowed to true drm/mediatek: fix a typo of OD_CFG to OD_RELAYMODE
2016-11-17net/phy/vitesse: Configure RGMII skew on VSC8601, if neededAlex1-1/+33
With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The VSC8601 can handle this internally. While the VSC8601 can set more fine-grained delays, the standard skew settings work out of the box. The same heuristic is used to determine when this skew should be enabled as in vsc824x_config_init(). Tested on custom board with AM3352 SOC and VSC801 PHY. Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17cxgb4: do not call napi_hash_del()Eric Dumazet1-1/+0
Calling napi_hash_del() before netif_napi_del() is dangerous if a synchronize_rcu() is not enforced before NAPI struct freeing. Lets leave this detail to core networking stack and feel more comfortable. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hariprasad S <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17be2net: do not call napi_hash_del()Eric Dumazet1-1/+0
Calling napi_hash_del() before netif_napi_del() is dangerous if a synchronize_rcu() is not enforced before NAPI struct freeing. Lets leave this detail to core networking stack and feel more comfortable. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17tools/power/acpi: Remove direct kernel source include referenceLv Zheng6-34/+56
Avoid breaking cross-compiled ACPI tools builds by rearranging the handling of kernel header files. This patch also contains OUTPUT/srctree cleanups in order to make above fix working for various build environments. Fixes: e323c02dee59 (ACPICA: MSVC9: Fix <sys/stat.h> inclusion order issue) Reported-and-tested-by: Yisheng Xie <xieyisheng1@huawei.com> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16virtio-net: add a missing synchronize_net()Eric Dumazet1-0/+5
It seems many drivers do not respect napi_hash_del() contract. When napi_hash_del() is used before netif_napi_del(), an RCU grace period is needed before freeing NAPI object. Fixes: 91815639d880 ("virtio-net: rx busy polling support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16gpio: Remove GPIO_DEVRES optionKeno Fischer2-5/+1
This option was added in 6a89a314ab107a12af08c71420c19a37a30fc2d3 to allow use of the devm_gpio_* functions without CONFIG_GPIOLIB. However, only a few months later in b69ac52449c658b7ac40034dc3c5f5f4a71a723d, CONFIG_GPIOLIB was added as a dependency, defeating the original purpose of this option. Instead of that patch, the original commit could have just been reverted (and in fact was partially so in 403c1d0be5ccbd750d25c59d8358843a81e52e3b). Further, since this option has a dependency on HAS_IOMEM, even though it does not require it, it causes build failures when !HAS_IOMEM (e.g. in a uml build). Fix that by completely removing the option, in essence completing the reversion of the original commit. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-11-16nvme/pci: Don't free queues on errorKeith Busch1-14/+4
The nvme_remove function tears down all allocated resources in the correct order, so no need to free queues on error during initialization. This fixes possible use-after-free errors when queues are still associated with a blk-mq hctx. Reported-by: Scott Bauer <scott.bauer@intel.com> Tested-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-16Merge tag 'sunxi-clk-fixes-for-4.9' of ↵Stephen Boyd2-1/+13
https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into clk-fixes Pull Allwinner clock fixes from Maxime Ripard: Two fixes, one for the old clock code, one for the new implementation. * tag 'sunxi-clk-fixes-for-4.9' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: clk: sunxi: Fix M factor computation for APB1 clk: sunxi-ng: sun6i-a31: Force AHB1 clock to use PLL6 as parent
2016-11-16clk: efm32gg: Pass correct type to hw provider registrationStephen Boyd1-1/+1
Dan Carpenter reports that we're passing a pointer to a pointer here when we should just be passing a pointer. Pass the right pointer so that the of_clk_hw_onecell_get() sees the appropriate data pointer on its end. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 9337631f52a8 ("clk: efm32gg: Migrate to clk_hw based OF and registration APIs") Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-11-16clk: berlin: Pass correct type to hw provider registrationStephen Boyd2-2/+2
Dan Carpenter reports that we're passing a pointer to a pointer here when we should just be passing a pointer. Pass the right pointer so that the of_clk_hw_onecell_get() sees the appropriate data pointer on its end. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jisheng Zhang <jszhang@marvell.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Fixes: f6475e298297 ("clk: berlin: Migrate to clk_hw based registration and OF APIs") Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-11-16Merge branch 'thunderx-fixes'David S. Miller9-234/+274
Sunil Goutham says: ==================== net: thunderx: Miscellaneous fixes This patchset includes fixes for incorrect LMAC credits, unreliable driver statistics, memory leak upon interface down e.t.c Changes from v1: - As suggested replaced bit shifting with BIT() macro in the patch 'Fix configuration of L3/L4 length checking'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16net: thunderx: Fix memory leak and other issues upon interface toggleSunil Goutham2-7/+18
This patch fixes the following 1. When interface is being teardown and queues are being cleaned up, free pending SKBs that are in SQ which are either not transmitted or freed as NAPI is disabled by that time. 2. While interface initialization, delay CFG_DONE notification till the end to avoid corner cases where TXQs are enabled but CQ interrupts are not which results blocking transmission and kicking off watchdog. 3. Check for IFF_UP while re-enabling RBDR interrupts from tasklet. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16net: thunderx: Fix VF driver's interface statisticsSunil Goutham6-196/+197
This patch fixes multiple issues 1. Convert all driver statistics to percpu counters for accuracy. 2. To avoid multiple CQEs posted by a TSO packet appended to HW, TSO pkt's SQE has 'post_cqe' not set but a dummy SQE is added for getting HW transmit completion notification. This dummy SQE has 'dont_send' set and HW drops the pkt pointed to in this thus Tx drop counter increases. This patch fixes this by subtracting SW tx tso counter from HW Tx drop counter for actual packet drop counter. 3. Reset all individual queue's and VNIC HW stats when interface is going down. 4. Getrid off unnecessary counters in hot path. 5. Bringout all CQE error stats i.e both Rx and Tx. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16net: thunderx: Fix configuration of L3/L4 length checkingSunil Goutham1-2/+5
This patch fixes enabling of HW verification of L3/L4 length and TCP/UDP checksum which is currently being cleared. Also fixed VLAN stripping config which is being cleared when multiqset is enabled. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16net: thunderx: Program LMAC credits based on MTUSunil Goutham6-30/+53
Programming LMAC credits taking 9K frame size by default is incorrect as for an interface which is one of the many on the same BGX/QLM no of credits available will be less as Tx FIFO will be divided across all interfaces. So let's say a BGX with 40G interface and another BGX with multiple 10G, bandwidth of 10G interfaces will be effected when traffic is running on both 40G and 10G interfaces simultaneously. This patch fixes this issue by programming credits based on netdev's MTU. Also fixed configuring MTU to HW and added CQE counter for pkts which exceed this value. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16net: thunderx: Introduce BGX_ID_MASK macro to extract bgx_idRadha Mohan Chintakuntla2-2/+4
This patch fixes the 'bgx_id' determination on 83xx where there are 4 BGX blocks instead of 2 on other platforms. Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16Merge branch 'fib-tables-fixes'David S. Miller3-6/+84
Alexander Duyck says: ==================== ipv4: Fix memory leaks and reference issues in fib This series fixes one major issue and one minor issue in the fib tables. The major issue is that we had lost the functionality that was flushing the local table entries from main after we had unmerged the two tries. In order to regain the functionality I have performed a partial revert and then moved the functionality for flushing the external entries from main into fib_unmerge. The minor issue was a memory leak that could occur in the event that we weren't able to add an alias to the local trie resulting in the fib alias being leaked. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16ipv4: Fix memory leak in exception case for splitting triesAlexander Duyck1-1/+3
Fix a small memory leak that can occur where we leak a fib_alias in the event of us not being able to insert it into the local table. Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16ipv4: Restore fib_trie_flush_external function and fix call orderingAlexander Duyck3-5/+81
The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16bpf: fix range arithmetic for bpf map accessJosef Bacik2-25/+50
I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things 1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity. 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the minimum value to 0 for positive AND's. 3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't. This fixes the testcase provided by Jann as well as a few other theoretical problems. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16Merge branch 'for-linus' of ↵Linus Torvalds4-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "A regression fix and bug fix bound for stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix fuse_write_end() if zero bytes were copied fuse: fix root dentry initialization
2016-11-16Merge tag 'mfd-fixes-4.9' of ↵Linus Torvalds5-27/+17
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fixes from Lee Jones: - Fix PCI properties in intel-lpss-pci - Fix Resetting issue during suspend in intel-lpss-pci - Seperate IRQs for USBC device and CHRG in intel_soc_pmic_bxtwc - Add timeout to fix Resetting issue in stmpe - Ensure we 'put' reference to device when done in mfd-core * tag 'mfd-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: core: Fix device reference leak in mfd_clone_cell mfd: stmpe: Fix RESET regression on STMPE2401 mfd: intel_soc_pmic_bxtwc: Fix usbc interrupt mfd: intel-lpss: Do not put device in reset state on suspend mfd: lpss: Fix Intel Kaby Lake PCH-H properties
2016-11-16orangefs: add .owner to debugfs file_operationsMike Marshall1-0/+2
Without ".owner = THIS_MODULE" it is possible to crash the kernel by unloading the Orangefs module while someone is reading debugfs files. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-11-16mfd: core: Fix device reference leak in mfd_clone_cellJohan Hovold1-0/+2
Make sure to drop the reference taken by bus_find_device_by_name() before returning from mfd_clone_cell(). Fixes: a9bbba996302 ("mfd: add platform_device sharing support for mfd") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2016-11-16mfd: stmpe: Fix RESET regression on STMPE2401Linus Walleij1-0/+2
Since commit c4dd1ba355aae2bc3d1213da6c66c53e3c31e028 ("mfd: stmpe: Add reset support for all STMPE variant") we're resetting the STMPE expanders before use. This caused a regression on the STMP2401 on the Nomadik NHK8815: stmpe-i2c 0-0043: stmpe2401 detected, chip id: 0x101 nmk-i2c 101f8000.i2c0: write to slave 0x43 timed out nmk-i2c 101f8000.i2c0: no ack received after address transmission stmpe-i2c 0-0044: stmpe2401 detected, chip id: 0x101 nmk-i2c 101f8000.i2c0: write to slave 0x44 timed out nmk-i2c 101f8000.i2c0: no ack received after address transmission It turns out that we start to poll for the reset bit to go low again too quickly: the STMPE2401 is not yet online and ready to be asked for the status of the RESET bit. By introducing a 10ms delay before starting to hammer the register for information, we get back to normal: stmpe-i2c 0-0043: stmpe2401 detected, chip id: 0x101 stmpe-i2c 0-0044: stmpe2401 detected, chip id: 0x101 Cc: stable@vger.kernel.org Cc: Amelie Delaunay <amelie.delaunay@st.com> Fixes: c4dd1ba355aa ("mfd: stmpe: Add reset support for all STMPE variant") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2016-11-16mfd: intel_soc_pmic_bxtwc: Fix usbc interruptHeikki Krogerus1-2/+4
The wcove USB Type-C driver is currently being flooded with interrupts that are not targeted to it. The reason for that is because all CHRG first level interrupts are mapped to it. This fixes the issue by introducing separate irq for the usbc device, and mapping only USB Type-C PHY interrupts to it. Fixes: 9c6235c86332 ("mfd: intel_soc_pmic_bxtwc: Add bxt_wcove_usbc device") Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2016-11-16mfd: intel-lpss: Do not put device in reset state on suspendAzhar Shaikh1-3/+0
Commit 41a3da2b8e163 ("mfd: intel-lpss: Save register context on suspend") saved the register context while going to suspend and also put the device in reset state. Due to the resetting of device, system cannot enter S3/S0ix states when no_console_suspend flag is enabled. The system and serial console both hang. The resetting of device is not needed while going to suspend. Hence remove this code. Cc: stable@vger.kernel.org Fixes: 41a3da2b8e163 ("mfd: intel-lpss: Save register context on suspend") Signed-off-by: Azhar Shaikh <azhar.shaikh@intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2016-11-16mfd: lpss: Fix Intel Kaby Lake PCH-H propertiesJarkko Nikula1-22/+9
There are a few issues on Intel Kaby Lake PCH-H properties added by commit a6a576b78e09 ("mfd: lpss: Add Intel Kaby Lake PCH-H PCI IDs"): - Input clock of I2C controller on Intel Kaby Lake PCH-H is 120 MHz not 133 MHz. This was probably copy-paste error from Intel Broxton I2C properties. - There is no default I2C SDA hold time specified which is used when ACPI doesn't provide it. I got information from Windows driver team that Kaby Lake PCH-H can use the same configuration than Intel Sunrisepoint PCH. - Common HS-UART properties are not used. Fix these by reusing the Sunrisepoint properties on Kaby Lake PCH-H. Fixes: a6a576b78e09 ("mfd: lpss: Add Intel Kaby Lake PCH-H PCI IDs") Reported-by: Xiang A Wang <xiang.a.wang@intel.com> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2016-11-16sched/fair: Fix task group initializationVincent Guittot1-1/+1
The moves of tasks are now propagated down to root and the utilization of cfs_rq reflects reality so it doesn't need to be estimated at init. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-7-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Propagate asynchrous detachVincent Guittot1-0/+6
A task can be asynchronously detached from cfs_rq when migrating between CPUs. The load of the migrated task is then removed from source cfs_rq during its next update. We use this event to set propagation flag. During the load balance, we take advantage of the update of blocked load to propagate any pending changes. The propagation relies on patch: "sched: Fix hierarchical order in rq->leaf_cfs_rq_list" ... which orders children and parents, to ensure that it's done in one pass. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Propagate load during synchronous attach/detachVincent Guittot2-1/+188
When a task moves from/to a cfs_rq, we set a flag which is then used to propagate the change at parent level (sched_entity and cfs_rq) during next update. If the cfs_rq is throttled, the flag will stay pending until the cfs_rq is unthrottled. For propagating the utilization, we copy the utilization of group cfs_rq to the sched_entity. For propagating the load, we have to take into account the load of the whole task group in order to evaluate the load of the sched_entity. Similarly to what was done before the rewrite of PELT, we add a correction factor in case the task group's load is greater than its share so it will contribute the same load of a task of equal weight. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Factorize PELT updateVincent Guittot1-51/+25
Every time we modify load/utilization of sched_entity, we start to sync it with its cfs_rq. This update is done in different ways: - when attaching/detaching a sched_entity, we update cfs_rq and then we sync the entity with the cfs_rq. - when enqueueing/dequeuing the sched_entity, we update both sched_entity and cfs_rq metrics to now. Use update_load_avg() everytime we have to update and sync cfs_rq and sched_entity before changing the state of a sched_enity. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-4-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_listVincent Guittot3-7/+49
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a child will always be called before its parent. The hierarchical order in shares update list has been introduced by commit: 67e86250f8ea ("sched: Introduce hierarchal order on shares update list") With the current implementation a child can be still put after its parent. Lets take the example of: root \ b /\ c d* | e* with root -> b -> c already enqueued but not d -> e so the leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail The branch d -> e will be added the first time that they are enqueued, starting with e then d. When e is added, its parents is not already on the list so e is put at the tail : head -> c -> b -> root -> e -> tail Then, d is added at the head because its parent is already on the list: head -> d -> c -> b -> root -> e -> tail e is not placed at the right position and will be called the last whereas it should be called at the beginning. Because it follows the bottom-up enqueue sequence, we are sure that we will finished to add either a cfs_rq without parent or a cfs_rq with a parent that is already on the list. We can use this event to detect when we have finished to add a new branch. For the others, whose parents are not already added, we have to ensure that they will be added after their children that have just been inserted the steps before, and after any potential parents that are already in the list. The easiest way is to put the cfs_rq just after the last inserted one and to keep track of it untl the branch is fully added. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Factorize attach/detach entityVincent Guittot1-22/+31
Factorize post_init_entity_util_avg() and part of attach_task_cfs_rq() in one function attach_entity_cfs_rq(). Create symmetric detach_entity_cfs_rq() function. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Fix incorrect comment for capacity_marginMorten Rasmussen1-1/+1
The comment for capacity_margin introduced in: 3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance at wake-up") ... got its usage the wrong way round - fix it. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-7-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Avoid pulling tasks from non-overloaded higher capacity groupsMorten Rasmussen1-0/+25
For asymmetric CPU capacity systems it is counter-productive for throughput if low capacity CPUs are pulling tasks from non-overloaded CPUs with higher capacity. The assumption is that higher CPU capacity is preferred over running alone in a group with lower CPU capacity. This patch rejects higher CPU capacity groups with one or less task per CPU as potential busiest group which could otherwise lead to a series of failing load-balancing attempts leading to a force-migration. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-5-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Add per-CPU min capacity to sched_group_capacityMorten Rasmussen3-7/+16
struct sched_group_capacity currently represents the compute capacity sum of all CPUs in the sched_group. Unless it is divided by the group_weight to get the average capacity per CPU, it hides differences in CPU capacity for mixed capacity systems (e.g. high RT/IRQ utilization or ARM big.LITTLE). But even the average may not be sufficient if the group covers CPUs of different capacities. Instead, by extending struct sched_group_capacity to indicate min per-CPU capacity in the group a suitable group for a given task utilization can more easily be found such that CPUs with reduced capacity can be avoided for tasks with high utilization (not implemented by this patch). Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Consider spare capacity in find_idlest_group()Morten Rasmussen1-5/+45
In low-utilization scenarios comparing relative loads in find_idlest_group() doesn't always lead to the most optimum choice. Systems with groups containing different numbers of cpus and/or cpus of different compute capacity are significantly better off when considering spare capacity rather than relative load in those scenarios. In addition to existing load based search an alternative spare capacity based candidate sched_group is found and selected instead if sufficient spare capacity exists. If not, existing behaviour is preserved. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-3-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Compute task/cpu utilization at wake-up correctlyMorten Rasmussen1-4/+35
At task wake-up load-tracking isn't updated until the task is enqueued. The task's own view of its utilization contribution may therefore not be aligned with its contribution to the cfs_rq load-tracking which may have been updated in the meantime. Basically, the task's own utilization hasn't yet accounted for the sleep decay, while the cfs_rq may have (partially). Estimating the cfs_rq utilization in case the task is migrated at wake-up as task_rq(p)->cfs.avg.util_avg - p->se.avg.util_avg is therefore incorrect as the two load-tracking signals aren't time synchronized (different last update). To solve this problem, this patch synchronizes the task utilization with its previous rq before the task utilization is used in the wake-up path. Currently the update/synchronization is done _after_ the task has been placed by select_task_rq_fair(). The synchronization is done without having to take the rq lock using the existing mechanism used in remove_entity_load_avg(). Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-2-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/x86: Do not clear PREEMPT_NEED_RESCHED on preempt count resetMartin Schwidefsky1-1/+7
The per-cpu preempt count of x86 contains two values, the actual preempt count and the inverted PREEMPT_NEED_RESCHED bit. If a corrupted preempt count is detected the preempt_count_set() function is used to reset the preempt count. In case the inverted PREEMPT_NEED_RESCHED bit is zero at the time of the reset, the preemption indication is lost. Use raw_cpu_cmpxchg_4() to reset only the count part and leave the PREEMPT_NEED_RESCHED bit as it is. This improves the kernel's behavior when it runs into preempt count leaks and tries to fix them up. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1478523660-733-1-git-send-email-schwidefsky@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/cpuacct: Avoid %lld seq_printf warningMartin Schwidefsky1-1/+1
For s390 kernel builds I keep getting this warning: kernel/sched/cpuacct.c: In function 'cpuacct_stats_show': kernel/sched/cpuacct.c:298:25: warning: format '%lld' expects argument of type 'long long int', but argument 4 has type 'clock_t {aka long int}' [-Wformat=] seq_printf(sf, "%s %lld\n", Silence the warning by adding an explicit cast. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161111142749.6545-1-schwidefsky@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar346-1504/+3174
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16x86/sysfb: Fix lfb_size calculationDavid Herrmann1-8/+17
The screen_info.lfb_size field is shifted by 16 bits *only* in case of VBE. This has historical reasons since VBE advertised it similarly. However, in case of EFI framebuffers, the size is no longer shifted. Fix the x86 simple-framebuffer setup code to use the correct size in the non-VBE case. While at it, avoid variable abbreviations and rename 'len' to 'length', and use the correct types matching the screen_info definition. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16x86/sysfb: Add support for 64bit EFI lfb_baseDavid Herrmann1-2/+16
The screen_info object was extended to support 64-bit lfb_base addresses in: ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses") However, the x86 simple-framebuffer setup code never made use of it. Fix it to properly assemble and verify the lfb_base before advertising simple-framebuffer devices. In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the screen_info->ext_lfb_base field will contain the upper 32bit of the actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as does not overflow the physical address type. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16drm/i915: Assume non-DP++ port if dvo_port is HDMI and there's no AUX ch ↵Ville Syrjälä2-9/+24
specified in the VBT My heuristic for detecting type 1 DVI DP++ adaptors based on the VBT port information apparently didn't survive the reality of buggy VBTs. In this particular case we have a machine with a natice HDMI port, but the VBT tells us it's a DP++ port based on its capabilities. The dvo_port information in VBT does claim that we're dealing with a HDMI port though, but we have other machines which do the same even when they actually have DP++ ports. So that piece of information alone isn't sufficient to tell the two apart. After staring at a bunch of VBTs from various machines, I have to conclude that the only other semi-reliable clue we can use is the presence of the AUX channel in the VBT. On this particular machine AUX channel is specified as zero, whereas on some of the other machines which listed the DP++ port as HDMI have a non-zero AUX channel. I've also seen VBTs which have dvo_port a DP but have a zero AUX channel. I believe those we need to treat as DP ports, so we'll limit the AUX channel check to just the cases where dvo_port is HDMI. If we encounter any more serious failures with this heuristic I think we'll have to have to throw it out entirely. But that could mean that there is a risk of type 1 DVI dongle users getting greeted by a black screen, so I'd rather not go there unless absolutely necessary. v2: Remove the duplicate PORT_A check (Daniel) Fix some typos in the commit message Cc: Daniel Otero <daniel.otero@outlook.com> Cc: stable@vger.kernel.org Tested-by: Daniel Otero <daniel.otero@outlook.com> Fixes: d61992565bd3 ("drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97994 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478884464-14251-1-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (cherry picked from commit 7a17995a3dc8613f778a9e2fd20e870f17789544) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-11-16rtnetlink: fix rtnl message size computation for XDPSabrina Dubroca1-1/+2
rtnl_xdp_size() only considers the size of the actual payload attribute, and misses the space taken by the attribute used for nesting (IFLA_XDP). Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>