BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-25	fsi/sbefifo: Add driver for the SBE FIFO	Benjamin Herrenschmidt	1	-0/+33
	This driver provides an in-kernel and a user API for accessing the command FIFO of the SBE (Self Boot Engine) of the POWER9 processor, via the FSI bus. It provides an in-kernel interface to submit command and receive responses, along with a helper to locate and analyse the response status block. It's a simple synchronous submit() type API. The user interface uses the write/read interface that an earlier version of this driver already provided, however it has some specific limitations in order to keep the driver simple and avoid using up a lot of kernel memory: - The user should perform a single write() with the command and a single read() to get the response (with a buffer big enough to hold the entire response). - On a write() the command is simply "stored" into a kernel buffer, it is submitted as one operation on the subsequent read(). This allows to have the code write directly from the FIFO into the user buffer and avoid hogging the SBE between the write() and read() syscall as it's critical that the SBE be freed asap to respond to the host. An extra write() will simply replace the previously written command. - A write of a single 4 bytes containing the value 0x52534554 in big endian will trigger a reset request. No read is necessary, the write() call will return when the reset has been acknowledged or times out. - The command is limited to 4K bytes. OpenBMC-Staging-Count: 1 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25	fsi: Remove old sbefifo driver	Benjamin Herrenschmidt	1	-30/+0
	And remporarily disable build of fsi-occ OpenBMC-Staging-Count: 1 Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25	fsi/fsi-master-gpio: Implement CRC error recovery	Benjamin Herrenschmidt	1	-0/+27
	The FSI protocol defines two modes of recovery from CRC errors, this implements both: - If the device returns an ECRC (it detected a CRC error in the command), then we simply issue the command again. - If the master detects a CRC error in the response, we send an E_POLL command which requests a resend of the response without actually re-executing the command (which could otherwise have unwanted side effects such as dequeuing a FIFO twice). OpenBMC-Staging-Count: 1 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Christopher Bostic <cbostic@linux.vnet.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-18	ipmi: kcs_bmc: coding-style fixes	Haiyue Wang	1	-3/+5
	OpenBMC-Staging-Count: 1 Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> [This is is not the same commit as upstream, as the changes to use new type '__poll_t' from linux-4.16-rc1 are not included. This was done because the feature is not a small self contained backport. There is no impact on functionality for dev-4.13] Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-18	ipmi: add a KCS IPMI BMC driver	Haiyue Wang	1	-0/+14
	Provides a device driver for the KCS (Keyboard Controller Style) IPMI interface which meets the requirement of the BMC (Baseboard Management Controllers) side for handling the IPMI request from host system software. Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com> [Removed the selectability of IPMI_KCS_BMC, as it doesn't do much good to have it by itself.] Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-18	clk: aspeed: Add 24MHz fixed clock	Lei YU	1	-0/+1
	Add a 24MHz fixed clock that is provided by the input oscillator. This clock will be used for certain devices, e.g. pwm. OpenBMC-Staging-Count: 1 Signed-off-by: Lei YU <mine260309@gmail.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-19	drm/simple_kms_helper: Add {enable\|disable}_vblank callback support	Oleksandr Andrushchenko	1	-0/+18
	If simple_kms_helper based driver needs to work with vblanks, then it has to provide drm_driver.{enable\|disable}_vblank callbacks, because drm_simple_kms_helper.drm_crtc_funcs does not provide any. At the same time drm_driver.{enable\|disable}_vblank callbacks are marked as deprecated and shouldn't be used by new drivers. Fix this by extending drm_simple_kms_helper.drm_crtc_funcs to provide the missing callbacks. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/1518425574-32671-2-git-send-email-andr2000@gmail.com (cherry picked from commit ac86cba96e2a439883d452772013049f54df2042) Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-19	drm: introduce drm_dev_{get/put} functions	Aishwarya Pant	1	-2/+3
	Reference counting functions in the kernel typically use get/put suffixes. For maintaining coding style consistency, introduce drm_dev_{get/put} functions. All callers of drm_dev_ref() API have been converted in this patch and hence it has been dropped while the drm_dev_unref() API with non-trivial number of users remains for compatibility. The semantic patch scripts/coccinelle/api/drm-get-put.cocci has been updated with the new helper for conversion of drm_dev_unref() to drm_dev_put() Signed-off-by: Aishwarya Pant <aishpant@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/6babda56134035a98220d5d37a4fd4048df214ce.1506413698.git.aishpant@gmail.com (cherry picked from commit 9a96f55034e41b4e002b767e9218d55f03bdff7d) Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-19	clk: aspeed: Support second reset register	Joel Stanley	1	-0/+1
	The ast2500 has an additional reset register that contains resets not present in the ast2400. This enables support for this register, and adds the one reset line that is controlled by it. OpenBMC-Staging-Count: 1 Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-19	Merge tag 'drm-for-v4.14' into dev-4.13	Joel Stanley	45	-490/+1024
	This is the DRM tree that was merged into the mainline 4.14 kernel. The ASPEED GFX DRM driver requires functionaliy that was included in this tree, so we merge it back into the 4.13 OpenBMC tree. This should have no impact on other parts of the BMC. Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-11	dt-binding: clk: npcm750: add binding	Tali Perry	1	-0/+44
	Nuvoton Poleg BMC NPCM7XX contains an integrated clock controller, which generates and supplies clocks to all modules within the BMC. OpenBMC-Staging-Count: 1 Signed-off-by: Tali Perry <tali.perry1@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-11	serial: 8250: Add Nuvoton NPCM UART	Joel Stanley	1	-0/+3
	The Nuvoton UART is almost compatible with the 8250 driver when probed via the 8250_of driver, however it requires some extra configuration at startup. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Joel Stanley <joel@jms.id.au> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit f597fbce38d230af95384f4a04e0a13a1d0ad45d) Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-04-05	serial/8250: export serial8250_read_char	Jeremy Kerr	1	-0/+1
	Currently, we export serial8250_rx_chars, which does a whole bunch of reads from the 8250 data register, without any form of flow control between reads. An upcoming change to the aspeed vuart driver implements more fine-grained flow control in the interrupt handler, requiring character-at-a-time control over the rx path. This change exports serial8250_read_char to allow this. OpenBMC-Staging-Count: 1 Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Tested-by: Eddie James <eajames@linux.vnet.ibm.com> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2018-04-05	serial: Introduce UPSTAT_SYNC_FIFO for synchronised FIFOs	Jeremy Kerr	1	-0/+1
	This change adds a flag to indicate that a UART is has an external means of synchronising its FIFO, without needing CTSRTS or XON/XOFF. This allows us to use the throttle/unthrottle callbacks, without having to claim other methods of flow control. OpenBMC-Staging-Count: 1 Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Tested-by: Eddie James <eajames@linux.vnet.ibm.com> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2018-03-07	fsi: occ: Add tracepoints	Andrew Jeffery	1	-0/+86
	The OCC driver uses a workqueue to manage calls through to the SBEFIFO, which in turn uses a timer callback to execute FIFO transfers. To ease observation of end-to-end interactions e.g. from userspace `cat`ing an OCC hwmon attribute, add tracing to book-end SBEFIFO transfers. This provides some perspective on the time taken for a single OCC operation to take place. OpenBMC-Staging-Count: 1 Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Eddie James <eajames@linux.vnet.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-03-07	fsi: gpio: Trace busy count	Andrew Jeffery	1	-0/+16
	An observation from trace output of the existing FSI tracepoints was that the remote device was sometimes reporting as busy. Add a new tracepoint reporting the busy count in order to get a better grip on how often this is the case. OpenBMC-Staging-Count: 1 Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Eddie James <eajames@linux.vnet.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-03-07	fsi: sbefifo: Add tracing of transfers	Andrew Jeffery	1	-0/+93
	OpenBMC-Staging-Count: 1 Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Eddie James <eajames@linux.vnet.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-03-07	net/ncsi: Add generic netlink family	Samuel Mendoza-Jonas	1	-0/+115
	Add a generic netlink family for NCSI. This supports three commands; NCSI_CMD_PKG_INFO which returns information on packages and their associated channels, NCSI_CMD_SET_INTERFACE which allows a specific package or package/channel combination to be set as the preferred choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 955dc68cb9b23b42999cafe6df3684309bc686c6) Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-02-08	gpio: gpiolib: Generalise state persistence beyond sleep	Andrew Jeffery	5	-6/+16
	General support for state persistence is added to gpiolib with the introduction of a new pinconf parameter to propagate the request to hardware. The existing persistence support for sleep is adapted to include hardware support if the GPIO driver provides it. Persistence continues to be enabled by default; in-kernel consumers can opt out, but userspace (currently) does not have a choice. The _SLEEP_MAY_LOSE_VALUE and _SLEEP_MAINTAIN_VALUE symbols are renamed, dropping the SLEEP prefix to reflect that the concept is no longer sleep-specific. I feel that renaming to just _MAY_LOSE_VALUE could initially be misinterpreted, so I've further changed the symbols to _TRANSITORY and *_PERSISTENT to address this. The sysfs interface is modified only to keep consistency with the chardev interface in enforcing persistence for userspace exports. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> (cherry picked from commit e10f72bf4b3e8885c1915a119141481e7fc45ca8) Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-01-18	dt-bindings: gpio: Add ASPEED constants	Joel Stanley	1	-0/+49
	These are used to by the device tree to map pin numbers to constants required by the GPIO bindings. OpenBMC-Staging-Count: 1 Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-01-18	dt-bindings: clock: Add ASPEED constants	Joel Stanley	1	-2/+1
	These will be used by the clock driver. This commit is included so the tree will build without the clock series being applied. OpenBMC-Staging-Count: 1 Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Joel Stanley <joel@jms.id.au>
2017-11-28	Merge tag 'v4.13.16' into dev-4.13	Joel Stanley	33	-124/+180
	This is the 4.13.16 stable release
2017-11-28	leds: pca955x: add GPIO support	Cédric Le Goater	1	-0/+16
	The PCA955x family of chips are I2C LED blinkers whose pins not used to control LEDs can be used as general purpose I/Os (GPIOs). The following adds such a support by defining different operation modes for the pins (See bindings documentation for more details). The pca955x driver is then extended with a gpio_chip when some of pins are operating as GPIOs. The default operating mode is to behave as a LED. The GPIO support is conditioned by CONFIG_LEDS_PCA955X_GPIO. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> (cherry picked from commit 561099a1a2e992a482a8318c0c9c5af26222e5cd) Signed-off-by: Joel Stanley <joel@jms.id.au>
2017-11-28	leds: gpio: Allow LED to retain state at shutdown	Andrew Jeffery	1	-0/+2
	In some systems, such as Baseboard Management Controllers (BMCs), we want to retain the state of LEDs across a reboot of the BMC (whilst the host remains up). Implement support for the retain-state-shutdown devicetree property in leds-gpio. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Brandon Wyman <bjwyman@gmail.com> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> (cherry picked from commit f5808ac158f2b16b686a3d3c0879c5d6048aba14) Signed-off-by: Joel Stanley <joel@jms.id.au>
2017-11-28	drivers/fsi: occ: Add in-kernel API	Edward A. James	1	-0/+41
	Add an in-kernel read/write API with exported functions. This is necessary for other drivers which want to directly interact with the OCC. Also add a child platform device node for the associated hwmon device. OpenBMC-Staging-Count: 2 Signed-off-by: Edward A. James <eajames@us.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2017-11-28	drivers/fsi: sbefifo: Add in-kernel API	Edward A. James	1	-0/+30
	Refactor the user interface of the SBEFIFO driver to allow for an in-kernel read/write API. Add exported functions for other drivers to call, and add an include file with those functions. Also parse the device tree for child nodes and create child platform devices accordingly. OpenBMC-Staging-Count: 2 Signed-off-by: Edward A. James <eajames@us.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2017-11-24	mm/page_alloc.c: broken deferred calculation	Pavel Tatashin	1	-1/+2
	commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream. In reset_deferred_meminit() we determine number of pages that must not be deferred. We initialize pages for at least 2G of memory, but also pages for reserved memory in this node. The reserved memory is determined in this function: memblock_reserved_memory_within(), which operates over physical addresses, and returns size in bytes. However, reset_deferred_meminit() assumes that that this function operates with pfns, and returns page count. The result is that in the best case machine boots slower than expected due to initializing more pages than needed in single thread, and in the worst case panics because fewer than needed pages are initialized early. Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24	netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed	Ye Yin	1	-0/+7
	[ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ] When run ipvs in two different network namespace at the same host, and one ipvs transport network traffic to the other network namespace ipvs. 'ipvs_property' flag will make the second ipvs take no effect. So we should clear 'ipvs_property' when SKB network namespace changed. Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()") Signed-off-by: Ye Yin <hustcat@gmail.com> Signed-off-by: Wei Zhou <chouryzhou@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18	tcp: fix tcp_mtu_probe() vs highest_sack	Eric Dumazet	1	-3/+3
	[ Upstream commit 2b7cda9c35d3b940eb9ce74b30bbd5eb30db493d ] Based on SNMP values provided by Roman, Yuchung made the observation that some crashes in tcp_sacktag_walk() might be caused by MTU probing. Looking at tcp_mtu_probe(), I found that when a new skb was placed in front of the write queue, we were not updating tcp highest sack. If one skb is freed because all its content was copied to the new skb (for MTU probing), then tp->highest_sack could point to a now freed skb. Bad things would then happen, including infinite loops. This patch renames tcp_highest_sack_combine() and uses it from tcp_mtu_probe() to fix the bug. Note that I also removed one test against tp->sacked_out, since we want to replace tp->highest_sack regardless of whatever condition, since keeping a stale pointer to freed skb is a recipe for disaster. Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reported-by: Roman Gushchin <guro@fb.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18	tap: reference to KVA of an unloaded module causes kernel panic	Girish Moodalbail	1	-2/+2
	[ Upstream commit dea6e19f4ef746aa18b4c33d1a7fed54356796ed ] The commit 9a393b5d5988 ("tap: tap as an independent module") created a separate tap module that implements tap functionality and exports interfaces that will be used by macvtap and ipvtap modules to create create respective tap devices. However, that patch introduced a regression wherein the modules macvtap and ipvtap can be removed (through modprobe -r) while there are applications using the respective /dev/tapX devices. These applications cause kernel to hold reference to /dev/tapX through 'struct cdev macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap modules respectively. So, when the application is later closed the kernel panics because we are referencing KVA that is present in the unloaded modules. ----------8<------- Example ----------8<---------- $ sudo ip li add name mv0 link enp7s0 type macvtap $ sudo ip li show mv0 \|grep mv0\| awk -e '{print $1 $2}' 14:mv0@enp7s0: $ cat /dev/tap14 & $ lsmod \|egrep -i 'tap\|vlan' macvtap 16384 0 macvlan 24576 1 macvtap tap 24576 3 macvtap $ sudo modprobe -r macvtap $ fg cat /dev/tap14 ^C <...system panics...> BUG: unable to handle kernel paging request at ffffffffa038c500 IP: cdev_put+0xf/0x30 ----------8<-----------------8<---------- The fix is to set cdev.owner to the module that creates the tap device (either macvtap or ipvtap). With this set, the operations (in fs/char_dev.c) on char device holds and releases the module through cdev_get() and cdev_put() and will not allow the module to unload prematurely. Fixes: 9a393b5d5988ea4e (tap: tap as an independent module) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18	tcp/dccp: fix other lockdep splats accessing ireq_opt	Eric Dumazet	1	-0/+6
	[ Upstream commit 06f877d613be3621604c2520ec0351d9fbdca15f ] In my first attempt to fix the lockdep splat, I forgot we could enter inet_csk_route_req() with a freshly allocated request socket, for which refcount has not yet been elevated, due to complex SLAB_TYPESAFE_BY_RCU rules. We either are in rcu_read_lock() section _or_ we own a refcount on the request. Correct RCU verb to use here is rcu_dereference_check(), although it is not possible to prove we actually own a reference on a shared refcount :/ In v2, I added ireq_opt_deref() helper and use in three places, to fix other possible splats. [ 49.844590] lockdep_rcu_suspicious+0xea/0xf3 [ 49.846487] inet_csk_route_req+0x53/0x14d [ 49.848334] tcp_v4_route_req+0xe/0x10 [ 49.850174] tcp_conn_request+0x31c/0x6a0 [ 49.851992] ? __lock_acquire+0x614/0x822 [ 49.854015] tcp_v4_conn_request+0x5a/0x79 [ 49.855957] ? tcp_v4_conn_request+0x5a/0x79 [ 49.858052] tcp_rcv_state_process+0x98/0xdcc [ 49.859990] ? sk_filter_trim_cap+0x2f6/0x307 [ 49.862085] tcp_v4_do_rcv+0xfc/0x145 [ 49.864055] ? tcp_v4_do_rcv+0xfc/0x145 [ 49.866173] tcp_v4_rcv+0x5ab/0xaf9 [ 49.868029] ip_local_deliver_finish+0x1af/0x2e7 [ 49.870064] ip_local_deliver+0x1b2/0x1c5 [ 49.871775] ? inet_del_offload+0x45/0x45 [ 49.873916] ip_rcv_finish+0x3f7/0x471 [ 49.875476] ip_rcv+0x3f1/0x42f [ 49.876991] ? ip_local_deliver_finish+0x2e7/0x2e7 [ 49.878791] __netif_receive_skb_core+0x6d3/0x950 [ 49.880701] ? process_backlog+0x7e/0x216 [ 49.882589] __netif_receive_skb+0x1d/0x5e [ 49.884122] process_backlog+0x10c/0x216 [ 49.885812] net_rx_action+0x147/0x3df Fixes: a6ca7abe53633 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()") Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <fengguang.wu@intel.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18	tcp/dccp: fix ireq->opt races	Eric Dumazet	1	-1/+1
	[ Upstream commit c92e8c02fe664155ac4234516e32544bec0f113d ] syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. Note the opt field is renamed to ireq_opt to ease grep games. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341 RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1 R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3306: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157 __sk_destruct+0xfd/0x910 net/core/sock.c:1560 sk_destruct+0x47/0x80 net/core/sock.c:1595 __sk_free+0x57/0x230 net/core/sock.c:1603 sk_free+0x2a/0x40 net/core/sock.c:1614 sock_put include/net/sock.h:1652 [inline] inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-18	tun: call dev_get_valid_name() before register_netdevice()	Cong Wang	1	-0/+3
	[ Upstream commit 0ad646c81b2182f7fa67ec0c8c825e0ee165696d ] register_netdevice() could fail early when we have an invalid dev name, in which case ->ndo_uninit() is not called. For tun device, this is a problem because a timer etc. are already initialized and it expects ->ndo_uninit() to clean them up. We could move these initializations into a ->ndo_init() so that register_netdevice() knows better, however this is still complicated due to the logic in tun_detach(). Therefore, I choose to just call dev_get_valid_name() before register_netdevice(), which is quicker and much easier to audit. And for this specific case, it is already enough. Fixes: 96442e42429e ("tuntap: choose the txq based on rxq") Reported-by: Dmitry Alexeev <avekceeb@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15	ALSA: seq: Avoid invalid lockdep class warning	Takashi Iwai	1	-1/+2
	commit 3510c7aa069aa83a2de6dab2b41401a198317bdc upstream. The recent fix for adding rwsem nesting annotation was using the given "hop" argument as the lock subclass key. Although the idea itself works, it may trigger a kernel warning like: BUG: looking up invalid subclass: 8 .... since the lockdep has a smaller number of subclasses (8) than we currently allow for the hops there (10). The current definition is merely a sanity check for avoiding the too deep delivery paths, and the 8 hops are already enough. So, as a quick fix, just follow the max hops as same as the max lockdep subclasses. Fixes: 1f20f9ff57ca ("ALSA: seq: Fix nested rwsem annotation for lockdep splat") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15	ALSA: timer: Limit max instances per timer	Takashi Iwai	1	-0/+2
	commit 9b7d869ee5a77ed4a462372bb89af622e705bfb8 upstream. Currently we allow unlimited number of timer instances, and it may bring the system hogging way too much CPU when too many timer instances are opened and processed concurrently. This may end up with a soft-lockup report as triggered by syzkaller, especially when hrtimer backend is deployed. Since such insane number of instances aren't demanded by the normal use case of ALSA sequencer and it merely opens a risk only for abuse, this patch introduces the upper limit for the number of instances per timer backend. As default, it's set to 1000, but for the fine-grained timer like hrtimer, it's set to 100. Reported-by: syzbot Tested-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15	ACPICA: Make it possible to enable runtime GPEs earlier	Rafael J. Wysocki	1	-1/+2
	commit 1312b7e0caca44e7ff312bc2eaa888943384e3e1 upstream. Runtime GPEs have corresponding _Lxx/_Exx methods and are enabled automatically during the initialization of the ACPI subsystem through acpi_update_all_gpes() with the assumption that acpi_setup_gpe_for_wake() will be called in advance for all of the GPEs pointed to by _PRW objects in the namespace that may be affected by acpi_update_all_gpes(). That is, acpi_ev_initialize_gpe_block() can only be called for a GPE block after acpi_setup_gpe_for_wake() has been called for all of the _PRW (wakeup) GPEs in it. The platform firmware on some systems, however, expects GPEs to be enabled before the enumeration of devices which is when acpi_setup_gpe_for_wake() is called and that goes against the above assumption. For this reason, introduce a new flag to be set by acpi_ev_initialize_gpe_block() when automatically enabling a GPE to indicate to acpi_setup_gpe_for_wake() that it needs to drop the reference to the GPE coming from acpi_ev_initialize_gpe_block() and modify acpi_setup_gpe_for_wake() accordingly. These changes allow acpi_setup_gpe_for_wake() and acpi_ev_initialize_gpe_block() to be invoked in any order. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15	netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"	Florian Westphal	2	-3/+1
	commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream. This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1. It was not a good idea. The custom hash table was a much better fit for this purpose. A fast lookup is not essential, in fact for most cases there is no lookup at all because original tuple is not taken and can be used as-is. What needs to be fast is insertion and deletion. rhlist removal however requires a rhlist walk. We can have thousands of entries in such a list if source port/addresses are reused for multiple flows, if this happens removal requests are so expensive that deletions of a few thousand flows can take several seconds(!). The advantages that we got from rhashtable are: 1) table auto-sizing 2) multiple locks 1) would be nice to have, but it is not essential as we have at most one lookup per new flow, so even a million flows in the bysource table are not a problem compared to current deletion cost. 2) is easy to add to custom hash table. I tried to add hlist_node to rhlist to speed up rhltable_remove but this isn't doable without changing semantics. rhltable_remove_fast will check that the to-be-deleted object is part of the table and that requires a list walk that we want to avoid. Furthermore, using hlist_node increases size of struct rhlist_head, which in turn increases nf_conn size. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821 Reported-by: Ivan Babrou <ibobrik@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-08	mm, swap: fix race between swap count continuation operations	Huang Ying	1	-0/+4
	commit 2628bd6fc052bd85e9864dae4de494d8a6313391 upstream. One page may store a set of entries of the sis->swap_map (swap_info_struct->swap_map) in multiple swap clusters. If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX, multiple pages will be used to store the set of entries of the sis->swap_map. And the pages are linked with page->lru. This is called swap count continuation. To access the pages which store the set of entries of the sis->swap_map simultaneously, previously, sis->lock is used. But to improve the scalability of __swap_duplicate(), swap cluster lock may be used in swap_count_continued() now. This may race with add_swap_count_continuation() which operates on a nearby swap cluster, in which the sis->swap_map entries are stored in the same page. The race can cause wrong swap count in practice, thus cause unfreeable swap entries or software lockup, etc. To fix the race, a new spin lock called cont_lock is added to struct swap_info_struct to protect the swap count continuation page list. This is a lock at the swap device level, so the scalability isn't very well. But it is still much better than the original sis->lock, because it is only acquired/released when swap count continuation is used. Which is considered rare in practice. If it turns out that the scalability becomes an issue for some workloads, we can split the lock into some more fine grained locks. Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com Fixes: 235b62176712 ("mm/swap: add cluster lock") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shaohua Li <shli@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02	spi: uapi: spidev: add missing ioctl header	Baruch Siach	1	-0/+1
	commit a2b4a79b88b24c49d98d45a06a014ffd22ada1a4 upstream. The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h to make sure this macro is defined. This fixes the following build failure of lcdproc with the musl libc: In file included from .../sysroot/usr/include/sys/ioctl.h:7:0, from hd44780-spi.c:31: hd44780-spi.c: In function 'spi_transfer': hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function) status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer); ^ Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27	ALSA: hda - Fix incorrect TLV callback check introduced during set_fs() removal	Takashi Iwai	1	-0/+3
	commit a91d66129fb9bcead12af3ed2008d6ddbf179509 upstream. The commit 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()") converted the get_kctl_0dB_offset() call for killing set_fs() usage in HD-audio codec code. The conversion assumed that the TLV callback used in HD-audio code is only snd_hda_mixer_amp() and applies the TLV calculation locally. Although this assumption is correct, and all slave kctls are actually with that callback, the current code is still utterly buggy; it doesn't hit this condition and falls back to the next check. It's because the function gets called after adding slave kctls to vmaster. By assigning a slave kctl, the slave kctl object is faked inside vmaster code, and the whole kctl ops are overridden. Thus the callback op points to a different value from what we've assumed. More badly, as reported by the KERNEXEC and UDEREF features of PaX, the code flow turns into the unexpected pitfall. The next fallback check is SNDRV_CTL_ELEM_ACCESS_TLV_READ access bit, and this always hits for each kctl with TLV. Then it evaluates the callback function pointer wrongly as if it were a TLV array. Although currently its side-effect is fairly limited, this incorrect reference may lead to an unpleasant result. For addressing the regression, this patch introduces a new helper to vmaster code, snd_ctl_apply_vmaster_slaves(). This works similarly like the existing map_slaves() in hda_codec.c: it loops over the slave list of the given master, and applies the given function to each slave. Then the initializer function receives the right kctl object and we can compare the correct pointer instead of the faked one. Also, for catching the similar breakage in future, give an error message when the unexpected TLV callback is found and bail out immediately. Fixes: 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()") Reported-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27	KEYS: Fix race between updating and finding a negative key	David Howells	1	-17/+30
	commit 363b02dab09b3226f3bd1420dad9c72b79a42a76 upstream. Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection error into one field such that: (1) The instantiation state can be modified/read atomically. (2) The error can be accessed atomically with the state. (3) The error isn't stored unioned with the payload pointers. This deals with the problem that the state is spread over three different objects (two bits and a separate variable) and reading or updating them atomically isn't practical, given that not only can uninstantiated keys change into instantiated or rejected keys, but rejected keys can also turn into instantiated keys - and someone accessing the key might not be using any locking. The main side effect of this problem is that what was held in the payload may change, depending on the state. For instance, you might observe the key to be in the rejected state. You then read the cached error, but if the key semaphore wasn't locked, the key might've become instantiated between the two reads - and you might now have something in hand that isn't actually an error code. The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error code if the key is negatively instantiated. The key_is_instantiated() function is replaced with key_is_positive() to avoid confusion as negative keys are also 'instantiated'. Additionally, barriering is included: (1) Order payload-set before state-set during instantiation. (2) Order state-read before payload-read when using the key. Further separate barriering is necessary if RCU is being used to access the payload content after reading the payload pointers. Fixes: 146aa8b1453b ("KEYS: Merge the type-specific data with the payload data") Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27	bus: mbus: fix window size calculation for 4GB windows	Jan Luebbe	1	-2/+2
	commit 2bbbd96357ce76cc45ec722c00f654aa7b189112 upstream. At least the Armada XP SoC supports 4GB on a single DRAM window. Because the size register values contain the actual size - 1, the MSB is set in that case. For example, the SDRAM window's control register's value is 0xffffffe1 for 4GB (bits 31 to 24 contain the size). The MBUS driver reads back each window's size from registers and calculates the actual size as (control_reg \| ~DDR_SIZE_MASK) + 1, which overflows for 32 bit values, resulting in other miscalculations further on (a bad RAM window for the CESA crypto engine calculated by mvebu_mbus_setup_cpu_target_nooverlap() in my case). This patch changes the type in 'struct mbus_dram_window' from u32 to u64, which allows us to keep using the same register calculation code in most MBUS-using drivers (which calculate ->size - 1 again). Fixes: fddddb52a6c4 ("bus: introduce an Marvell EBU MBus driver") Signed-off-by: Jan Luebbe <jlu@pengutronix.de> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21	vmbus: eliminate duplicate cached index	Stephen Hemminger	1	-14/+0
	commit 05d00bc94ac27d220d8a78e365d7fa3a26dcca17 upstream. Don't need cached read index anymore now that packet iterator is used. The iterator has the original read index until the visible read_index is updated. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21	vmbus: refactor hv_signal_on_read	Stephen Hemminger	1	-49/+0
	commit 8dd45f2ab005a1f3301296059b23b03ec3dbf79b upstream. The function hv_signal_on_read was defined in hyperv.h and only used in one place in ring_buffer code. Clearer to just move it inline there. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21	Drivers: hv: vmbus: Fix bugs in rescind handling	K. Y. Srinivasan	1	-1/+1
	commit 192b2d78722ffea188e5ec6ae5d55010dce05a4b upstream. This patch addresses the following bugs in the current rescind handling code: 1. Fixes a race condition where we may be invoking hv_process_channel_removal() on an already freed channel. 2. Prevents indefinite wait when rescinding sub-channels by correctly setting the probe_complete state. I would like to thank Dexuan for patiently reviewing earlier versions of this patch and identifying many of the issues fixed here. Greg, please apply this to 4.14-final. Fixes: '54a66265d675 ("Drivers: hv: vmbus: Fix rescind handling")' Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21	Drivers: hv: vmbus: Fix rescind handling issues	K. Y. Srinivasan	1	-0/+2
	commit 6f3d791f300618caf82a2be0c27456edd76d5164 upstream. This patch handles the following issues that were observed when we are handling racing channel offer message and rescind message for the same offer: 1. Since the host does not respond to messages on a rescinded channel, in the current code, we could be indefinitely blocked on the vmbus_open() call. 2. When a rescinded channel is being closed, if there is a pending interrupt on the channel, we could end up freeing the channel that the interrupt handler would run on. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18	ALSA: seq: Fix copy_from_user() call inside lock	Takashi Iwai	1	-0/+1
	commit 5803b023881857db32ffefa0d269c90280a67ee0 upstream. The event handler in the virmidi sequencer code takes a read-lock for the linked list traverse, while it's calling snd_seq_dump_var_event() in the loop. The latter function may expand the user-space data depending on the event type. It eventually invokes copy_from_user(), which might be a potential dead-lock. The sequencer core guarantees that the user-space data is passed only with atomic=0 argument, but snd_virmidi_dev_receive_event() ignores it and always takes read-lock(). For avoiding the problem above, this patch introduces rwsem for non-atomic case, while keeping rwlock for atomic case. Also while we're at it: the superfluous irq flags is dropped in snd_virmidi_input_open(). Reported-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18	fs/mpage.c: fix mpage_writepage() for pages with buffers	Matthew Wilcox	1	-0/+1
	commit f892760aa66a2d657deaf59538fb69433036767c upstream. When using FAT on a block device which supports rw_page, we can hit BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we call clean_buffers() after unlocking the page we've written. Introduce a new clean_page_buffers() which cleans all buffers associated with a page and call it from within bdev_write_page(). [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12	udp: perform source validation for mcast early demux	Paolo Abeni	1	-1/+3
	[ Upstream commit bc044e8db7962e727a75b591b9851ff2ac5cf846 ] The UDP early demux can leverate the rx dst cache even for multicast unconnected sockets. In such scenario the ipv4 source address is validated only on the first packet in the given flow. After that, when we fetch the dst entry from the socket rx cache, we stop enforcing the rp_filter and we even start accepting any kind of martian addresses. Disabling the dst cache for unconnected multicast socket will cause large performace regression, nearly reducing by half the max ingress tput. Instead we factor out a route helper to completely validate an skb source address for multicast packets and we call it from the UDP early demux for mcast packets landing on unconnected sockets, after successful fetching the related cached dst entry. This still gives a measurable, but limited performance regression: rp_filter = 0 rp_filter = 1 edmux disabled: 1182 Kpps 1127 Kpps edmux before: 2238 Kpps 2238 Kpps edmux after: 2037 Kpps 2019 Kpps The above figures are on top of current net tree. Applying the net-next commit 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.") the delta with rp_filter == 0 will decrease even more. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12	scsi: sd: Implement blacklist option for WRITE SAME w/ UNMAP	Martin K. Petersen	2	-0/+2
	commit 28a0bc4120d38a394499382ba21d6965a67a3703 upstream. SBC-4 states: "A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the maximum number of LBAs that may be unmapped by an UNMAP command" "A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates the maximum number of contiguous logical blocks that the device server allows to be unmapped or written in a single WRITE SAME command." Despite the spec being clear on the topic, some devices incorrectly expect WRITE SAME commands with the UNMAP bit set to be limited to the value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD. Implement a blacklist option that can be used to accommodate devices with this behavior. Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com> Reported-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>