Age | Commit message (Collapse) | Author | Files | Lines |
|
This driver provides an in-kernel and a user API for accessing
the command FIFO of the SBE (Self Boot Engine) of the POWER9
processor, via the FSI bus.
It provides an in-kernel interface to submit command and receive
responses, along with a helper to locate and analyse the response
status block. It's a simple synchronous submit() type API.
The user interface uses the write/read interface that an earlier
version of this driver already provided, however it has some
specific limitations in order to keep the driver simple and
avoid using up a lot of kernel memory:
- The user should perform a single write() with the command and
a single read() to get the response (with a buffer big enough
to hold the entire response).
- On a write() the command is simply "stored" into a kernel buffer,
it is submitted as one operation on the subsequent read(). This
allows to have the code write directly from the FIFO into the user
buffer and avoid hogging the SBE between the write() and read()
syscall as it's critical that the SBE be freed asap to respond
to the host. An extra write() will simply replace the previously
written command.
- A write of a single 4 bytes containing the value 0x52534554
in big endian will trigger a reset request. No read is necessary,
the write() call will return when the reset has been acknowledged
or times out.
- The command is limited to 4K bytes.
OpenBMC-Staging-Count: 1
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
And remporarily disable build of fsi-occ
OpenBMC-Staging-Count: 1
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
The FSI protocol defines two modes of recovery from CRC errors,
this implements both:
- If the device returns an ECRC (it detected a CRC error in the
command), then we simply issue the command again.
- If the master detects a CRC error in the response, we send
an E_POLL command which requests a resend of the response
without actually re-executing the command (which could otherwise
have unwanted side effects such as dequeuing a FIFO twice).
OpenBMC-Staging-Count: 1
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
OpenBMC-Staging-Count: 1
Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
[This is is not the same commit as upstream, as the changes to use new
type '__poll_t' from linux-4.16-rc1 are not included. This was done
because the feature is not a small self contained backport. There is no
impact on functionality for dev-4.13]
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Provides a device driver for the KCS (Keyboard Controller Style)
IPMI interface which meets the requirement of the BMC (Baseboard
Management Controllers) side for handling the IPMI request from
host system software.
Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
[Removed the selectability of IPMI_KCS_BMC, as it doesn't do much
good to have it by itself.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Add a 24MHz fixed clock that is provided by the input oscillator. This
clock will be used for certain devices, e.g. pwm.
OpenBMC-Staging-Count: 1
Signed-off-by: Lei YU <mine260309@gmail.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
If simple_kms_helper based driver needs to work with vblanks,
then it has to provide drm_driver.{enable|disable}_vblank callbacks,
because drm_simple_kms_helper.drm_crtc_funcs does not provide any.
At the same time drm_driver.{enable|disable}_vblank callbacks
are marked as deprecated and shouldn't be used by new drivers.
Fix this by extending drm_simple_kms_helper.drm_crtc_funcs
to provide the missing callbacks.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1518425574-32671-2-git-send-email-andr2000@gmail.com
(cherry picked from commit ac86cba96e2a439883d452772013049f54df2042)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Reference counting functions in the kernel typically use get/put suffixes. For
maintaining coding style consistency, introduce drm_dev_{get/put} functions. All
callers of drm_dev_ref() API have been converted in this patch and hence it has
been dropped while the drm_dev_unref() API with non-trivial number of users
remains for compatibility.
The semantic patch scripts/coccinelle/api/drm-get-put.cocci has been updated
with the new helper for conversion of drm_dev_unref() to drm_dev_put()
Signed-off-by: Aishwarya Pant <aishpant@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/6babda56134035a98220d5d37a4fd4048df214ce.1506413698.git.aishpant@gmail.com
(cherry picked from commit 9a96f55034e41b4e002b767e9218d55f03bdff7d)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
The ast2500 has an additional reset register that contains resets not
present in the ast2400. This enables support for this register, and adds
the one reset line that is controlled by it.
OpenBMC-Staging-Count: 1
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
This is the DRM tree that was merged into the mainline 4.14 kernel. The
ASPEED GFX DRM driver requires functionaliy that was included in this
tree, so we merge it back into the 4.13 OpenBMC tree.
This should have no impact on other parts of the BMC.
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Nuvoton Poleg BMC NPCM7XX contains an integrated clock controller, which
generates and supplies clocks to all modules within the BMC.
OpenBMC-Staging-Count: 1
Signed-off-by: Tali Perry <tali.perry1@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
The Nuvoton UART is almost compatible with the 8250 driver when probed
via the 8250_of driver, however it requires some extra configuration
at startup.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f597fbce38d230af95384f4a04e0a13a1d0ad45d)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Currently, we export serial8250_rx_chars, which does a whole bunch of
reads from the 8250 data register, without any form of flow control
between reads.
An upcoming change to the aspeed vuart driver implements more
fine-grained flow control in the interrupt handler, requiring
character-at-a-time control over the rx path.
This change exports serial8250_read_char to allow this.
OpenBMC-Staging-Count: 1
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Tested-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
|
|
This change adds a flag to indicate that a UART is has an external means
of synchronising its FIFO, without needing CTSRTS or XON/XOFF.
This allows us to use the throttle/unthrottle callbacks, without having
to claim other methods of flow control.
OpenBMC-Staging-Count: 1
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Tested-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
|
|
The OCC driver uses a workqueue to manage calls through to the SBEFIFO,
which in turn uses a timer callback to execute FIFO transfers. To ease
observation of end-to-end interactions e.g. from userspace `cat`ing an
OCC hwmon attribute, add tracing to book-end SBEFIFO transfers. This
provides some perspective on the time taken for a single OCC operation
to take place.
OpenBMC-Staging-Count: 1
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
An observation from trace output of the existing FSI tracepoints was
that the remote device was sometimes reporting as busy. Add a new
tracepoint reporting the busy count in order to get a better grip on how
often this is the case.
OpenBMC-Staging-Count: 1
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
OpenBMC-Staging-Count: 1
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Add a generic netlink family for NCSI. This supports three commands;
NCSI_CMD_PKG_INFO which returns information on packages and their
associated channels, NCSI_CMD_SET_INTERFACE which allows a specific
package or package/channel combination to be set as the preferred
choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 955dc68cb9b23b42999cafe6df3684309bc686c6)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
General support for state persistence is added to gpiolib with the
introduction of a new pinconf parameter to propagate the request to
hardware. The existing persistence support for sleep is adapted to
include hardware support if the GPIO driver provides it. Persistence
continues to be enabled by default; in-kernel consumers can opt out, but
userspace (currently) does not have a choice.
The *_SLEEP_MAY_LOSE_VALUE and *_SLEEP_MAINTAIN_VALUE symbols are
renamed, dropping the SLEEP prefix to reflect that the concept is no
longer sleep-specific. I feel that renaming to just *_MAY_LOSE_VALUE
could initially be misinterpreted, so I've further changed the symbols
to *_TRANSITORY and *_PERSISTENT to address this.
The sysfs interface is modified only to keep consistency with the
chardev interface in enforcing persistence for userspace exports.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
(cherry picked from commit e10f72bf4b3e8885c1915a119141481e7fc45ca8)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
These are used to by the device tree to map pin numbers to constants
required by the GPIO bindings.
OpenBMC-Staging-Count: 1
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
These will be used by the clock driver. This commit is included so
the tree will build without the clock series being applied.
OpenBMC-Staging-Count: 1
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
This is the 4.13.16 stable release
|
|
The PCA955x family of chips are I2C LED blinkers whose pins not used
to control LEDs can be used as general purpose I/Os (GPIOs).
The following adds such a support by defining different operation
modes for the pins (See bindings documentation for more details). The
pca955x driver is then extended with a gpio_chip when some of pins are
operating as GPIOs. The default operating mode is to behave as a LED.
The GPIO support is conditioned by CONFIG_LEDS_PCA955X_GPIO.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
(cherry picked from commit 561099a1a2e992a482a8318c0c9c5af26222e5cd)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
In some systems, such as Baseboard Management Controllers (BMCs), we
want to retain the state of LEDs across a reboot of the BMC (whilst the
host remains up). Implement support for the retain-state-shutdown
devicetree property in leds-gpio.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Brandon Wyman <bjwyman@gmail.com>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
(cherry picked from commit f5808ac158f2b16b686a3d3c0879c5d6048aba14)
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Add an in-kernel read/write API with exported functions. This is
necessary for other drivers which want to directly interact with the
OCC. Also add a child platform device node for the associated hwmon
device.
OpenBMC-Staging-Count: 2
Signed-off-by: Edward A. James <eajames@us.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Refactor the user interface of the SBEFIFO driver to allow for an
in-kernel read/write API. Add exported functions for other drivers to
call, and add an include file with those functions. Also parse the
device tree for child nodes and create child platform devices
accordingly.
OpenBMC-Staging-Count: 2
Signed-off-by: Edward A. James <eajames@us.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.
In reset_deferred_meminit() we determine number of pages that must not
be deferred. We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.
The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes. However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.
The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ]
When run ipvs in two different network namespace at the same host, and one
ipvs transport network traffic to the other network namespace ipvs.
'ipvs_property' flag will make the second ipvs take no effect. So we should
clear 'ipvs_property' when SKB network namespace changed.
Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()")
Signed-off-by: Ye Yin <hustcat@gmail.com>
Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2b7cda9c35d3b940eb9ce74b30bbd5eb30db493d ]
Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.
If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.
Bad things would then happen, including infinite loops.
This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.
Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.
Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dea6e19f4ef746aa18b4c33d1a7fed54356796ed ]
The commit 9a393b5d5988 ("tap: tap as an independent module") created a
separate tap module that implements tap functionality and exports
interfaces that will be used by macvtap and ipvtap modules to create
create respective tap devices.
However, that patch introduced a regression wherein the modules macvtap
and ipvtap can be removed (through modprobe -r) while there are
applications using the respective /dev/tapX devices. These applications
cause kernel to hold reference to /dev/tapX through 'struct cdev
macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
modules respectively. So, when the application is later closed the
kernel panics because we are referencing KVA that is present in the
unloaded modules.
----------8<------- Example ----------8<----------
$ sudo ip li add name mv0 link enp7s0 type macvtap
$ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
14:mv0@enp7s0:
$ cat /dev/tap14 &
$ lsmod |egrep -i 'tap|vlan'
macvtap 16384 0
macvlan 24576 1 macvtap
tap 24576 3 macvtap
$ sudo modprobe -r macvtap
$ fg
cat /dev/tap14
^C
<...system panics...>
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8<-----------------8<----------
The fix is to set cdev.owner to the module that creates the tap device
(either macvtap or ipvtap). With this set, the operations (in
fs/char_dev.c) on char device holds and releases the module through
cdev_get() and cdev_put() and will not allow the module to unload
prematurely.
Fixes: 9a393b5d5988ea4e (tap: tap as an independent module)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 06f877d613be3621604c2520ec0351d9fbdca15f ]
In my first attempt to fix the lockdep splat, I forgot we could
enter inet_csk_route_req() with a freshly allocated request socket,
for which refcount has not yet been elevated, due to complex
SLAB_TYPESAFE_BY_RCU rules.
We either are in rcu_read_lock() section _or_ we own a refcount on the
request.
Correct RCU verb to use here is rcu_dereference_check(), although it is
not possible to prove we actually own a reference on a shared
refcount :/
In v2, I added ireq_opt_deref() helper and use in three places, to fix other
possible splats.
[ 49.844590] lockdep_rcu_suspicious+0xea/0xf3
[ 49.846487] inet_csk_route_req+0x53/0x14d
[ 49.848334] tcp_v4_route_req+0xe/0x10
[ 49.850174] tcp_conn_request+0x31c/0x6a0
[ 49.851992] ? __lock_acquire+0x614/0x822
[ 49.854015] tcp_v4_conn_request+0x5a/0x79
[ 49.855957] ? tcp_v4_conn_request+0x5a/0x79
[ 49.858052] tcp_rcv_state_process+0x98/0xdcc
[ 49.859990] ? sk_filter_trim_cap+0x2f6/0x307
[ 49.862085] tcp_v4_do_rcv+0xfc/0x145
[ 49.864055] ? tcp_v4_do_rcv+0xfc/0x145
[ 49.866173] tcp_v4_rcv+0x5ab/0xaf9
[ 49.868029] ip_local_deliver_finish+0x1af/0x2e7
[ 49.870064] ip_local_deliver+0x1b2/0x1c5
[ 49.871775] ? inet_del_offload+0x45/0x45
[ 49.873916] ip_rcv_finish+0x3f7/0x471
[ 49.875476] ip_rcv+0x3f1/0x42f
[ 49.876991] ? ip_local_deliver_finish+0x2e7/0x2e7
[ 49.878791] __netif_receive_skb_core+0x6d3/0x950
[ 49.880701] ? process_backlog+0x7e/0x216
[ 49.882589] __netif_receive_skb+0x1d/0x5e
[ 49.884122] process_backlog+0x10c/0x216
[ 49.885812] net_rx_action+0x147/0x3df
Fixes: a6ca7abe53633 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c92e8c02fe664155ac4234516e32544bec0f113d ]
syzkaller found another bug in DCCP/TCP stacks [1]
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.
Note the opt field is renamed to ireq_opt to ease grep games.
[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
__tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
Allocated by task 3295:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
__do_kmalloc mm/slab.c:3725 [inline]
__kmalloc+0x162/0x760 mm/slab.c:3734
kmalloc include/linux/slab.h:498 [inline]
tcp_v4_save_options include/net/tcp.h:1962 [inline]
tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 3306:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
__sk_destruct+0xfd/0x910 net/core/sock.c:1560
sk_destruct+0x47/0x80 net/core/sock.c:1595
__sk_free+0x57/0x230 net/core/sock.c:1603
sk_free+0x2a/0x40 net/core/sock.c:1614
sock_put include/net/sock.h:1652 [inline]
inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0ad646c81b2182f7fa67ec0c8c825e0ee165696d ]
register_netdevice() could fail early when we have an invalid
dev name, in which case ->ndo_uninit() is not called. For tun
device, this is a problem because a timer etc. are already
initialized and it expects ->ndo_uninit() to clean them up.
We could move these initializations into a ->ndo_init() so
that register_netdevice() knows better, however this is still
complicated due to the logic in tun_detach().
Therefore, I choose to just call dev_get_valid_name() before
register_netdevice(), which is quicker and much easier to audit.
And for this specific case, it is already enough.
Fixes: 96442e42429e ("tuntap: choose the txq based on rxq")
Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3510c7aa069aa83a2de6dab2b41401a198317bdc upstream.
The recent fix for adding rwsem nesting annotation was using the given
"hop" argument as the lock subclass key. Although the idea itself
works, it may trigger a kernel warning like:
BUG: looking up invalid subclass: 8
....
since the lockdep has a smaller number of subclasses (8) than we
currently allow for the hops there (10).
The current definition is merely a sanity check for avoiding the too
deep delivery paths, and the 8 hops are already enough. So, as a
quick fix, just follow the max hops as same as the max lockdep
subclasses.
Fixes: 1f20f9ff57ca ("ALSA: seq: Fix nested rwsem annotation for lockdep splat")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b7d869ee5a77ed4a462372bb89af622e705bfb8 upstream.
Currently we allow unlimited number of timer instances, and it may
bring the system hogging way too much CPU when too many timer
instances are opened and processed concurrently. This may end up with
a soft-lockup report as triggered by syzkaller, especially when
hrtimer backend is deployed.
Since such insane number of instances aren't demanded by the normal
use case of ALSA sequencer and it merely opens a risk only for abuse,
this patch introduces the upper limit for the number of instances per
timer backend. As default, it's set to 1000, but for the fine-grained
timer like hrtimer, it's set to 100.
Reported-by: syzbot
Tested-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1312b7e0caca44e7ff312bc2eaa888943384e3e1 upstream.
Runtime GPEs have corresponding _Lxx/_Exx methods and are enabled
automatically during the initialization of the ACPI subsystem through
acpi_update_all_gpes() with the assumption that acpi_setup_gpe_for_wake()
will be called in advance for all of the GPEs pointed to by _PRW
objects in the namespace that may be affected by acpi_update_all_gpes().
That is, acpi_ev_initialize_gpe_block() can only be called for a GPE
block after acpi_setup_gpe_for_wake() has been called for all of the
_PRW (wakeup) GPEs in it.
The platform firmware on some systems, however, expects GPEs to be
enabled before the enumeration of devices which is when
acpi_setup_gpe_for_wake() is called and that goes against the above
assumption.
For this reason, introduce a new flag to be set by
acpi_ev_initialize_gpe_block() when automatically enabling a GPE
to indicate to acpi_setup_gpe_for_wake() that it needs to drop the
reference to the GPE coming from acpi_ev_initialize_gpe_block()
and modify acpi_setup_gpe_for_wake() accordingly. These changes
allow acpi_setup_gpe_for_wake() and acpi_ev_initialize_gpe_block()
to be invoked in any order.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream.
This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.
It was not a good idea. The custom hash table was a much better
fit for this purpose.
A fast lookup is not essential, in fact for most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.
rhlist removal however requires a rhlist walk.
We can have thousands of entries in such a list if source port/addresses
are reused for multiple flows, if this happens removal requests are so
expensive that deletions of a few thousand flows can take several
seconds(!).
The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks
1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to custom hash table.
I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics. rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.
Furthermore, using hlist_node increases size of struct rhlist_head, which
in turn increases nf_conn size.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou <ibobrik@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2628bd6fc052bd85e9864dae4de494d8a6313391 upstream.
One page may store a set of entries of the sis->swap_map
(swap_info_struct->swap_map) in multiple swap clusters.
If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
multiple pages will be used to store the set of entries of the
sis->swap_map. And the pages are linked with page->lru. This is called
swap count continuation. To access the pages which store the set of
entries of the sis->swap_map simultaneously, previously, sis->lock is
used. But to improve the scalability of __swap_duplicate(), swap
cluster lock may be used in swap_count_continued() now. This may race
with add_swap_count_continuation() which operates on a nearby swap
cluster, in which the sis->swap_map entries are stored in the same page.
The race can cause wrong swap count in practice, thus cause unfreeable
swap entries or software lockup, etc.
To fix the race, a new spin lock called cont_lock is added to struct
swap_info_struct to protect the swap count continuation page list. This
is a lock at the swap device level, so the scalability isn't very well.
But it is still much better than the original sis->lock, because it is
only acquired/released when swap count continuation is used. Which is
considered rare in practice. If it turns out that the scalability
becomes an issue for some workloads, we can split the lock into some
more fine grained locks.
Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
Fixes: 235b62176712 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a2b4a79b88b24c49d98d45a06a014ffd22ada1a4 upstream.
The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
to make sure this macro is defined. This fixes the following build
failure of lcdproc with the musl libc:
In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
from hd44780-spi.c:31:
hd44780-spi.c: In function 'spi_transfer':
hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
^
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a91d66129fb9bcead12af3ed2008d6ddbf179509 upstream.
The commit 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()")
converted the get_kctl_0dB_offset() call for killing set_fs() usage in
HD-audio codec code. The conversion assumed that the TLV callback
used in HD-audio code is only snd_hda_mixer_amp() and applies the TLV
calculation locally.
Although this assumption is correct, and all slave kctls are actually
with that callback, the current code is still utterly buggy; it
doesn't hit this condition and falls back to the next check. It's
because the function gets called after adding slave kctls to vmaster.
By assigning a slave kctl, the slave kctl object is faked inside
vmaster code, and the whole kctl ops are overridden. Thus the
callback op points to a different value from what we've assumed.
More badly, as reported by the KERNEXEC and UDEREF features of PaX,
the code flow turns into the unexpected pitfall. The next fallback
check is SNDRV_CTL_ELEM_ACCESS_TLV_READ access bit, and this always
hits for each kctl with TLV. Then it evaluates the callback function
pointer wrongly as if it were a TLV array. Although currently its
side-effect is fairly limited, this incorrect reference may lead to an
unpleasant result.
For addressing the regression, this patch introduces a new helper to
vmaster code, snd_ctl_apply_vmaster_slaves(). This works similarly
like the existing map_slaves() in hda_codec.c: it loops over the slave
list of the given master, and applies the given function to each
slave. Then the initializer function receives the right kctl object
and we can compare the correct pointer instead of the faked one.
Also, for catching the similar breakage in future, give an error
message when the unexpected TLV callback is found and bail out
immediately.
Fixes: 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()")
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 363b02dab09b3226f3bd1420dad9c72b79a42a76 upstream.
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
error into one field such that:
(1) The instantiation state can be modified/read atomically.
(2) The error can be accessed atomically with the state.
(3) The error isn't stored unioned with the payload pointers.
This deals with the problem that the state is spread over three different
objects (two bits and a separate variable) and reading or updating them
atomically isn't practical, given that not only can uninstantiated keys
change into instantiated or rejected keys, but rejected keys can also turn
into instantiated keys - and someone accessing the key might not be using
any locking.
The main side effect of this problem is that what was held in the payload
may change, depending on the state. For instance, you might observe the
key to be in the rejected state. You then read the cached error, but if
the key semaphore wasn't locked, the key might've become instantiated
between the two reads - and you might now have something in hand that isn't
actually an error code.
The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
code if the key is negatively instantiated. The key_is_instantiated()
function is replaced with key_is_positive() to avoid confusion as negative
keys are also 'instantiated'.
Additionally, barriering is included:
(1) Order payload-set before state-set during instantiation.
(2) Order state-read before payload-read when using the key.
Further separate barriering is necessary if RCU is being used to access the
payload content after reading the payload pointers.
Fixes: 146aa8b1453b ("KEYS: Merge the type-specific data with the payload data")
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2bbbd96357ce76cc45ec722c00f654aa7b189112 upstream.
At least the Armada XP SoC supports 4GB on a single DRAM window. Because
the size register values contain the actual size - 1, the MSB is set in
that case. For example, the SDRAM window's control register's value is
0xffffffe1 for 4GB (bits 31 to 24 contain the size).
The MBUS driver reads back each window's size from registers and
calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which
overflows for 32 bit values, resulting in other miscalculations further
on (a bad RAM window for the CESA crypto engine calculated by
mvebu_mbus_setup_cpu_target_nooverlap() in my case).
This patch changes the type in 'struct mbus_dram_window' from u32 to
u64, which allows us to keep using the same register calculation code in
most MBUS-using drivers (which calculate ->size - 1 again).
Fixes: fddddb52a6c4 ("bus: introduce an Marvell EBU MBus driver")
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05d00bc94ac27d220d8a78e365d7fa3a26dcca17 upstream.
Don't need cached read index anymore now that packet iterator
is used. The iterator has the original read index until the
visible read_index is updated.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8dd45f2ab005a1f3301296059b23b03ec3dbf79b upstream.
The function hv_signal_on_read was defined in hyperv.h and
only used in one place in ring_buffer code. Clearer to just
move it inline there.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 192b2d78722ffea188e5ec6ae5d55010dce05a4b upstream.
This patch addresses the following bugs in the current rescind handling code:
1. Fixes a race condition where we may be invoking hv_process_channel_removal()
on an already freed channel.
2. Prevents indefinite wait when rescinding sub-channels by correctly setting
the probe_complete state.
I would like to thank Dexuan for patiently reviewing earlier versions of this
patch and identifying many of the issues fixed here.
Greg, please apply this to 4.14-final.
Fixes: '54a66265d675 ("Drivers: hv: vmbus: Fix rescind handling")'
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6f3d791f300618caf82a2be0c27456edd76d5164 upstream.
This patch handles the following issues that were observed when we are
handling racing channel offer message and rescind message for the same
offer:
1. Since the host does not respond to messages on a rescinded channel,
in the current code, we could be indefinitely blocked on the vmbus_open() call.
2. When a rescinded channel is being closed, if there is a pending interrupt on the
channel, we could end up freeing the channel that the interrupt handler would run on.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5803b023881857db32ffefa0d269c90280a67ee0 upstream.
The event handler in the virmidi sequencer code takes a read-lock for
the linked list traverse, while it's calling snd_seq_dump_var_event()
in the loop. The latter function may expand the user-space data
depending on the event type. It eventually invokes copy_from_user(),
which might be a potential dead-lock.
The sequencer core guarantees that the user-space data is passed only
with atomic=0 argument, but snd_virmidi_dev_receive_event() ignores it
and always takes read-lock(). For avoiding the problem above, this
patch introduces rwsem for non-atomic case, while keeping rwlock for
atomic case.
Also while we're at it: the superfluous irq flags is dropped in
snd_virmidi_input_open().
Reported-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f892760aa66a2d657deaf59538fb69433036767c upstream.
When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
call clean_buffers() after unlocking the page we've written. Introduce
a new clean_page_buffers() which cleans all buffers associated with a
page and call it from within bdev_write_page().
[akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bc044e8db7962e727a75b591b9851ff2ac5cf846 ]
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 Kpps
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 28a0bc4120d38a394499382ba21d6965a67a3703 upstream.
SBC-4 states:
"A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the
maximum number of LBAs that may be unmapped by an UNMAP command"
"A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates
the maximum number of contiguous logical blocks that the device server
allows to be unmapped or written in a single WRITE SAME command."
Despite the spec being clear on the topic, some devices incorrectly
expect WRITE SAME commands with the UNMAP bit set to be limited to the
value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD.
Implement a blacklist option that can be used to accommodate devices
with this behavior.
Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com>
Reported-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|