Age | Commit message (Collapse) | Author | Files | Lines |
|
Livepatch uses ftrace for redirection to new patched functions. It means
that if ftrace is disabled, all live patched functions are disabled as
well. Toggling global 'ftrace_enabled' sysctl thus affect it directly.
It is not a problem per se, because only administrator can set sysctl
values, but it still may be surprising.
Introduce PERMANENT ftrace_ops flag to amend this. If the
FTRACE_OPS_FL_PERMANENT is set on any ftrace ops, the tracing cannot be
disabled by disabling ftrace_enabled. Equally, a callback with the flag
set cannot be registered if ftrace_enabled is disabled.
Link: http://lkml.kernel.org/r/20191016113316.13415-2-mbenes@suse.cz
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Pull networking fixes from David Miller:
1) Fix free/alloc races in batmanadv, from Sven Eckelmann.
2) Several leaks and other fixes in kTLS support of mlx5 driver, from
Tariq Toukan.
3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
Høiland-Jørgensen.
4) Add an r8152 device ID, from Kazutoshi Noguchi.
5) Missing include in ipv6's addrconf.c, from Ben Dooks.
6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
easily infer the 32-bit secret otherwise etc.
7) Several netdevice nesting depth fixes from Taehee Yoo.
8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
when doing lockless skb_queue_empty() checks, and accessing
sk_napi_id/sk_incoming_cpu lockless as well.
9) Fix jumbo packet handling in RXRPC, from David Howells.
10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.
11) Fix DMA synchronization in gve driver, from Yangchun Fu.
12) Several bpf offload fixes, from Jakub Kicinski.
13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.
14) Fix ping latency during high traffic rates in hisilicon driver, from
Jiangfent Xiao.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
net: fix installing orphaned programs
net: cls_bpf: fix NULL deref on offload filter removal
selftests: bpf: Skip write only files in debugfs
selftests: net: reuseport_dualstack: fix uninitalized parameter
r8169: fix wrong PHY ID issue with RTL8168dp
net: dsa: bcm_sf2: Fix IMP setup for port different than 8
net: phylink: Fix phylink_dbg() macro
gve: Fixes DMA synchronization.
inet: stop leaking jiffies on the wire
ixgbe: Remove duplicate clear_bit() call
Documentation: networking: device drivers: Remove stray asterisks
e1000: fix memory leaks
i40e: Fix receive buffer starvation for AF_XDP
igb: Fix constant media auto sense switching when no cable is connected
net: ethernet: arc: add the missed clk_disable_unprepare
igb: Enable media autosense for the i350.
igb/igc: Don't warn on fatal read failures when the device is removed
tcp: increase tcp_max_syn_backlog max value
net: increase SOMAXCONN to 4096
netdevsim: Fix use-after-free during device dismantle
...
|
|
Pull NFS client bugfixes from Anna Schumaker:
"This contains two delegation fixes (with the RCU lock leak fix marked
for stable), and three patches to fix destroying the the sunrpc back
channel.
Stable bugfixes:
- Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
Other fixes:
- The TCP back channel mustn't disappear while requests are
outstanding
- The RDMA back channel mustn't disappear while requests are
outstanding
- Destroy the back channel when we destroy the host transport
- Don't allow a cached open with a revoked delegation"
* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
NFSv4: Don't allow a cached open with a revoked delegation
SUNRPC: Destroy the back channel when we destroy the host transport
SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
uncore PMU driver fix and a header typo fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
perf/x86/uncore: Fix event group support
perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
perf/core: Start rejecting the syscall with attr.__reserved_2 set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Various fixes all over the map: prevent boot crashes on HyperV,
classify UEFI randomness as bootloader randomness, fix EFI boot for
the Raspberry Pi2, fix efi_test permissions, etc"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
x86, efi: Never relocate kernel below lowest acceptable address
efi: libstub/arm: Account for firmware reserved memory at the base of RAM
efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
efi/tpm: Return -EINVAL when determining tpm final events log size fails
efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
|
|
SOMAXCONN is /proc/sys/net/core/somaxconn default value.
It has been defined as 128 more than 20 years ago.
Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.
Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.
Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.
Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver exposes EFI runtime services to user-space through an IOCTL
interface, calling the EFI services function pointers directly without
using the efivar API.
Disallow access to the /dev/efi_test character device when the kernel is
locked down to prevent arbitrary user-space to call EFI runtime services.
Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged
users to call the EFI runtime services, instead of just relying on the
chardev file mode bits for this.
The main user of this driver is the fwts [0] tool that already checks if
the effective user ID is 0 and fails otherwise. So this change shouldn't
cause any regression to this tool.
[0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Matthew Garrett <mjg59@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.
It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:
1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
by the loader.
2. There isn't enough room to contain the kernel, starting from the
default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
kernel.
EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.
It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.
The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.
To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.
[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"A few fixes to the dmaengine drivers:
- fix in sprd driver for link list and potential memory leak
- tegra transfer failure fix
- imx size check fix for script_number
- xilinx fix for 64bit AXIDMA and control reg update
- qcom bam dma resource leak fix
- cppi slave transfer fix when idle"
* tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
dmaengine: qcom: bam_dma: Fix resource leak
dmaengine: sprd: Fix the possible memory leak issue
dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer
dmaengine: imx-sdma: fix size check for sdma script_number
dmaengine: tegra210-adma: fix transfer failure
dmaengine: sprd: Fix the link-list pointer register configuration issue
|
|
When we're destroying the host transport mechanism, we should ensure
that we do not leak memory by failing to release any back channel
slots that might still exist.
Reported-by: Neil Brown <neilb@suse.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The union should contain the extended dest and counter list.
Remove the resevered 0x40 bits which is redundant.
This change doesn't break any functionally.
Everything works today because the code in fs_cmd.c is using
the correct structs if extended dest or the basic dest.
Fixes: 1b115498598f ("net/mlx5: Introduce extended destination fields")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user. The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.
Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.
[2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
...
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
sock_xmit.isra.24+0xa1/0x170 [nbd]
nbd_send_cmd+0x1d2/0x690 [nbd]
nbd_queue_rq+0x1b5/0x3b0 [nbd]
__blk_mq_try_issue_directly+0x108/0x1b0
blk_mq_request_issue_directly+0xbd/0xe0
blk_mq_try_issue_list_directly+0x41/0xb0
blk_mq_sched_insert_requests+0xa2/0xe0
blk_mq_flush_plug_list+0x205/0x2a0
blk_flush_plug_list+0xc3/0xf0
[1] blk_finish_plug+0x21/0x2e
_xfs_buf_ioapply+0x313/0x460
__xfs_buf_submit+0x67/0x220
xfs_buf_read_map+0x113/0x1a0
xfs_trans_read_buf_map+0xbf/0x330
xfs_btree_read_buf_block.constprop.42+0x95/0xd0
xfs_btree_lookup_get_block+0x95/0x170
xfs_btree_lookup+0xcc/0x470
xfs_bmap_del_extent_real+0x254/0x9a0
__xfs_bunmapi+0x45c/0xab0
xfs_bunmapi+0x15/0x30
xfs_itruncate_extents_flags+0xca/0x250
xfs_free_eofblocks+0x181/0x1e0
xfs_fs_destroy_inode+0xa8/0x1b0
destroy_inode+0x38/0x70
dispose_list+0x35/0x50
prune_icache_sb+0x52/0x70
super_cache_scan+0x120/0x1a0
do_shrink_slab+0x120/0x290
shrink_slab+0x216/0x2b0
shrink_node+0x1b6/0x4a0
do_try_to_free_pages+0xc6/0x370
try_to_free_mem_cgroup_pages+0xe3/0x1e0
try_charge+0x29e/0x790
mem_cgroup_charge_skmem+0x6a/0x100
__sk_mem_raise_allocated+0x18e/0x390
__sk_mem_schedule+0x2a/0x40
[0] tcp_sendmsg_locked+0x8eb/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x26d/0x2b0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule(). It already calculated how many bytes can
be fit into current->page_frag. Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path. Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().
nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress. However, this
confused sk_page_frag() called from [2]. Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].
After [2] used current->page_frag, the offset would be increased by
the used amount. When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.
Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.
v2: Eric didn't like gfp flags being tested twice. Introduce a new
helper gfpflags_normal_context() and combine the two tests.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
to not let the compiler do strange things, and avoid
KCSAN splats.
Adding a barrier in skb_queue_empty() might be overkill,
I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us finding real bugs.
The corresponding WRITE_ONCE() should add zero cost
for current compilers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull virtio fixes from Michael Tsirkin:
"Some minor fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vringh: fix copy direction of vringh_iov_push_kern()
vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'
virtio_ring: fix stalls for packed rings
|
|
As per POSIX, the correct spelling of the error code is EACCES:
include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lkml.kernel.org/r/20191024122904.12463-1-geert+renesas@glider.be
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The 'work' field was introduced with commit 06a8fc78367d0
("VSOCK: Introduce virtio_vsock_common.ko")
but it is never used in the code, so we can remove it to save
memory allocated in the per-packet 'struct virtio_vsock_pkt'
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-10-27
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 11 day(s) which contain
a total of 7 files changed, 66 insertions(+), 16 deletions(-).
The main changes are:
1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to
kallsyms, from Daniel Borkmann.
2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson.
3) Fix hang in netdev unregister for hash based devmap as well as another overflow
bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen.
4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst.
Also fix BPF selftests to use more compatible nc options, from Jiri Benc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single sysfs fix for 5.4-rc5.
It resolves an error if you actually try to use the __BIN_ATTR_WO()
macro, seems I never tested it properly before :(
This has been in linux-next for a while with no reported issues"
* tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Fixes __BIN_ATTR_WO() macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fixes from Jessica Yu:
- Revert __ksymtab_$namespace.$symbol naming scheme back to
__ksymtab_$symbol, as it was causing issues with depmod.
Instead, have modpost extract a symbol's namespace from __kstrtabns
and __ksymtab_strings.
- Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
by unescaped '/'.
Use a different sed delimiter to avoid this problem.
* tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
scripts/nsdeps: use alternative sed delimiter
symbol namespaces: revert to previous __ksymtab name scheme
modpost: make updating the symbol namespace explicit
modpost: delegate updating namespaces to separate function
|
|
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to link an adjacent node, netdev_upper_dev_link() is used
and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used.
unlink operation does not fail, but link operation can fail.
In order to exchange adjacent nodes, we should unlink an old adjacent
node first. then, link a new adjacent node.
If link operation is failed, we should link an old adjacent node again.
But this link operation can fail too.
It eventually breaks the adjacent link relationship.
This patch adds an ignore flag into the netdev_adjacent structure.
If this flag is set, netdev_upper_dev_link() ignores an old adjacent
node for a moment.
This patch also adds new functions for other modules.
netdev_adjacent_change_prepare()
netdev_adjacent_change_commit()
netdev_adjacent_change_abort()
netdev_adjacent_change_prepare() inserts new device into adjacent list
but new device is not allowed to use immediately.
If netdev_adjacent_change_prepare() fails, it internally rollbacks
adjacent list so that we don't need any other action.
netdev_adjacent_change_commit() deletes old device in the adjacent list
and allows new device to use.
netdev_adjacent_change_abort() rollbacks adjacent list.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
team interface could be nested and it's lock variable could be nested too.
But this lock uses static lockdep key and there is no nested locking
handling code such as mutex_lock_nested() and so on.
so the Lockdep would warn about the circular locking scenario that
couldn't happen.
In order to fix, this patch makes the team module to use dynamic lock key
instead of static key.
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
ip link set team1 nomaster
Splat that looks like:
[ 40.364352] WARNING: possible recursive locking detected
[ 40.364964] 5.4.0-rc3+ #96 Not tainted
[ 40.365405] --------------------------------------------
[ 40.365973] ip/750 is trying to acquire lock:
[ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
[ 40.367689]
but task is already holding lock:
[ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.370280]
other info that might help us debug this:
[ 40.371159] Possible unsafe locking scenario:
[ 40.371942] CPU0
[ 40.372338] ----
[ 40.372673] lock(&team->lock);
[ 40.373115] lock(&team->lock);
[ 40.373549]
*** DEADLOCK ***
[ 40.374432] May be due to missing lock nesting notation
[ 40.375338] 2 locks held by ip/750:
[ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.377989]
stack backtrace:
[ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.380574] Call Trace:
[ 40.381208] dump_stack+0x7c/0xbb
[ 40.381959] __lock_acquire+0x269d/0x3de0
[ 40.382817] ? register_lock_class+0x14d0/0x14d0
[ 40.383784] ? check_chain_key+0x236/0x5d0
[ 40.384518] lock_acquire+0x164/0x3b0
[ 40.385074] ? team_set_mac_address+0x151/0x290 [team]
[ 40.385805] __mutex_lock+0x14d/0x14c0
[ 40.386371] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387038] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380
[ 40.388245] ? team_del_slave+0x60/0x60 [team]
[ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0
[ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 40.389819] ? lock_acquire+0x164/0x3b0
[ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20
[ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team]
[ 40.391353] ? __module_text_address+0x13/0x140
[ 40.391886] ? team_set_mac_address+0x151/0x290 [team]
[ 40.392547] team_set_mac_address+0x151/0x290 [team]
[ 40.393111] dev_set_mac_address+0x1f0/0x3f0
[ ... ]
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this needs huge stack
memory. So, unlimited nested devices could make stack overflow.
This patch adds upper_level and lower_level, they are common variables
and represent maximum lower/upper depth.
When upper/lower device is attached or dettached,
{lower/upper}_level are updated. and if maximum depth is bigger than 8,
attach routine fails and returns -EMLINK.
In addition, this patch converts recursive routine of
netdev_walk_all_{lower/upper} to iterator routine.
Test commands:
ip link add dummy0 type dummy
ip link add link dummy0 name vlan1 type vlan id 1
ip link set vlan1 up
for i in {2..55}
do
let A=$i-1
ip link add vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
Splat looks like:
[ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
[ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
[ 155.515048][ T908]
[ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.517233][ T908] Call Trace:
[ 155.517627][ T908]
[ 155.517918][ T908] Allocated by task 0:
[ 155.518412][ T908] (stack is not available)
[ 155.518955][ T908]
[ 155.519228][ T908] Freed by task 0:
[ 155.519885][ T908] (stack is not available)
[ 155.520452][ T908]
[ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0
[ 155.520729][ T908] which belongs to the cache names_cache of size 4096
[ 155.522387][ T908] The buggy address is located 512 bytes inside of
[ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
[ 155.523920][ T908] The buggy address belongs to the page:
[ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
[ 155.525836][ T908] flags: 0x100000000010200(slab|head)
[ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
[ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 155.528429][ T908] page dumped because: kasan: bad access detected
[ 155.529158][ T908]
[ 155.529410][ T908] Memory state around the buggy address:
[ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
[ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.532806][ T908] ^
[ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
[ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.
Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.
Really, using jhash() to generate cookies sent on the wire
is a serious security concern.
Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.
Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")
Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.
Packet schedulers using skb_get_hash_perturb() benefit from this change.
Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Berger <jonathann1@walla.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ionic driver started using dymamic_hex_dump(), but
that is not always defined:
drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration]
Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG
is disabled, printing nothing.
Fixes: 938962d55229 ("ionic: Add adminq action")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
syzkaller managed to trigger the following crash:
[...]
BUG: unable to handle page fault for address: ffffc90001923030
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
[...]
Call Trace:
kernel_text_address kernel/extable.c:147 [inline]
__kernel_text_address+0x9a/0x110 kernel/extable.c:102
unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
save_stack mm/kasan/common.c:69 [inline]
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc mm/slab.c:3319 [inline]
kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
getname_flags+0xba/0x640 fs/namei.c:138
getname+0x19/0x20 fs/namei.c:209
do_sys_open+0x261/0x560 fs/open.c:1091
__do_sys_open fs/open.c:1115 [inline]
__se_sys_open fs/open.c:1110 [inline]
__x64_sys_open+0x87/0x90 fs/open.c:1110
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[...]
After further debugging it turns out that we walk kallsyms while in parallel
we tear down a BPF program which contains subprograms that have been JITed
though the program itself has not been fully exposed and is eventually bailing
out with error.
The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
the symbols, however, bpf_prog_free() tears down the JIT memory too early via
scheduled work. Instead, it needs to properly respect RCU grace period as the
kallsyms walk for BPF is under RCU.
Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
path where we defer final destruction when we have subprogs in the program.
Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup")
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
|
|
There are no more active users of DEV_PM_QOS_MIN_FREQUENCY and
DEV_PM_QOS_MAX_FREQUENCY device PM QoS request types, so drop them
along with the code supporting them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.
Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).
In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.
The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.
After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.
Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Introduce frequency QoS, based on the "raw" low-level PM QoS, to
represent min and max frequency requests and aggregate constraints.
The min and max frequency requests are to be represented by
struct freq_qos_request objects and the aggregate constraints are to
be represented by struct freq_constraints objects. The latter are
expected to be initialized with the help of freq_constraints_init().
The freq_qos_read_value() helper is defined to retrieve the aggregate
constraints values from a given struct freq_constraints object and
there are the freq_qos_add_request(), freq_qos_update_request() and
freq_qos_remove_request() helpers to manipulate the min and max
frequency requests. It is assumed that the the helpers will not
run concurrently with each other for the same struct freq_qos_request
object, so if that may be the case, their uses must ensure proper
synchronization between them (e.g. through locking).
In addition, freq_qos_add_notifier() and freq_qos_remove_notifier()
are provided to add and remove notifiers that will trigger on aggregate
constraint changes to and from a given struct freq_constraints object,
respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
meanwhile, sorry about that.
Highlights:
1) Fix fd leak in various bpf selftests, from Brian Vazquez.
2) Fix crash in xsk when device doesn't support some methods, from
Magnus Karlsson.
3) Fix various leaks and use-after-free in rxrpc, from David Howells.
4) Fix several SKB leaks due to confusion of who owns an SKB and who
should release it in the llc code. From Eric Biggers.
5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.
6) Jumbo packets don't work after resume on r8169, as the BIOS resets
the chip into non-jumbo mode during suspend. From Heiner Kallweit.
7) Corrupt L2 header during MPLS push, from Davide Caratti.
8) Prevent possible infinite loop in tc_ctl_action, from Eric
Dumazet.
9) Get register bits right in bcmgenet driver, based upon chip
version. From Florian Fainelli.
10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
11) Cure race between route lookup and invalidation in ipv4, from Wei
Wang.
12) Fix performance regression due to false sharing in 'net'
structure, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
net: reorder 'struct net' fields to avoid false sharing
net: dsa: fix switch tree list
net: ethernet: dwmac-sun8i: show message only when switching to promisc
net: aquantia: add an error handling in aq_nic_set_multicast_list
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
macb: propagate errors when getting optional clocks
xen/netback: fix error path of xenvif_connect_data()
net: hns3: fix mis-counting IRQ vector numbers issue
net: usb: lan78xx: Connect PHY before registering MAC
vsock/virtio: discard packets if credit is not respected
vsock/virtio: send a credit update when buffer size is changed
mlxsw: spectrum_trap: Push Ethernet header before reporting trap
net: ensure correct skb->tstamp in various fragmenters
net: bcmgenet: reset 40nm EPHY on energy detect
net: bcmgenet: soft reset 40nm EPHYs before MAC init
net: phy: bcm7xxx: define soft_reset for 40nm EPHY
net: bcmgenet: don't set phydev->link from MAC
net: Update address for MediaTek ethernet driver in MAINTAINERS
ipv4: fix race condition between route lookup and invalidation
...
|
|
The introduction of Symbol Namespaces changed the naming schema of the
__ksymtab entries from __kysmtab__symbol to __ksymtab_NAMESPACE.symbol.
That caused some breakages in tools that depend on the name layout in
either the binaries(vmlinux,*.ko) or in System.map. E.g. kmod's depmod
would not be able to read System.map without a patch to support symbol
namespaces. A warning reported by depmod for namespaced symbols would
look like
depmod: WARNING: [...]/uas.ko needs unknown symbol usb_stor_adjust_quirks
In order to address this issue, revert to the original naming scheme and
rather read the __kstrtabns_<symbol> entries and their corresponding
values from __ksymtab_strings to update the namespace values for
symbols. After having read all symbols and handled them in
handle_modversions(), the symbols are created. In a second pass, read
the __kstrtabns_ entries and update the namespaces accordingly.
Fixes: 8651ec01daed ("module: add support for symbol namespaces.")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main thing here is a long-awaited workaround for a CPU erratum on
ThunderX2 which we have developed in conjunction with engineers from
Cavium/Marvell.
At the moment, the workaround is unconditionally enabled for affected
CPUs at runtime but we may add a command-line option to disable it in
future if performance numbers show up indicating a significant cost
for real workloads.
Summary:
- Work around Cavium/Marvell ThunderX2 erratum #219
- Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses
- More fixes to the spurious kernel fault detection logic
- Fix pathological preemption race when enabling some CPU features at boot
- Drop broken kcore macros in favour of generic implementations
- Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled
- Avoid NULL dereference on allocation failure during hibernation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tags: Preserve tags for addresses translated via TTBR1
arm64: mm: fix inverted PAR_EL1.F check
arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
arm64: hibernate: check pgd table allocation
arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
arm64: Fix kcore macros after 52-bit virtual addressing fallout
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
|
|
The KSZ8795 PHY ID is in fact used by KSZ8794/KSZ8795/KSZ8765 switches.
Update the PHY ID and name to reflect that, as this family of switches
is commonly refered to as KSZ87xx
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Workaround for Cavium/Marvell ThunderX2 erratum #219.
* errata/tx2-219:
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"The fixes pertain to a problem with initializing the Intel GPIO
irqchips when adding gpiochips.
Andy fixed it up elegantly by adding a hardware initialization
callback to the struct gpio_irq_chip so let's use this. Tested and
verified on the target hardware"
* tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: lynxpoint: set default handler to be handle_bad_irq()
gpio: merrifield: Move hardware initialization to callback
gpio: lynxpoint: Move hardware initialization to callback
gpio: intel-mid: Move hardware initialization to callback
gpiolib: Initialize the hardware with a callback
gpio: merrifield: Restore use of irq_base
|
|
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
PAN == 0 ---| |---------------
| |<- return from IRQ, PSTATE.PAN = 0
| <- IRQ |
+--------+ <- preempt() +--
^
|
reschedule Task 1, PSTATE.PAN == 1
* Init:
--------------------+------------------------
^
|
enable_cpu_features
set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress protocol ip matchall \
> action mpls push protocol mpls_uc label 0x355aa bos 1
causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.
v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
in skb_mpls_pop(), reported by kbuild test robot
CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.
Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.
This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
include/linux/xarray.h:232: WARNING: Unexpected indentation.
Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix kernel-doc warning in <linux/bitmap.h>:
include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
Also fix small typo (bitnaps).
Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated. Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".
Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "followups to debug_pagealloc improvements through
page_owner", v3.
These are followups to [1] which made it to Linus meanwhile. Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).
This patch (of 3):
As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages. As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.
Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage. Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.
Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Reported-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Illegal memory will be touch if SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3
(41) exceed the size of structure sdma_script_start_addrs(40),
thus cause memory corrupt such as slob block header so that kernel
trap into while() loop forever in slob_free(). Please refer to below
code piece in imx-sdma.c:
for (i = 0; i < sdma->script_number; i++)
if (addr_arr[i] > 0)
saddr_arr[i] = addr_arr[i]; /* memory corrupt here */
That issue was brought by commit a572460be9cf ("dmaengine: imx-sdma: Add
support for version 3 firmware") because SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3
(38->41 3 scripts added) not align with script number added in
sdma_script_start_addrs(2 scripts).
Fixes: a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware")
Cc: stable@vger.kernel
Link: https://www.spinics.net/lists/arm-kernel/msg754895.html
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Reported-by: Jurgen Lambrecht <J.Lambrecht@TELEVIC.com>
Link: https://lore.kernel.org/r/1569347584-3478-1-git-send-email-yibin.gong@nxp.com
[vkoul: update the patch title]
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :
fastopen = tp->fastopen_rsk;
snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Update/fix inspur-ipsps1 and k10temp Documentation
- Fix nct7904 driver
- Fix HWMON_P_MIN_ALARM mask in hwmon core
* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: docs: Extend inspur-ipsps1 title underline
hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
docs: hwmon: Include 'inspur-ipsps1.rst' into docs
hwmon: Fix HWMON_P_MIN_ALARM mask
hwmon: (k10temp) Update documentation and add temp2_input info
hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a lot of small USB driver fixes for 5.4-rc3.
syzbot has stepped up its testing of the USB driver stack, now able to
trigger fun race conditions between disconnect and probe functions.
Because of that we have a lot of fixes in here from Johan and others
fixing these reported issues that have been around since almost all
time.
We also are just deleting the rio500 driver, making all of the syzbot
bugs found in it moot as it turns out no one has been using it for
years as there is a userspace version that is being used instead.
There are also a number of other small fixes in here, all resolving
reported issues or regressions.
All have been in linux-next without any reported issues"
* tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
USB: yurex: fix NULL-derefs on disconnect
USB: iowarrior: use pr_err()
USB: iowarrior: drop redundant iowarrior mutex
USB: iowarrior: drop redundant disconnect mutex
USB: iowarrior: fix use-after-free after driver unbind
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: chaoskey: fix use-after-free on release
USB: adutux: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
USB: legousbtower: fix use-after-free on release
usb: cdns3: Fix for incorrect DMA mask.
usb: cdns3: fix cdns3_core_init_role()
usb: cdns3: gadget: Fix full-speed mode
USB: usb-skeleton: drop redundant in-urb check
USB: usb-skeleton: fix use-after-free after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
usb:cdns3: Fix for CV CH9 running with g_zero driver.
usb: dwc3: Remove dev_err() on platform_get_irq() failure
usb: dwc3: Switch to platform_get_irq_byname_optional()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Misc EFI fixes all across the map: CPER error report fixes, fixes to
TPM event log parsing, fix for a kexec hang, a Sparse fix and other
fixes"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
efi/x86: Do not clean dummy variable in kexec path
efi: Make unexported efi_rci2_sysfs_init() static
efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
efi/tpm: Don't traverse an event log with no events
efi/tpm: Don't access event->count when it isn't mapped
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
efi/cper: Fix endianness of PCIe class code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
guest support fix when built under Clang, and new CPU model number
definitions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Comet Lake to the Intel CPU models header
lib/string: Make memzero_explicit() inline instead of external
x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
x86/asm: Fix MWAITX C-state hint value
|
|
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
|