Age | Commit message (Collapse) | Author | Files | Lines |
|
In the function ravb_hwtstamp_get() in ravb_main.c with the existing
values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL
(0x6)
if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
config.rx_filter = HWTSTAMP_FILTER_ALL;
if the test on RAVB_RXTSTAMP_TYPE_ALL should be true,
it will never be reached.
This issue can be verified with 'hwtstamp_config' testing program
(tools/testing/selftests/net/hwtstamp_config.c). Setting filter type
to ALL and subsequent retrieving it gives incorrect value:
$ hwtstamp_config eth0 OFF ALL
flags = 0
tx_type = OFF
rx_filter = ALL
$ hwtstamp_config eth0
flags = 0
tx_type = OFF
rx_filter = PTP_V2_L2_EVENT
Correct this by converting if-else's to switch.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This needs to unlock before returning.
Fixes: 544e7c33ec2f ("net: devlink: Add support for port regions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201026080127.GB1628785@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These paths don't set the error codes. It's especially important in
devlink_nl_region_notify_build() where it leads to a NULL dereference in
the caller.
Fixes: 544e7c33ec2f ("net: devlink: Add support for port regions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201026080059.GA1628785@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CPL handler functions chtls_pass_open_rpl() and
chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
so that caller function will do skb free to avoid leak.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In chtls_pass_establish() we hold child socket lock using bh_lock_sock
and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock.
Remove bh_lock_sock in add_to_reap_list() as lock is already held.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When unbinding the hns3 driver with the HNS3 VF, I got the following
kernel panic:
[ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000
[ 265.717928] Mem abort info:
[ 265.720740] ESR = 0x96000047
[ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits
[ 265.729126] SET = 0, FnV = 0
[ 265.732195] EA = 0, S1PTW = 0
[ 265.735351] Data abort info:
[ 265.738227] ISV = 0, ISS = 0x00000047
[ 265.742071] CM = 0, WnR = 1
[ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000
[ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000
[ 265.764314] Internal error: Oops: 96000047 [#1] SMP
[ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206
[ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300
[ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300
[ 265.861992] sp : ffff80004c983b50
[ 265.881411] pmr_save: 000000e0
[ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500
[ 265.889744] x27: 0000000000000000 x26: 0000000000000000
[ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90
[ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150
[ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90
[ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004
[ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8
[ 265.916201] x15: 0000000000000028 x14: 0000000000000002
[ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8
[ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000
[ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000
[ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff
[ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001
[ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100
[ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000
[ 265.964491] Call trace:
[ 265.964494] hclgevf_cmd_uninit+0xbc/0x300
[ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8
[ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130
[ 265.964516] hns3_remove+0x34/0x88 [hns3]
[ 266.009683] pci_device_remove+0x48/0xf0
[ 266.009692] device_release_driver_internal+0x114/0x1e8
[ 266.030058] device_driver_detach+0x28/0x38
[ 266.034224] unbind_store+0xd4/0x108
[ 266.037784] drv_attr_store+0x40/0x58
[ 266.041435] sysfs_kf_write+0x54/0x80
[ 266.045081] kernfs_fop_write+0x12c/0x250
[ 266.049076] vfs_write+0xc4/0x248
[ 266.052378] ksys_write+0x74/0xf8
[ 266.055677] __arm64_sys_write+0x24/0x30
[ 266.059584] el0_svc_common.constprop.3+0x84/0x270
[ 266.064354] do_el0_svc+0x34/0xa0
[ 266.067658] el0_svc+0x38/0x40
[ 266.070700] el0_sync_handler+0x8c/0xb0
[ 266.074519] el0_sync+0x140/0x180
It looks like the BAR memory region had already been unmapped before we
start clearing CMDQ registers in it, which is pretty bad and the kernel
happily kills itself because of a Current EL Data Abort (on arm64).
Moving the CMDQ uninitialization a bit early fixes the issue for me.
Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michael Chan says:
====================
bnxt_en: Bug fixes.
These 5 bug fixes are all related to the firmware reset or AER recovery.
2 patches fix the cleanup logic for the workqueue used to handle firmware
reset and recovery. 1 patch ensures that the chip will have the proper
BAR addresses latched after fatal AER recovery. 1 patch fixes the
open path to check for firmware reset abort error. The last one
sends the fw reset command unconditionally to fix the AER reset logic.
====================
Link: https://lore.kernel.org/r/1603685901-17917-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.
However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it. In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.
Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_open_nic() is called during configuration changes that require
the NIC to be closed and then opened. This call is protected by
rtnl_lock. Firmware reset can be happening at the same time. Only
critical portions of the entire firmware reset sequence are protected
by the rtnl_lock. It is possible that bnxt_open_nic() can be called
when the firmware reset sequence is aborting. In that case,
bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
abort if it is. The configuration change that resulted in the
bnxt_open_nic() call will fail but the NIC will be brought to a
consistent IF_DOWN state.
Without this patch, if bnxt_open_nic() were to continue in this error
state, it may crash like this:
[ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0
[ 1648.659813] Oops: 0000 [#1] SMP
[ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common
[ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1
[ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020
[ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000
[ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202
[ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000
[ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0
[ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008
[ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0
[ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40
[ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000
[ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0
[ 1648.669966] Call Trace:
[ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en]
[ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en]
[ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en]
[ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en]
[ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0
[ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0
[ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60
[ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0
[ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0
[ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500
[ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0
[ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a
[ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7
[ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.682724] RSP <ffff94f55df1fba8>
[ 1648.683451] CR2: 0000000000000000
Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.
pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid. So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses. These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.
Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As part of the commit b148bb238c02
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.
As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.
Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().
Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one(). This caused a
regression because the workqueue can be restarted if the device is
still open. Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.
Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that. This fixes the
regession and the original NULL ptr dereference issue.
Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ido Schimmel says:
====================
mlxsw: Various fixes
This patch set contains various fixes for mlxsw.
Patch #1 ensures that only link modes that are supported by both the
device and the driver are advertised. When a link mode that is not
supported by the driver is negotiated by the device, it will be
presented as an unknown speed by ethtool, causing the bond driver to
wrongly assume that the link is down.
Patch #2 fixes a trivial memory leak upon module removal.
Patch #3 fixes a use-after-free that syzkaller was able to trigger once
on a slow emulator after a few months of fuzzing.
====================
Link: https://lore.kernel.org/r/20201024133733.2107509-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).
The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.
In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].
Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.
[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004
CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
instrument_atomic_read include/linux/instrumented.h:56 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
skb_unref include/linux/skbuff.h:1044 [inline]
consume_skb+0x30/0x370 net/core/skbuff.c:833
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
Allocated by task 1006:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
slab_post_alloc_hook mm/slab.h:586 [inline]
slab_alloc_node mm/slub.c:2824 [inline]
slab_alloc mm/slub.c:2832 [inline]
kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
__build_skb+0x21/0x60 net/core/skbuff.c:311
__netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:671
____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
___sys_sendmsg+0xff/0x170 net/socket.c:2413
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 73:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kmem_cache_free+0xbe/0x380 mm/slub.c:3088
kfree_skbmem net/core/skbuff.c:622 [inline]
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:837 [inline]
consume_skb+0xe1/0x370 net/core/skbuff.c:831
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
The buggy address belongs to the object at ffff88804f5703c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Free the devlink instance during the teardown sequence in the non-reload
case to avoid the following memory leak.
unreferenced object 0xffff888232895000 (size 2048):
comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........".......
10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2....
backtrace:
[<00000000c704e9a6>] __kmalloc+0x13a/0x2a0
[<00000000ee30129d>] devlink_alloc+0xff/0x760
[<0000000092ab3e5d>] 0xffffffffa042e5b0
[<000000004f3f8a31>] 0xffffffffa042f6ad
[<0000000092800b4b>] 0xffffffffa0491df3
[<00000000c4843903>] local_pci_probe+0xcb/0x170
[<000000006993ded7>] pci_device_probe+0x2c2/0x4e0
[<00000000a8e0de75>] really_probe+0x2c5/0xf90
[<00000000d42ba75d>] driver_probe_device+0x1eb/0x340
[<00000000bcc95e05>] device_driver_attach+0x294/0x300
[<000000000e2bc177>] __driver_attach+0x167/0x2f0
[<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0
[<000000003cd5a91e>] driver_attach+0x45/0x60
[<000000000041ce51>] bus_add_driver+0x3b8/0x720
[<00000000f5215476>] driver_register+0x230/0x4e0
[<00000000d79356f5>] __pci_register_driver+0x190/0x200
Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During port creation the driver instructs the device to advertise all
the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are
supported by the driver. This can result in the device negotiating a
link mode that is not recognized by the driver causing ethtool to show
an unsupported speed:
$ ethtool swp1
...
Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as
the bond driver uses unknown speed as an indication that the link is
down:
[13048.900895] net_ratelimit: 86 callbacks suppressed
[13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
[13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both
the device and the driver are advertised.
Fixes: b97cd891268d ("mlxsw: Remove 56G speed support")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Karsten Graul says:
====================
net/smc: fixes 2020-10-23
Patch 1 fixes a potential null pointer dereference. Patch 2 takes care
of a suppressed return code and patch 3 corrects the system EID in the
ISM driver.
====================
Link: https://lore.kernel.org/r/20201023184830.59548-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The system EID that is defined by the ISM driver is not correct. Using
an incorrect system EID allows to communicate with remote Linux systems
that use the same incorrect system EID, but when it comes to
interoperability with other operating systems then the system EIDs do
never match which prevents SMC-Dv2 communication.
Using the correct system EID fixes this problem.
Fixes: 201091ebb2a1 ("net/smc: introduce System Enterprise ID (SEID)")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The patch that repaired the invalid return code in smcd_new_buf_create()
missed to take care of errno ENOSPC which has a special meaning that no
more DMBEs can be registered on the device. Fix that by keeping this
errno value during the translation of the return code.
Fixes: 6b1bbf94ab36 ("net/smc: fix invalid return code in smcd_new_buf_create()")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
smc_listen_work() calls smc_listen_decline() on label out_decl,
providing the ini pointer variable. But this pointer can still be null
when the label out_decl is reached.
Fix this by checking the ini variable in smc_listen_work() and call
smc_listen_decline() with the result directly.
Fixes: a7c9c5f4af7f ("net/smc: CLC accept / confirm V2")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During __vsock_create() CAP_NET_ADMIN is used to determine if the
vsock_sock->trusted should be set to true. This value is used later
for determing if a remote connection should be allowed to connect
to a restricted VM. Unfortunately, if the caller doesn't have
CAP_NET_ADMIN, an audit message such as an selinux denial is
generated even if the caller does not want a trusted socket.
Logging errors on success is confusing. To avoid this, switch the
capable(CAP_NET_ADMIN) check to the noaudit version.
Reported-by: Roman Kiryanov <rkir@google.com>
https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current code sets up the filter action field before
rewrites are set up. When the action 'switch' is used
with rewrites, this may result in initial few packets
that get switched out don't have rewrites applied
on them.
So, make sure filter action is set up along with rewrites
or only after everything else is set up for rewrites.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Smatch complains that "ret" might be uninitialized if we don't enter
the loop. We do always enter the loop so it's a false positive, but
it's cleaner to just return a literal zero and that silences the
warning as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201023112212.GA282278@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With SO_RCVLOWAT, under memory pressure,
it is possible to enter a state where:
1. We have not received enough bytes to satisfy SO_RCVLOWAT.
2. We have not entered buffer pressure (see tcp_rmem_pressure()).
3. But, we do not have enough buffer space to accept more packets.
In this case, we advertise 0 rwnd (due to #3) but the application does
not drain the receive queue (no wakeup because of #1 and #2) so the
flow stalls.
Modify the heuristic for SO_RCVLOWAT so that, if we are advertising
rwnd<=rcv_mss, force a wakeup to prevent a stall.
Without this patch, setting tcp_rmem to 6143 and disabling TCP
autotune causes a stalled flow. With this patch, no stall occurs. This
is with RPC-style traffic with large messages.
Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Clang warns about the extra parentheses in this comparison:
drivers/net/ethernet/freescale/ucc_geth.c:1361:28:
warning: equality comparison with extraneous parentheses
if ((ugeth->phy_interface == PHY_INTERFACE_MODE_SGMII))
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
It seems clear the intent here is to do a comparison not an
assignment, so drop the extra parentheses to avoid any confusion.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023033236.3296988-1-mpe@ellerman.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Shannon Nelson says:
====================
ionic: memory usage fixes
This patchset addresses some memory leaks and incorrect
io reads.
====================
Link: https://lore.kernel.org/r/20201022235531.65956-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sentinel descriptor entry was getting missed in the
traverse of the ring from head to tail, so change to a
loop of 0 to the end.
Fixes: f1d2e894f1b7 ("ionic: use index not pointer for queue tracking")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kmemleak pointed out to us that ionic_rx_flush() is sending
skbs into napi_gro_XXX with a disabled napi context, and these
end up getting lost and leaked. We can safely remove the flush.
Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sparse complaints around the static_asserts were obscuring
more useful complaints. So, don't check the static_asserts,
and fix the remaining sparse complaints.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.
v1->v2:
- Improved commit message.
Fixes: 17a7d24aa89d ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPA transactions describe actions to be performed by the IPA
hardware. Three cases use IPA transactions: transmitting a socket
buffer; providing a page to receive packet data; and issuing an IPA
immediate command. An IPA transaction contains a scatter/gather
list (SGL) to hold the set of actions to be performed.
We map buffers in the SGL for DMA at the time they are added to the
transaction. For skb TX transactions, we fill the SGL with a call
to skb_to_sgvec(). Page RX transactions involve a single page
pointer, and that is recorded in the SGL with sg_set_page(). In
both of these cases we then map the SGL for DMA with a call to
dma_map_sg().
Immediate commands are different. The payload for an immediate
command comes from a region of coherent DMA memory, which must
*not* be mapped for DMA. For that reason, gsi_trans_cmd_add()
sort of hand-crafts each SGL entry added to a command transaction.
This patch fixes a problem with the code that crafts the SGL entry
for an immediate command. Previously a portion of the SGL entry was
updated using sg_set_buf(). However this is not valid because it
includes a call to virt_to_page() on the buffer, but the command
buffer pointer is not a linear address.
Since we never actually map the SGL for command transactions, there
are very few fields in the SGL we need to fill. Specifically, we
only need to record the DMA address and the length, so they can be
used by __gsi_trans_commit() to fill a TRE. We additionally need to
preserve the SGL flags so for_each_sg() still works. For that we
can simply assign a null page pointer for command SGL entries.
Fixes: 9dd441e4ed575 ("soc: qcom: ipa: GSI transactions")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Cross-tree/merge window issues:
- rtl8150: don't incorrectly assign random MAC addresses; fix late in
the 5.9 cycle started depending on a return code from a function
which changed with the 5.10 PR from the usb subsystem
Current release regressions:
- Revert "virtio-net: ethtool configurable RXCSUM", it was causing
crashes at probe when control vq was not negotiated/available
Previous release regressions:
- ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
bus, only first device would be probed correctly
- nexthop: Fix performance regression in nexthop deletion by
effectively switching from recently added synchronize_rcu() to
synchronize_rcu_expedited()
- netsec: ignore 'phy-mode' device property on ACPI systems; the
property is not populated correctly by the firmware, but firmware
configures the PHY so just keep boot settings
Previous releases - always broken:
- tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
bulk transfers getting "stuck"
- icmp: randomize the global rate limiter to prevent attackers from
getting useful signal
- r8169: fix operation under forced interrupt threading, make the
driver always use hard irqs, even on RT, given the handler is light
and only wants to schedule napi (and do so through a _irqoff()
variant, preferably)
- bpf: Enforce pointer id generation for all may-be-null register
type to avoid pointers erroneously getting marked as null-checked
- tipc: re-configure queue limit for broadcast link
- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
tunnels
- fix various issues in chelsio inline tls driver
Misc:
- bpf: improve just-added bpf_redirect_neigh() helper api to support
supplying nexthop by the caller - in case BPF program has already
done a lookup we can avoid doing another one
- remove unnecessary break statements
- make MCTCP not select IPV6, but rather depend on it"
* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
tcp: fix to update snd_wl1 in bulk receiver fast path
net: Properly typecast int values to set sk_max_pacing_rate
netfilter: nf_fwd_netdev: clear timestamp in forwarding path
ibmvnic: save changed mac address to adapter->mac_addr
selftests: mptcp: depends on built-in IPv6
Revert "virtio-net: ethtool configurable RXCSUM"
rtnetlink: fix data overflow in rtnl_calcit()
net: ethernet: mtk-star-emac: select REGMAP_MMIO
net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
mptcp: depends on IPV6 but not as a module
sfc: move initialisation of efx->filter_sem to efx_init_struct()
mpls: load mpls_gso after mpls_iptunnel
net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
net: dsa: bcm_sf2: make const array static, makes object smaller
mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Use iomap for non-journaled buffered I/O. This largely eliminates
buffer heads on filesystems where the block size matches the page
size. Many thanks to Christoph Hellwig for this patch!
- Fixes for some more journaled data filesystem bugs, found by running
xfstests with data journaling on for all files (chattr +j $MNT) (Bob
Peterson)
- gfs2_evict_inode refactoring (Bob Peterson)
- Use the statfs data in the journal during recovery instead of reading
it in from the local statfs inodes (Abhi Das)
- Several other minor fixes by various people
* tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
gfs2: Recover statfs info in journal head
gfs2: lookup local statfs inodes prior to journal recovery
gfs2: Add fields for statfs info in struct gfs2_log_header_host
gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
gfs2: Eliminate gl_vm
gfs2: Only access gl_delete for iopen glocks
gfs2: Fix comments to glock_hash_walk
gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
gfs2: Ignore journal log writes for jdata holes
gfs2: simplify gfs2_block_map
gfs2: Only set PageChecked if we have a transaction
gfs2: don't lock sd_ail_lock in gfs2_releasepage
gfs2: make gfs2_ail1_empty_one return the count of active items
gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
gfs2: enhance log_blocks trace point to show log blocks free
gfs2: add missing log_blocks trace points in gfs2_write_revokes
gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
gfs2: add validation checks for size of superblock
gfs2: use-after-free in sysfs deregistration
gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
...
|
|
Pull cifs updates from Steve French:
- add support for recognizing special file types (char/block/fifo/
symlink) for files created by Linux on WSL (a format we plan to move
to as the default for creating special files on Linux, as it has
advantages over the other current option, the SFU format) in readdir.
- fix double queries to root directory when directory leases not
supported (e.g. Samba)
- fix querying mode bits (modefromsid mount option) for special file
types
- stronger encryption (gcm256), disabled by default until tested more
broadly
- allow querying owner when server reports 'well known SID' on query
dir with SMB3.1.1 POSIX extensions
* tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (30 commits)
SMB3: add support for recognizing WSL reparse tags
cifs: remove bogus debug code
smb3.1.1: fix typo in compression flag
cifs: move smb version mount options into fs_context.c
cifs: move cache mount options to fs_context.ch
cifs: move security mount options into fs_context.ch
cifs: add files to host new mount api
smb3: do not try to cache root directory if dir leases not supported
smb3: fix stat when special device file and mounted with modefromsid
cifs: Print the address and port we are connecting to in generic_ip_connect()
SMB3: Resolve data corruption of TCP server info fields
cifs: make const array static, makes object smaller
SMB3.1.1: Fix ids returned in POSIX query dir
smb3: add dynamic trace point to trace when credits obtained
smb3.1.1: do not fail if no encryption required but server doesn't support it
cifs: Return the error from crypt_message when enc/dec key not found.
smb3.1.1: set gcm256 when requested
smb3.1.1: rename nonces used for GCM and CCM encryption
smb3.1.1: print warning if server does not support requested encryption type
smb3.1.1: add new module load parm enable_gcm_256
...
|
|
Pull clone/dedupe/remap code refactoring from Darrick Wong:
"Move the generic file range remap (aka reflink and dedupe) functions
out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
reduce clutter in the first two files"
* tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: move the generic write and copy checks out of mm
vfs: move the remap range helpers to remap_range.c
vfs: move generic_remap_checks out of mm
|
|
Pull KVM updates from Paolo Bonzini:
"For x86, there is a new alternative and (in the future) more scalable
implementation of extended page tables that does not need a reverse
map from guest physical addresses to host physical addresses.
For now it is disabled by default because it is still lacking a few of
the existing MMU's bells and whistles. However it is a very solid
piece of work and it is already available for people to hammer on it.
Other updates:
ARM:
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
PPC:
- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes
x86:
- allow trapping unknown MSRs to userspace
- allow userspace to force #GP on specific MSRs
- INVPCID support on AMD
- nested AMD cleanup, on demand allocation of nested SVM state
- hide PV MSRs and hypercalls for features not enabled in CPUID
- new test for MSR_IA32_TSC writes from host and guest
- cleanups: MMU, CPUID, shared MSRs
- LAPIC latency optimizations ad bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
kvm: x86/mmu: NX largepage recovery for TDP MMU
kvm: x86/mmu: Don't clear write flooding count for direct roots
kvm: x86/mmu: Support MMIO in the TDP MMU
kvm: x86/mmu: Support write protection for nesting in tdp MMU
kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
kvm: x86/mmu: Support dirty logging for the TDP MMU
kvm: x86/mmu: Support changed pte notifier in tdp MMU
kvm: x86/mmu: Add access tracking for tdp_mmu
kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
kvm: x86/mmu: Add TDP MMU PF handler
kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
KVM: Cache as_id in kvm_memory_slot
kvm: x86/mmu: Add functions to handle changed TDP SPTEs
kvm: x86/mmu: Allocate and free TDP MMU roots
kvm: x86/mmu: Init / Uninit the TDP MMU
kvm: x86/mmu: Introduce tdp_iter
KVM: mmu: extract spte.h and spte.c
KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
...
|
|
Pull virtio updates from Michael Tsirkin:
"vhost, vdpa, and virtio cleanups and fixes
A very quiet cycle, no new features"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
MAINTAINERS: add URL for virtio-mem
vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
vringh: fix __vringh_iov() when riov and wiov are different
vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
s390: virtio: PV needs VIRTIO I/O device protection
virtio: let arch advertise guest's memory access restrictions
vhost_vdpa: Fix duplicate included kernel.h
vhost: reduce stack usage in log_used
virtio-mem: Constify mem_id_table
virtio_input: Constify id_table
virtio-balloon: Constify id_table
vdpa/mlx5: Fix failure to bring link up
vdpa/mlx5: Make use of a specific 16 bit endianness API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
"cros-ec:
- Error code cleanup across cros-ec by Guenter
- Remove cros_ec_cmd_xfer in favor of cros_ec_cmd_xfer_status
cros_ec_typec:
- Landed initial USB4 support in typec connector class driver for
cros_ec
- Role switch bugfix on disconnect, and reordering configuration
steps
cros_ec_lightbar:
- Fix buffer outsize and result for get_lightbar_version
misc:
- Remove config MFD_CROS_EC, now that transition from MFD is complete
- Enable KEY_LEFTMETA in new location on arm based cros-ec-keyboard
keymap"
* tag 'tag-chrome-platform-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
ARM: dts: cros-ec-keyboard: Add alternate keymap for KEY_LEFTMETA
platform/chrome: Use kobj_to_dev() instead of container_of()
platform/chrome: cros_ec_proto: Drop cros_ec_cmd_xfer()
platform/chrome: cros_ec_proto: Update cros_ec_cmd_xfer() call-sites
platform/chrome: Kconfig: Remove the transitional MFD_CROS_EC config
platform/chrome: cros_ec_lightbar: Reduce ligthbar get version command
platform/chrome: cros_ec_trace: Add fields to command traces
platform/chrome: cros_ec_typec: Re-order connector configuration steps
platform/chrome: cros_ec_typec: Avoid setting usb role twice during disconnect
platform/chrome: cros_ec_typec: Send enum values to usb_role_switch_set_role()
platform/chrome: cros_ec_typec: USB4 support
pwm: cros-ec: Simplify EC error handling
platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes
platform/input: cros_ec: Replace -ENOTSUPP with -ENOPROTOOPT
pwm: cros-ec: Accept more error codes from cros_ec_cmd_xfer_status
platform/chrome: cros_ec_sysfs: Report range of error codes from EC
cros_ec_lightbar: Accept more error codes from cros_ec_cmd_xfer_status
iio: cros_ec: Accept -EOPNOTSUPP as 'not supported' error code
|
|
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fix from Vineet Gupta:
"I found a snafu in perf driver which made it into 5.9-rc4 and the fix
should go in now than wait"
* tag 'arc-5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: perf: redo the pct irq missing in device-tree handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Will Deacon:
"A small selection of further arm64 fixes and updates. Most of these
are fixes that came in during the merge window, with the exception of
the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018
and somehow forgot to enable upstream.
- Improve performance of Spectre-v2 mitigation on Falkor CPUs (if
you're lucky enough to have one)
- Select HAVE_MOVE_PMD. This has been shown to improve mremap()
performance, which is used heavily by the Android runtime GC, and
it seems we forgot to enable this upstream back in 2018.
- Ensure linker flags are consistent between LLVM and BFD
- Fix stale comment in Spectre mitigation rework
- Fix broken copyright header
- Fix KASLR randomisation of the linear map
- Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)"
Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Update comment to reflect new function name
arm64: spectre-v2: Favour CPU-specific mitigation at EL2
arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
arm64: Fix a broken copyright header in gen_vdso_offsets.sh
arm64: mremap speedup - Enable HAVE_MOVE_PMD
arm64: mm: use single quantity to represent the PA to VA translation
arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks
|
|
Apply the outstanding statfs changes in the journal head to the
master statfs file. Zero out the local statfs file for good measure.
Previously, statfs updates would be read in from the local statfs inode and
synced to the master statfs inode during recovery.
We now use the statfs updates in the journal head to update the master statfs
inode instead of reading in from the local statfs inode. To preserve backward
compatibility with kernels that can't do this, we still need to keep the
local statfs inode up to date by writing changes to it. At some point in the
future, we can do away with the local statfs inodes altogether and keep the
statfs changes solely in the journal.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.
With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-21-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-20-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to support MMIO, KVM must be able to walk the TDP paging
structures to find mappings for a given GFN. Support this walk for
the TDP MMU.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
v2: Thanks to Dan Carpenter and kernel test robot for finding that root
was used uninitialized in get_mmio_spte.
Signed-off-by: Ben Gardon <bgardon@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20201014182700.2888246-19-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To support nested virtualization, KVM will sometimes need to write
protect pages which are part of a shadowed paging structure or are not
writable in the shadowed paging structure. Add a function to write
protect GFN mappings for this purpose.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-18-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Dirty logging ultimately breaks down MMU mappings to 4k granularity.
When dirty logging is no longer needed, these granaular mappings
represent a useless performance penalty. When dirty logging is disabled,
search the paging structure for mappings that could be re-constituted
into a large page mapping. Zap those mappings so that they can be
faulted in again at a higher mapping level.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-17-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
a hook and handle the change_pte MMU notifier.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-15-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|