Age | Commit message (Collapse) | Author | Files | Lines |
|
We might use the nid of memmaps that were never initialized. For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now. Let's use the calculated boundaries of the
separate zones instead. This now also avoids having to iterate over a
whole bunch of subsections again, after shrinking one zone.
Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value. After that commit, the node might be garbage.
We'll have to fix shrink_zone_span() next.
Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
/proc/pagetypeinfo
Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.
For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:
:/# echo 1 > /sys/devices/system/memory/memory40/online
:/# echo 1 > /sys/devices/system/memory/memory42/online
:/# cat /proc/pagetypeinfo > test.file
page:fffff2c585200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
There is not page extension available.
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1107!
invalid opcode: 0000 [#1] SMP NOPTI
Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).
[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When CONFIG_PRINTK_CALLER is set, struct printk_log contains an
additional member caller_id. This affects the offset of the log text.
Account for this by using the type information from gdb to determine all
the offsets instead of using hardcoded values.
This fixes following error:
(gdb) lx-dmesg
Python Exception <class 'ValueError'> embedded null character:
Error occurred in Python command: embedded null character
The read_u* utility functions now take an offset argument to make them
easier to use.
Link: http://lkml.kernel.org/r/20191011142500.2339-1-joel.colledge@linbit.com
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We should check for pfn_to_online_page() to not access uninitialized
memmaps. Reshuffle the code so we don't have to duplicate the error
message.
Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup
We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.
For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:
:/# cat /proc/kpagecount > tmp.test
BUG: unable to handle page fault for address: fffffffffffffffe
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
RIP: 0010:kpagecount_read+0xce/0x1e0
Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
Call Trace:
proc_reg_read+0x3c/0x60
vfs_read+0xc5/0x180
ksys_read+0x68/0xe0
do_syscall_64+0x5c/0xa0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this. To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now. The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.
The fundamental issue we have is:
if (pfn_to_online_page(pfn)) {
/* memmap initialized */
} else if (pfn_valid(pfn)) {
/*
* ???
* a) offline memory. memmap garbage.
* b) devmem: memmap initialized to ZONE_DEVICE.
* c) devmem: reserved for driver. memmap garbage.
* (d) devmem: memmap currently initializing - garbage)
*/
}
We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure. We now no
longer dump information about pages that are not in use anymore -
offline.
Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
soft_offline_page_store()
Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING. They should not get touched.
Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:
:/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
[ 23.097167] soft offline: 0x150000 page already poisoned
But the actual result depends on the garbage in the memmap.
soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE. Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.
Add a check against pfn_to_online_page() and similarly return -EIO.
Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith that address deadlocks, double resets,
memory leaks, and other regression.
- Fixup elv_support_iosched() for bio based devices (Damien)
- Fixup for the ahci PCS quirk (Dan)
- Socket O_NONBLOCK handling fix for io_uring (me)
- Timeout sequence io_uring fixes (yangerkun)
- MD warning fix for parameter default_layout (Song)
- blkcg activation fixes (Tejun)
- blk-rq-qos node deletion fix (Tejun)
* tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block:
nvme-pci: Set the prp2 correctly when using more than 4k page
io_uring: fix logic error in io_timeout
io_uring: fix up O_NONBLOCK handling for sockets
md/raid0: fix warning message for parameter default_layout
libata/ahci: Fix PCS quirk application
blk-rq-qos: fix first node deletion of rq_qos_del()
blkcg: Fix multiple bugs in blkcg_activate_policy()
io_uring: consider the overflow of sequence for timeout req
nvme-tcp: fix possible leakage during error flow
nvmet-loop: fix possible leakage during error flow
block: Fix elv_support_iosched()
nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
nvme: Wait for reset state when required
nvme: Prevent resets during paused controller state
nvme: Restart request timers in resetting state
nvme: Remove ADMIN_ONLY state
nvme-pci: Free tagset if no IO queues
nvme: retain split access workaround for capability reads
nvme: fix possible deadlock when nvme_update_formats fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Some RISC-V fixes:
- Fix the virtual memory layout so the fixaddr region doesn't overlap
with other regions. (This was originally intended to go in as part
of an earlier patch, but I inadvertently dropped it during a
rebase)
- Add the DT chosen/stdout-path property to the HiFive Unleashed DT
file. This is so "earlycon" can be specified with no arguments on
the kernel command line, and the correct UART will be automatically
selected.
And two cleanup patches:
- Simplify the code in our breakpoint trap handler.
- Drop a comment in our TLB flush code that has caused some
confusion"
* tag 'riscv/for-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: fix virtual address overlapped in FIXADDR_START and VMEMMAP_START
riscv: tlbflush: remove confusing comment on local_flush_tlb_all()
riscv: dts: HiFive Unleashed: add default chosen/stdout-path
riscv: remove the switch statement in do_trap_break()
|
|
This was always meant to be a temporary thing, just for testing and to
see if it actually ever triggered.
The only thing that reported it was syzbot doing disk image fuzzing, and
then that warning is expected. So let's just remove it before -rc4,
because the extra sanity testing should probably go to -stable, but we
don't want the warning to do so.
Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
Fixes: 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull ceph fixes from Ilya Dryomov:
"A future-proofing decoding fix from Jeff intended for stable and a
patch for a mostly benign race from Dongsheng"
* tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client:
rbd: cancel lock_dwork if the wait is interrupted
ceph: just skip unrecognized info in ceph_reply_info_extra
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM snapshot deadlock that can occur due to COW throttling
preventing locks from being released.
- Fix DM cache's GFP_NOWAIT allocation failure error paths by switching
to GFP_NOIO.
- Make __hash_find() static in the DM clone target.
* tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix bugs when a GFP_NOWAIT allocation fails
dm snapshot: rework COW throttling to fix deadlock
dm snapshot: introduce account_start_copy() and account_end_copy()
dm clone: Make __hash_find static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fixes for page-table issues on Mali GPUs
- Missing free in an error path for ARM-SMMU
- PASID decoding in the AMD IOMMU Event log code
- Another update for the locking fixes in the AMD IOMMU driver
- Reduce the calls to platform_get_irq() in the IPMMU-VMSA and Rockchip
IOMMUs to get rid of the warning message added to this function
recently
* tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Check PM_LEVEL_SIZE() condition in locked section
iommu/amd: Fix incorrect PASID decoding from event log
iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory
iommu/rockchip: Don't use platform_get_irq to implicitly count irqs
iommu/io-pgtable-arm: Support all Mali configurations
iommu/io-pgtable-arm: Correct Mali attributes
iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
Pull usercopy test fixlets from Christian Brauner:
"This contains two improvements for the copy_struct_from_user() tests:
- a coding style change to get rid of the ugly "if ((ret |= test()))"
pointed out when pulling the original patchset.
- avoid a soft lockups when running the usercopy tests on machines
with large page sizes by scanning only a 1024 byte region"
* tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
usercopy: Avoid soft lockups in test_check_nonzero_user()
lib: test_user_copy: style cleanup
|
|
In mlx5_fw_fatal_reporter_dump if mlx5_crdump_collect fails the
allocated memory for cr_data must be released otherwise there will be
memory leak. To fix this, this commit changes the return instruction
into goto error handling.
Fixes: 9b1f29823605 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In mlx5_fpga_conn_create_cq if mlx5_vector2eqn fails the allocated
memory should be released.
Fixes: 537a50574175 ("net/mlx5: FPGA, Add high-speed connection routines")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The completion queue consumer index increments upon a call to
mlx5_cqwq_pop().
When dumping an error CQE, the index is already incremented.
Decrease one for the print command.
Fixes: 16cc14d81733 ("net/mlx5e: Dump xmit error completions")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Once the kTLS TX resync function is called, it used to return
a binary value, for success or failure.
However, in case the TLS SKB is a retransmission of the connection
handshake, it initiates the resync flow (as the tcp seq check holds),
while regular packet handle is expected.
In this patch, we identify this case and skip the resync operation
accordingly.
Counters:
- Add a counter (tls_skip_no_sync_data) to monitor this.
- Bump the dump counters up as they are used more frequently.
- Add a missing counter descriptor declaration for tls_resync_bytes
in sq_stats_desc.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Do not assume the crypto info is accessible during the
connection lifetime. Save a copy of it in the private
TX context.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Cipher type is checked upon connection addition.
No need to recheck it per every TX resync invocation.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
HW expects the data size in DUMP WQEs to be up to MTU.
Make sure they are in range.
We elevate the frag page refcount by 'n-1', in addition to the
one obtained in tx_sync_info_get(), having an overall of 'n'
references. We bulk increments by using a single page_ref_add()
command, to optimize perfermance.
The refcounts are released one by one, by the corresponding completions.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Before posting the context params WQEs, make sure there is enough
contiguous room for them, and fill frag edge if needed.
When posting only a nop, no need for room check, as it needs a single
WQEBB, meaning no contiguity issue.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
All references for frag pages that are obtained in tx_sync_info_get()
should be released.
Release usually occurs in the corresponding CQE of the WQE.
In error flows, not all fragments have a WQE posted for them, hence
no matching CQE will be generated.
For these pages, release the reference in the error flow.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Access the record fragments only under the TLS ctx lock.
In the resync flow, save a copy of them to be used when
preparing and posting the required DUMP WQEs.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In TX resync flow where DUMP WQEs are posted, keep a pointer to
the fragment page to unref it upon completion, instead of saving
the whole fragment.
In addition, move it the end of the arguments list in tx_fill_wi().
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
No Eth segment, so no dynamic inline headers.
The size of a Dump WQE is fixed, use constants and remove
unnecessary checks.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
A call to kTLS completion handler was missing in the TXQSQ release
flow. Add it.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Not all fields of WQE info are being written in the function,
having some leftovers from previous rounds.
Zero-memset it upon update.
Particularly, not nullifying the wi->resync_dump_frag field
will cause double free of the kTLS DUMPed frags.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Cited patch removed the assumption only in datapath.
Here we remove it also form control/cleanup flow.
Fixes: 9ab0233728ca ("net/mlx5e: Tx, Don't implicitly assume SKB-less wqe has one WQEBB")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
As soon as the netdev is registers, the kernel can start using the
interface. If the driver connects the MAC to the PHY after the netdev
is registered, there is a race condition where the interface can be
opened without having the PHY connected.
Change the order to close this race condition.
Fixes: 92571a1aae40 ("lan78xx: Connect phy early")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stefano Garzarella says:
====================
vsock/virtio: make the credit mechanism more robust
This series makes the credit mechanism implemented in the
virtio-vsock devices more robust.
Patch 1 sends an update to the remote peer when the buf_alloc
change.
Patch 2 prevents a malicious peer (especially the guest) can
consume all the memory of the other peer, discarding packets
when the credit available is not respected.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the remote peer doesn't respect the credit information
(buf_alloc, fwd_cnt), sending more data than it can send,
we should drop the packets to prevent a malicious peer
from using all of our memory.
This is patch follows the VIRTIO spec: "VIRTIO_VSOCK_OP_RW data
packets MUST only be transmitted when the peer has sufficient
free buffer space for the payload"
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the user application set a new buffer size value, we should
update the remote peer about this change, since it uses this
information to calculate the credit available.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
devlink maintains packets and bytes statistics for each trap. Since
eth_type_trans() was called to set the skb's protocol, the data pointer
no longer points to the start of the packet and the bytes accounting is
off by 14 bytes.
Fix this by pushing the skb's data pointer to the start of the packet.
Fixes: b5ce611fd96e ("mlxsw: spectrum: Add devlink-trap support")
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Thomas found that some forwarded packets would be stuck
in FQ packet scheduler because their skb->tstamp contained
timestamps far in the future.
We thought we addressed this point in commit 8203e2d844d3
("net: clear skb->tstamp in forwarding paths") but there
is still an issue when/if a packet needs to be fragmented.
In order to meet EDT requirements, we have to make sure all
fragments get the original skb->tstamp.
Note that this original skb->tstamp should be zero in
forwarding path, but might have a non zero value in
output path if user decided so.
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thomas Bartschies <Thomas.Bartschies@cvk.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-iproc: Prevent some spurious interrupts
- renesas_sdhi/sh_mmcif: Avoid false warnings about IRQs not found
MEMSTICK host:
- jmb38x_ms: Fix an error handling path at ->probe()"
* tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711
mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt
mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts
|
|
Doug Berger says:
====================
net: bcmgenet: restore internal EPHY support
I managed to get my hands on an old BCM97435SVMB board to do some
testing with the latest kernel and uncovered a number of things
that managed to get broken over the years (some by me ;).
This commit set attempts to correct the errors I observed in my
testing.
The first commit applies to all internal PHYs to restore proper
reporting of link status when a link comes up.
The second commit restores the soft reset to the initialization of
the older internal EPHYs used by 40nm Set-Top Box devices.
The third corrects a bug I introduced when removing excessive soft
resets by altering the initialization sequence in a way that keeps
the GENETv3 MAC interface happy.
Finally, I observed a number of issues when manually configuring
the network interface of the older EPHYs that appear to be resolved
by the fourth commit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The EPHY integrated into the 40nm Set-Top Box devices can falsely
detect energy when connected to a disabled peer interface. When the
peer interface is enabled the EPHY will detect and report the link
as active, but on occasion may get into a state where it is not
able to exchange data with the connected GENET MAC. This issue has
not been observed when the link parameters are auto-negotiated;
however, it has been observed with a manually configured link.
It has been empirically determined that issuing a soft reset to the
EPHY when energy is detected prevents it from getting into this bad
state.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It turns out that the "Workaround for putting the PHY in IDDQ mode"
used by the internal EPHYs on 40nm Set-Top Box chips when powering
down puts the interface to the GENET MAC in a state that can cause
subsequent MAC resets to be incomplete.
Rather than restore the forced soft reset when powering up internal
PHYs, this commit moves the invocation of phy_init_hw earlier in
the MAC initialization sequence to just before the MAC reset in the
open and resume functions. This allows the interface to be stable
and allows the MAC resets to be successful.
The bcmgenet_mii_probe() function is split in two to accommodate
this. The new function bcmgenet_mii_connect() handles the first
half of the functionality before the MAC initialization, and the
bcmgenet_mii_config() function is extended to provide the remaining
PHY configuration following the MAC initialization.
Fixes: 484bfa1507bf ("Revert "net: bcmgenet: Software reset EPHY after power on"")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The internal 40nm EPHYs use a "Workaround for putting the PHY in
IDDQ mode." These PHYs require a soft reset to restore functionality
after they are powered back up.
This commit defines the soft_reset function to use genphy_soft_reset
during phy_init_hw to accommodate this.
Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When commit 28b2e0d2cd13 ("net: phy: remove parameter new_link from
phy_mac_interrupt()") removed the new_link parameter it set the
phydev->link state from the MAC before invoking phy_mac_interrupt().
However, once commit 88d6272acaaa ("net: phy: avoid unneeded MDIO
reads in genphy_read_status") was added this initialization prevents
the proper determination of the connection parameters by the function
genphy_read_status().
This commit removes that initialization to restore the proper
functionality.
Fixes: 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes for the usual suspect, HD- and USB-audio:
enablement of runtime PM for Nvidia due to the recent PCI changes, a
fix for potential hangs with recent HD-audio platforms, and the rest
device-specific quirks"
* tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Force runtime PM on Nvidia HDMI codecs
ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
ALSA: hdac: clear link output stream mapping
ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix possible use-after-free in the ACPI CPPC support code (John Garry)
and prevent the ACPI HMAT parsing code from using possibly incorrect
data coming from the platform firmware (Daniel Black)"
* tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These include a fix for a recent regression in the ACPI CPU
performance scaling code, a PCI device power management fix,
a system shutdown fix related to cpufreq, a removal of an ACPI
suspend-to-idle blacklist entry and a build warning fix.
Specifics:
- Fix possible NULL pointer dereference in the ACPI processor scaling
initialization code introduced by a recent cpufreq update (Rafael
Wysocki).
- Fix possible deadlock due to suspending cpufreq too late during
system shutdown (Rafael Wysocki).
- Make the PCI device system resume code path be more consistent with
its PM-runtime counterpart to fix an issue with missing delay on
transitions from D3cold to D0 during system resume from
suspend-to-idle on some systems (Rafael Wysocki).
- Drop Dell XPS13 9360 from the LPS0 Idle _DSM blacklist to make it
use suspend-to-idle by default (Mario Limonciello).
- Fix build warning in the core system suspend support code (Ben
Dooks)"
* tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Avoid NULL pointer dereferences at init time
PCI: PM: Fix pci_power_up()
PM: sleep: include <linux/pm_runtime.h> for pm_wq
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull scsi fixes from Martin Petersen:
"These two commits were in a separate postmerge branch due to a
dependency on changes merged for 5.4 in the block tree.
They fix two issues in the intersection of the request cleanup changes
from block (b7e9e1fb7a92) and the request batching changes
(8930a6c20791) that were made to SCSI during the 5.4 cycle"
* tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: core: fix dh and multipathing for SCSI hosts without request batching
scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching
|
|
The increase_address_space() function has to check the PM_LEVEL_SIZE()
condition again under the domain->lock to avoid a false trigger of the
WARN_ON_ONCE() and to avoid that the address space is increase more
often than necessary.
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Pull NVMe updates from Keith:
"This is a collection of bug fixes committed since the previous pull
request that address deadlocks, double resets, memory leaks, and other
regression."
* 'nvme-5.4' of git://git.infradead.org/nvme:
nvme-pci: Set the prp2 correctly when using more than 4k page
nvme-tcp: fix possible leakage during error flow
nvmet-loop: fix possible leakage during error flow
nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
nvme: Wait for reset state when required
nvme: Prevent resets during paused controller state
nvme: Restart request timers in resetting state
nvme: Remove ADMIN_ONLY state
nvme-pci: Free tagset if no IO queues
nvme: retain split access workaround for capability reads
nvme: fix possible deadlock when nvme_update_formats fails
|
|
In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.
Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
* acpi-tables:
ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3
|
|
When enabling KASAN and DEBUG_TEST_DRIVER_REMOVE, I find this KASAN
warning:
[ 20.872057] BUG: KASAN: use-after-free in pcc_data_alloc+0x40/0xb8
[ 20.878226] Read of size 4 at addr ffff00236cdeb684 by task swapper/0/1
[ 20.884826]
[ 20.886309] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00009-ge7f7df3db5bf-dirty #289
[ 20.894994] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[ 20.903505] Call trace:
[ 20.905942] dump_backtrace+0x0/0x200
[ 20.909593] show_stack+0x14/0x20
[ 20.912899] dump_stack+0xd4/0x130
[ 20.916291] print_address_description.isra.9+0x6c/0x3b8
[ 20.921592] __kasan_report+0x12c/0x23c
[ 20.925417] kasan_report+0xc/0x18
[ 20.928808] __asan_load4+0x94/0xb8
[ 20.932286] pcc_data_alloc+0x40/0xb8
[ 20.935938] acpi_cppc_processor_probe+0x4e8/0xb08
[ 20.940717] __acpi_processor_start+0x48/0xb0
[ 20.945062] acpi_processor_start+0x40/0x60
[ 20.949235] really_probe+0x118/0x548
[ 20.952887] driver_probe_device+0x7c/0x148
[ 20.957059] device_driver_attach+0x94/0xa0
[ 20.961231] __driver_attach+0xa4/0x110
[ 20.965055] bus_for_each_dev+0xe8/0x158
[ 20.968966] driver_attach+0x30/0x40
[ 20.972531] bus_add_driver+0x234/0x2f0
[ 20.976356] driver_register+0xbc/0x1d0
[ 20.980182] acpi_processor_driver_init+0x40/0xe4
[ 20.984875] do_one_initcall+0xb4/0x254
[ 20.988700] kernel_init_freeable+0x24c/0x2f8
[ 20.993047] kernel_init+0x10/0x118
[ 20.996524] ret_from_fork+0x10/0x18
[ 21.000087]
[ 21.001567] Allocated by task 1:
[ 21.004785] save_stack+0x28/0xc8
[ 21.008089] __kasan_kmalloc.isra.9+0xbc/0xd8
[ 21.012435] kasan_kmalloc+0xc/0x18
[ 21.015913] pcc_data_alloc+0x94/0xb8
[ 21.019564] acpi_cppc_processor_probe+0x4e8/0xb08
[ 21.024343] __acpi_processor_start+0x48/0xb0
[ 21.028689] acpi_processor_start+0x40/0x60
[ 21.032860] really_probe+0x118/0x548
[ 21.036512] driver_probe_device+0x7c/0x148
[ 21.040684] device_driver_attach+0x94/0xa0
[ 21.044855] __driver_attach+0xa4/0x110
[ 21.048680] bus_for_each_dev+0xe8/0x158
[ 21.052591] driver_attach+0x30/0x40
[ 21.056155] bus_add_driver+0x234/0x2f0
[ 21.059980] driver_register+0xbc/0x1d0
[ 21.063805] acpi_processor_driver_init+0x40/0xe4
[ 21.068497] do_one_initcall+0xb4/0x254
[ 21.072322] kernel_init_freeable+0x24c/0x2f8
[ 21.076667] kernel_init+0x10/0x118
[ 21.080144] ret_from_fork+0x10/0x18
[ 21.083707]
[ 21.085186] Freed by task 1:
[ 21.088056] save_stack+0x28/0xc8
[ 21.091360] __kasan_slab_free+0x118/0x180
[ 21.095445] kasan_slab_free+0x10/0x18
[ 21.099183] kfree+0x80/0x268
[ 21.102139] acpi_cppc_processor_exit+0x1a8/0x1b8
[ 21.106832] acpi_processor_stop+0x70/0x80
[ 21.110917] really_probe+0x174/0x548
[ 21.114568] driver_probe_device+0x7c/0x148
[ 21.118740] device_driver_attach+0x94/0xa0
[ 21.122912] __driver_attach+0xa4/0x110
[ 21.126736] bus_for_each_dev+0xe8/0x158
[ 21.130648] driver_attach+0x30/0x40
[ 21.134212] bus_add_driver+0x234/0x2f0
[ 21.0x10/0x18
[ 21.161764]
[ 21.163244] The buggy address belongs to the object at ffff00236cdeb600
[ 21.163244] which belongs to the cache kmalloc-256 of size 256
[ 21.175750] The buggy address is located 132 bytes inside of
[ 21.175750] 256-byte region [ffff00236cdeb600, ffff00236cdeb700)
[ 21.187473] The buggy address belongs to the page:
[ 21.192254] page:fffffe008d937a00 refcount:1 mapcount:0 mapping:ffff002370c0fa00 index:0x0 compound_mapcount: 0
[ 21.202331] flags: 0x1ffff00000010200(slab|head)
[ 21.206940] raw: 1ffff00000010200 dead000000000100 dead000000000122 ffff002370c0fa00
[ 21.214671] raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
[ 21.222400] page dumped because: kasan: bad access detected
[ 21.227959]
[ 21.229438] Memory state around the buggy address:
[ 21.234218] ffff00236cdeb580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 21.241427] ffff00236cdeb600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.248637] >ffff00236cdeb680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.255845] ^
[ 21.259062] ffff00236cdeb700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 21.266272] ffff00236cdeb780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.273480] ==================================================================
It seems that global pcc_data[pcc_ss_id] can be freed in
acpi_cppc_processor_exit(), but we may later reference this value, so
NULLify it when freed.
Also remove the useless setting of data "pcc_channel_acquired", which
we're about to free.
Fixes: 85b1407bf6d2 ("ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs")
Signed-off-by: John Garry <john.garry@huawei.com>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
* pm-cpufreq:
ACPI: processor: Avoid NULL pointer dereferences at init time
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
* pm-sleep:
PM: sleep: include <linux/pm_runtime.h> for pm_wq
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
|