kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-11-08	virtio-mmio: fix memory leak of vm_dev	Maximilian Heyne	1	-5/+14
	commit fab7f259227b8f70aa6d54e1de1a1f5f4729041c upstream. With the recent removal of vm_dev from devres its memory is only freed via the callback virtio_mmio_release_dev. However, this only takes effect after device_add is called by register_virtio_device. Until then it's an unmanaged resource and must be explicitly freed on error exit. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Cc: stable@vger.kernel.org Fixes: 55c91fedd03d ("virtio-mmio: don't break lifecycle of vm_dev") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <20230911090328.40538-1-mheyne@amazon.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2023-11-08	virtio_balloon: Fix endless deflation and inflation on arm64	Gavin Shan	1	-1/+5
	commit 07622bd415639e9709579f400afd19e7e9866e5e upstream. The deflation request to the target, which isn't unaligned to the guest page size causes endless deflation and inflation actions. For example, we receive the flooding QMP events for the changes on memory balloon's size after a deflation request to the unaligned target is sent for the ARM64 guest, where we have 64KB base page size. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \ -m 1024M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=512M \ -object memory-backend-ram,id=mem1,size=512M \ -numa node,nodeid=0,memdev=mem0,cpus=0-3 \ -numa node,nodeid=1,memdev=mem1,cpus=4-7 \ : \ -device virtio-balloon-pci,id=balloon0,bus=pcie.10 { "execute" : "balloon", "arguments": { "value" : 1073672192 } } {"return": {}} {"timestamp": {"seconds": 1693272173, "microseconds": 88667}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272174, "microseconds": 89704}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272175, "microseconds": 90819}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272176, "microseconds": 91961}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272177, "microseconds": 93040}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}} {"timestamp": {"seconds": 1693272178, "microseconds": 94117}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}} {"timestamp": {"seconds": 1693272179, "microseconds": 95337}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272180, "microseconds": 96615}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}} {"timestamp": {"seconds": 1693272181, "microseconds": 97626}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272182, "microseconds": 98693}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}} {"timestamp": {"seconds": 1693272183, "microseconds": 99698}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272184, "microseconds": 100727}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272185, "microseconds": 90430}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} {"timestamp": {"seconds": 1693272186, "microseconds": 102999}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073676288}} : <The similar QMP events repeat> Fix it by aligning the target up to the guest page size, 64KB in this specific case. With this applied, no flooding QMP events are observed and the memory balloon's size can be stablizied to 0x3ffe0000 soon after the deflation request is sent. { "execute" : "balloon", "arguments": { "value" : 1073672192 } } {"return": {}} {"timestamp": {"seconds": 1693273328, "microseconds": 793075}, \ "event": "BALLOON_CHANGE", "data": {"actual": 1073610752}} { "execute" : "query-balloon" } {"return": {"actual": 1073610752}} Cc: stable@vger.kernel.org Signed-off-by: Gavin Shan <gshan@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230831011007.1032822-1-gshan@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-23	virtio_ring: fix avail_wrap_counter in virtqueue_add_packed	Yuan Yao	1	-1/+1
	[ Upstream commit 1acfe2c1225899eab5ab724c91b7e1eb2881b9ab ] In current packed virtqueue implementation, the avail_wrap_counter won't flip, in the case when the driver supplies a descriptor chain with a length equals to the queue size; total_sg == vq->packed.vring.num. Let’s assume the following situation: vq->packed.vring.num=4 vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 Then the driver adds a descriptor chain containing 4 descriptors. We expect the following result with avail_wrap_counter flipped: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 1 But, the current implementation gives the following result: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 To reproduce the bug, you can set a packed queue size as small as possible, so that the driver is more likely to provide a descriptor chain with a length equal to the packed queue size. For example, in qemu run following commands: sudo qemu-system-x86_64 \ -enable-kvm \ -nographic \ -kernel "path/to/kernel_image" \ -m 1G \ -drive file="path/to/rootfs",if=none,id=disk \ -device virtio-blk,drive=disk \ -drive file="path/to/disk_image",if=none,id=rwdisk \ -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\ indirect_desc=off \ -append "console=ttyS0 root=/dev/vda rw init=/bin/bash" Inside the VM, create a directory and mount the rwdisk device on it. The rwdisk will hang and mount operation will not complete. This commit fixes the wrap counter error by flipping the packed.avail_wrap_counter, when start of descriptor chain equals to the end of descriptor chain (head == i). Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org> Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30	virtio-mmio: don't break lifecycle of vm_dev	Wolfram Sang	1	-3/+2
	[ Upstream commit 55c91fedd03d7b9cf0c5199b2eb12b9b8e95281a ] vm_dev has a separate lifecycle because it has a 'struct device' embedded. Thus, having a release callback for it is correct. Allocating the vm_dev struct with devres totally breaks this protection, though. Instead of waiting for the vm_dev release callback, the memory is freed when the platform_device is removed. Resulting in a use-after-free when finally the callback is to be called. To easily see the problem, compile the kernel with CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs. The fix is easy, don't use devres in this case. Found during my research about object lifetime problems. Fixes: 7eb781b1bbb7 ("virtio_mmio: add cleanup for virtio_mmio_probe") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30	virtio-mmio: Use to_virtio_mmio_device() to simply code	Tang Bin	1	-2/+1
	[ Upstream commit da98b54d02981de5b07d8044b2a632bf6ba3ac45 ] The file virtio_mmio.c has defined the function to_virtio_mmio_device, so use it instead of container_of() to simply code. Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20210222055724.220-1-tangbin@cmss.chinamobile.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: 55c91fedd03d ("virtio-mmio: don't break lifecycle of vm_dev") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30	virtio-mmio: convert to devm_platform_ioremap_resource	Yangtao Li	1	-12/+3
	[ Upstream commit c64eb62cfce242a57a7276ca8280ae0baab29d05 ] Use devm_platform_ioremap_resource() to simplify code, which contains platform_get_resource, devm_request_mem_region and devm_ioremap. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: 55c91fedd03d ("virtio-mmio: don't break lifecycle of vm_dev") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09	treewide: Remove uninitialized_var() usage	Kees Cook	1	-3/+3
	commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21	virtio_mmio: Restore guest page size on resume	Stephan Gerhold	1	-0/+3
	[ Upstream commit e0c2ce8217955537dd5434baeba061f209797119 ] Virtio devices might lose their state when the VMM is restarted after a suspend to disk (hibernation) cycle. This means that the guest page size register must be restored for the virtio_mmio legacy interface, since otherwise the virtio queues are not functional. This is particularly problematic for QEMU that currently still defaults to using the legacy interface for virtio_mmio. Write the guest page size register again in virtio_mmio_restore() to make legacy virtio_mmio devices work correctly after hibernation. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Message-Id: <20220621110621.3638025-3-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21	virtio_mmio: Add missing PM calls to freeze/restore	Stephan Gerhold	1	-0/+23
	[ Upstream commit ed7ac37fde33ccd84e4bd2b9363c191f925364c7 ] Most virtio drivers provide freeze/restore callbacks to finish up device usage before suspend and to reinitialize the virtio device after resume. However, these callbacks are currently only called when using virtio_pci. virtio_mmio does not have any PM ops defined. This causes problems for example after suspend to disk (hibernation), since the virtio devices might lose their state after the VMM is restarted. Calling virtio_device_freeze()/restore() ensures that the virtio devices are re-initialized correctly. Fix this by implementing the dev_pm_ops for virtio_mmio, similar to virtio_pci_common. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Message-Id: <20220621110621.3638025-2-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-22	virtio-pci: Remove wrong address verification in vp_del_vqs()	Murilo Opsfelder Araujo	1	-2/+1
	commit 7e415282b41bf0d15c6e0fe268f822d9b083f2f7 upstream. GCC 12 enhanced -Waddress when comparing array address to null [0], which warns: drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’: drivers/virtio/virtio_pci_common.c:257:29: warning: the comparison will always evaluate as ‘true’ for the pointer operand in ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 256)’ must not be NULL [-Waddress] 257 \| if (vp_dev->msix_affinity_masks[i]) \| ^~~~~~ In fact, the verification is comparing the result of a pointer arithmetic, the address "msix_affinity_masks + i", which will always evaluate to true. Under the hood, free_cpumask_var() calls kfree(), which is safe to pass NULL, not requiring non-null verification. So remove the verification to make compiler happy (happy compiler, happy life). [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102103 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Message-Id: <20220415023002.49805-1-muriloo@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Christophe de Dinechin <dinechin@redhat.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-22	virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed	chengkaitao	1	-0/+1
	[ Upstream commit a58a7f97ba11391d2d0d408e0b24f38d86ae748e ] The reference must be released when device_register(&vm_cmdline_parent) failed. Add the corresponding 'put_device()' in the error handling path. Signed-off-by: chengkaitao <pilgrimtao@gmail.com> Message-Id: <20220602005542.16489-1-chengkaitao@didiglobal.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16	virtio: acknowledge all features before access	Michael S. Tsirkin	1	-17/+21
	commit 4fa59ede95195f267101a1b8916992cf3f245cdb upstream. The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers. This is broken since commit 404123c2db79 ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format. To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after. Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device. As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon). IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too. Acked-by: Jason Wang <jasowang@redhat.com> Cc: stable@vger.kernel.org Fixes: 404123c2db79 ("virtio: allow drivers to validate features") Fixes: 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" <pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16	virtio: unexport virtio_finalize_features	Michael S. Tsirkin	1	-2/+1
	commit 838d6d3461db0fdbf33fc5f8a69c27b50b4a46da upstream. virtio_finalize_features is only used internally within virtio. No reason to export it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22	virtio_ring: Fix querying of maximum DMA mapping size for virtio device	Will Deacon	1	-1/+1
	commit 817fc978b5a29b039db0418a91072b31c9aab152 upstream. virtio_max_dma_size() returns the maximum DMA mapping size of the virtio device by querying dma_max_mapping_size() for the device when the DMA API is in use for the vring. Unfortunately, the device passed is initialised by register_virtio_device() and does not inherit the DMA configuration from its parent, resulting in SWIOTLB errors when bouncing is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not respected: \| virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots) Follow the pattern used elsewhere in the virtio_ring code when calling into the DMA layer and pass the parent device to dma_max_mapping_size() instead. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()") Cc: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20	virtio: write back F_VERSION_1 before validate	Halil Pasic	1	-0/+11
	commit 2f9a174f918e29608564c7a4e8329893ab604fb4 upstream. The virtio specification virtio-v1.1-cs01 states: "Transitional devices MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not been acknowledged by the driver." This is exactly what QEMU as of 6.1 has done relying solely on VIRTIO_F_VERSION_1 for detecting that. However, the specification also says: "... the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device ..." before setting FEATURES_OK. In that case, any transitional device relying solely on VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in legacy format. In particular, this implies that it is in big endian format for big endian guests. This naturally confuses the driver which expects little endian in the modern mode. It is probably a good idea to amend the spec to clarify that VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation is complete. Before validate callback existed, config space was only read after FEATURES_OK. However, we already have two regressions, so let's address this here as well. The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when virtio 1.0 is used on both sides. The latter renders virtio-blk unusable with DASD backing, because things simply don't work with the default. See Fixes tags for relevant commits. For QEMU, we can work around the issue by writing out the feature bits with VIRTIO_F_VERSION_1 bit set. We (ab)use the finalize_features config op for this. This isn't enough to address all vhost devices since these do not get the features until FEATURES_OK, however it looks like the affected devices actually never handled the endianness for legacy mode correctly, so at least that's not a regression. No devices except virtio net and virtio blk seem to be affected. Long term the right thing to do is to fix the hypervisors. Cc: <stable@vger.kernel.org> #v4.11 Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space") Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") Reported-by: markver@us.ibm.com Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20211011053921.1198936-1-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03	virtio_pci: Support surprise removal of virtio pci device	Parav Pandit	1	-0/+7
	[ Upstream commit 43bb40c5b92659966bdf4bfe584fde0a3575a049 ] When a virtio pci device undergo surprise removal (aka async removal in PCIe spec), mark the device as broken so that any upper layer drivers can abort any outstanding operation. When a virtio net pci device undergo surprise removal which is used by a NetworkManager, a below call trace was observed. kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059] CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8 Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020 Workqueue: events linkwatch_event RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net] Call Trace: virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net] ? __hw_addr_create_ex+0x85/0xc0 __dev_mc_add+0x72/0x80 igmp6_group_added+0xa7/0xd0 ipv6_mc_up+0x3c/0x60 ipv6_find_idev+0x36/0x80 addrconf_add_dev+0x1e/0xa0 addrconf_dev_config+0x71/0x130 addrconf_notify+0x1f5/0xb40 ? rtnl_is_locked+0x11/0x20 ? __switch_to_asm+0x42/0x70 ? finish_task_switch+0xaf/0x2c0 ? raw_notifier_call_chain+0x3e/0x50 raw_notifier_call_chain+0x3e/0x50 netdev_state_change+0x67/0x90 linkwatch_do_dev+0x3c/0x50 __linkwatch_run_queue+0xd2/0x220 linkwatch_event+0x21/0x30 process_one_work+0x1c8/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x118/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 Hence, add the ability to abort the command on surprise removal which prevents infinite loop and system lockup. Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-03	virtio: Improve vq->broken access to avoid any compiler optimization	Parav Pandit	1	-2/+4
	[ Upstream commit 60f0779862e4ab943810187752c462e85f5fa371 ] Currently vq->broken field is read by virtqueue_is_broken() in busy loop in one context by virtnet_send_command(). vq->broken is set to true in other process context by virtio_break_device(). Reader and writer are accessing it without any synchronization. This may lead to a compiler optimization which may result to optimize reading vq->broken only once. Hence, force reading vq->broken on each invocation of virtqueue_is_broken() and also force writing it so that such update is visible to the readers. It is a theoretical fix that isn't yet encountered in the field. Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210721142648.1525924-2-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26	virtio: Protect vqs list access	Parav Pandit	2	-0/+9
	[ Upstream commit 0e566c8f0f2e8325e35f6f97e13cde5356b41814 ] VQs may be accessed to mark the device broken while they are created/destroyed. Hence protect the access to the vqs list. Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.") Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210721142648.1525924-4-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30	virtio_ring: Fix two use after free bugs	Dan Carpenter	1	-2/+2
	[ Upstream commit e152d8af4220a05c9797591609151d404866beaa ] The "vq" struct is added to the "vdev->vqs" list prematurely. If we encounter an error later in the function then the "vq" is freed, but since it is still on the list that could lead to a use after free bug. Fixes: cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGaG/zkI3jk8mk@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30	virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()	Dan Carpenter	1	-2/+2
	[ Upstream commit ae93d8ea0fa701e84ab9df0db9fb60ec6c80d7b8 ] There is a copy and paste bug in the error handling of this code and it uses "ring_dma_addr" three times instead of "device_event_dma_addr" and "driver_event_dma_addr". Fixes: 1ce9e6055fa0 (" virtio_ring: introduce packed ring support") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGRJlEzyn+04u2@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26	virtio_ring: Avoid loop when vq is broken in virtqueue_poll	Mao Wenan	1	-0/+3
	[ Upstream commit 481a0d7422db26fb63e2d64f0652667a5c6d0f3e ] The loop may exist if vq->broken is true, virtqueue_get_buf_ctx_packed or virtqueue_get_buf_ctx_split will return NULL, so virtnet_poll will reschedule napi to receive packet, it will lead cpu usage(si) to 100%. call trace as below: virtnet_poll virtnet_receive virtqueue_get_buf_ctx virtqueue_get_buf_ctx_packed virtqueue_get_buf_ctx_split virtqueue_napi_complete virtqueue_poll //return true virtqueue_napi_schedule //it will reschedule napi to fix this, return false if vq is broken in virtqueue_poll. Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/1596354249-96204-1-git-send-email-wenan.mao@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05	virtio_balloon: fix up endian-ness for free cmd id	Michael S. Tsirkin	1	-1/+5
	commit 168c358af2f8c5a37f8b5f877ba2cc93995606ee upstream. free cmd id is read using virtio endian, spec says all fields in balloon are LE. Fix it up. Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18	virtio_ring: Fix mem leak with vring_new_virtqueue()	Suman Anna	1	-2/+2
	commit f13f09a12cbd0c7b776e083c5d008b6c6a9c4e0b upstream. The functions vring_new_virtqueue() and __vring_new_virtqueue() are used with split rings, and any allocations within these functions are managed outside of the .we_own_ring flag. The commit cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") allocates the desc state within the __vring_new_virtqueue() but frees it only when the .we_own_ring flag is set. This leads to a memory leak when freeing such allocated virtqueues with the vring_del_virtqueue() function. Fix this by moving the desc_state free code outside the flag and only for split rings. Issue was discovered during testing with remoteproc and virtio_rpmsg. Fixes: cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") Signed-off-by: Suman Anna <s-anna@ti.com> Link: https://lore.kernel.org/r/20200224212643.30672-1-s-anna@ti.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18	virtio_balloon: Adjust label in virtballoon_probe	Nathan Chancellor	1	-1/+1
	commit 6ae4edab2fbf86ec92fbf0a8f0c60b857d90d50f upstream. Clang warns when CONFIG_BALLOON_COMPACTION is unset: ../drivers/virtio/virtio_balloon.c:963:1: warning: unused label 'out_del_vqs' [-Wunused-label] out_del_vqs: ^~~~~~~~~~~~ 1 warning generated. Move the label within the preprocessor block since it is only used when CONFIG_BALLOON_COMPACTION is set. Fixes: 1ad6f58ea936 ("virtio_balloon: Fix memory leaks on errors in virtballoon_probe()") Link: https://github.com/ClangBuiltLinux/linux/issues/886 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20200216004039.23464-1-natechancellor@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24	virtio_balloon: prevent pfn array overflow	Michael S. Tsirkin	1	-0/+2
	[ Upstream commit 6e9826e77249355c09db6ba41cd3f84e89f4b614 ] Make sure, at build time, that pfn array is big enough to hold a single page. It happens to be true since the PAGE_SHIFT value at the moment is 20, which is 1M - exactly 256 4K balloon pages. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11	virtio_balloon: Fix memory leaks on errors in virtballoon_probe()	David Hildenbrand	1	-4/+9
	commit 1ad6f58ea9364b0a5d8ae06249653ac9304a8578 upstream. We forget to put the inode and unmount the kernfs used for compaction. Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker") Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Liang Li <liang.z.li@intel.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200205163402.42627-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11	virtio-balloon: Fix memory leak when unloading while hinting is in progress	David Hildenbrand	1	-0/+4
	commit 6c22dc61c76b7e7d355f1697ba0ecf26d1334ba6 upstream. When unloading the driver while hinting is in progress, we will not release the free page blocks back to MM, resulting in a memory leak. Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Wei Wang <wei.w.wang@intel.com> Cc: Liang Li <liang.z.li@intel.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200205163402.42627-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11	virtio-pci: check name when counting MSI-X vectors	Daniel Verkamp	1	-1/+1
	commit 303090b513fd1ee45aa1536b71a3838dc054bc05 upstream. VQs without a name specified are not valid; they are skipped in the later loop that assigns MSI-X vectors to queues, but the per_vq_vectors loop above that counts the required number of vectors previously still counted any queue with a non-NULL callback as needing a vector. Add a check to the per_vq_vectors loop so that vectors with no name are not counted to make the two loops consistent. This prevents over-counting unnecessary vectors (e.g. for features which were not negotiated with the device). Cc: stable@vger.kernel.org Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wang, Wei W <wei.w.wang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11	virtio-balloon: initialize all vq callbacks	Daniel Verkamp	1	-0/+2
	commit 5790b53390e18fdd21e70776e46d058c05eda2f2 upstream. Ensure that elements of the callbacks array that correspond to unavailable features are set to NULL; previously, they would be left uninitialized. Since the corresponding names array elements were explicitly set to NULL, the uninitialized callback pointers would not actually be dereferenced; however, the uninitialized callbacks elements would still be read in vp_find_vqs_msix() and used to calculate the number of MSI-X vectors required. Cc: stable@vger.kernel.org Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17	virtio-balloon: fix managed page counts when migrating pages between zones	David Hildenbrand	1	-0/+11
	commit 63341ab03706e11a31e3dd8ccc0fbc9beaf723f0 upstream. In case we have to migrate a ballon page to a newpage of another zone, the managed page count of both zones is wrong. Paired with memory offlining (which will adjust the managed page count), we can trigger kernel crashes and all kinds of different symptoms. One way to reproduce: 1. Start a QEMU guest with 4GB, no NUMA 2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL 3. Inflate the balloon to 1GB 4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it) 5. Observe /proc/zoneinfo Node 0, zone Normal pages free 16810 min 24848885473806 low 18471592959183339 high 36918337032892872 spanned 262144 present 262144 managed 18446744073709533486 6. Do anything that requires some memory (e.g., inflate the balloon some more). The OOM goes crazy and the system crashes [ 238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00 [ 238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G D W 5.4.0-next-20191204+ #75 [ 238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 238.341121] Call Trace: [ 238.341337] dump_stack+0x8f/0xd0 [ 238.341630] dump_header+0x61/0x5ea [ 238.341942] oom_kill_process.cold+0xb/0x10 [ 238.342299] out_of_memory+0x24d/0x5a0 [ 238.342625] __alloc_pages_slowpath+0xd12/0x1020 [ 238.343024] __alloc_pages_nodemask+0x391/0x410 [ 238.343407] pagecache_get_page+0xc3/0x3a0 [ 238.343757] filemap_fault+0x804/0xc30 [ 238.344083] ? ext4_filemap_fault+0x28/0x42 [ 238.344444] ext4_filemap_fault+0x30/0x42 [ 238.344789] __do_fault+0x37/0x1a0 [ 238.345087] __handle_mm_fault+0x104d/0x1ab0 [ 238.345450] handle_mm_fault+0x169/0x360 [ 238.345790] do_user_addr_fault+0x20d/0x490 [ 238.346154] do_page_fault+0x31/0x210 [ 238.346468] async_page_fault+0x43/0x50 [ 238.346797] RIP: 0033:0x7f47eba4197e [ 238.347110] Code: Bad RIP value. [ 238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293 [ 238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e [ 238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004 [ 238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033 [ 238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001 [ 238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0 [ 238.350878] Mem-Info: [ 238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0 [ 238.351085] active_file:12 inactive_file:7 isolated_file:0 [ 238.351085] unevictable:0 dirty:0 writeback:0 unstable:0 [ 238.351085] slab_reclaimable:5565 slab_unreclaimable:10170 [ 238.351085] mapped:3 shmem:111 pagetables:155 bounce:0 [ 238.351085] free:720717 free_pcp:2 free_cma:0 [ 238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss [ 238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB [ 238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884 [ 238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B [ 238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865 [ 238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB [ 238.364765] lowmem_reserve[]: 0 0 0 0 0 [ 238.365101] Node 0 DMA: 74kB (U) 58kB (UE) 616kB (UME) 232kB (UM) 164kB (U) 2128kB (UE) 3256kB (UME) 2512B [ 238.366379] Node 0 DMA32: 04kB 18kB (U) 216kB (UM) 232kB (UM) 264kB (UM) 1128kB (U) 1256kB (U) 1512kB (U)B [ 238.367654] Node 0 Normal: 19854kB (UME) 13218kB (UME) 84416kB (UME) 52432kB (UME) 30064kB (UME) 138128kB (B [ 238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 238.369915] 130 total pagecache pages [ 238.370241] 0 pages in swap cache [ 238.370533] Swap cache stats: add 0, delete 0, find 0/0 [ 238.370981] Free swap = 0kB [ 238.371239] Total swap = 0kB [ 238.371488] 1048445 pages RAM [ 238.371756] 0 pages HighMem/MovableOnly [ 238.372090] 306992 pages reserved [ 238.372376] 0 pages cma reserved [ 238.372661] 0 pages hwpoisoned In another instance (older kernel), I was able to observe this (negative page count :/): [ 180.896971] Offlined Pages 32768 [ 182.667462] Offlined Pages 32768 [ 184.408117] Offlined Pages 32768 [ 186.026321] Offlined Pages 32768 [ 187.684861] Offlined Pages 32768 [ 189.227013] Offlined Pages 32768 [ 190.830303] Offlined Pages 32768 [ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009 In another instance (older kernel), I was no longer able to start any process: [root@vm ~]# [ 214.348068] Offlined Pages 32768 [ 215.973009] Offlined Pages 32768 cat /proc/meminfo -bash: fork: Cannot allocate memory [root@vm ~]# cat /proc/meminfo -bash: fork: Cannot allocate memory Fix it by properly adjusting the managed page count when migrating if the zone changed. The managed page count of the zones now looks after unplug of the DIMM (and after deflating the balloon) just like before inflating the balloon (and plugging+onlining the DIMM). We'll temporarily modify the totalram page count. If this ever becomes a problem, we can fine tune by providing helpers that don't touch the totalram pages (e.g., adjust_zone_managed_page_count()). Please note that fixing up the managed page count is only necessary when we adjusted the managed page count when inflating - only if we don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the managed page count is not touched when inflating/deflating. Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages") Cc: <stable@vger.kernel.org> # v3.11+ Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-20	virtio_balloon: fix shrinker count	Wei Wang	1	-1/+1
	Instead of multiplying by page order, virtio balloon divided by page order. The result is that it can return 0 if there are a bit less than MAX_ORDER - 1 pages in use, and then shrinker scan won't be called. Cc: stable@vger.kernel.org Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2019-11-20	virtio_balloon: fix shrinker scan number of pages	Michael S. Tsirkin	1	-6/+12
	virtio_balloon_shrinker_scan should return number of system pages freed, but because it's calling functions that deal with balloon pages, it gets confused and sometimes returns the number of balloon pages. It does not matter practically as the exact number isn't used, but it seems better to be consistent in case someone starts using this API. Further, if we ever tried to iteratively leak pages as virtio_balloon_shrinker_scan tries to do, we'd run into issues - this is because freed_pages was accumulating total freed pages, but was also subtracted on each iteration from pages_to_free, which can result in either leaking less memory than we were supposed to free, or more if pages_to_free underruns. On a system with 4K pages we are lucky that we are never asked to leak more than 128 pages while we can leak up to 256 at a time, but it looks like a real issue for systems with page size != 4K. Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker") Reported-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-11-19	virtio_ring: fix return code on DMA mapping fails	Halil Pasic	1	-2/+2
	Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes virtqueue_add() return -EIO when we fail to map our I/O buffers. This is a very realistic scenario for guests with encrypted memory, as swiotlb may run out of space, depending on it's size and the I/O load. The virtio-blk driver interprets -EIO form virtqueue_add() as an IO error, despite the fact that swiotlb full is in absence of bugs a recoverable condition. Let us change the return code to -ENOMEM, and make the block layer recover form these failures when virtio-blk encounters the condition described above. Cc: stable@vger.kernel.org Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-28	virtio_ring: fix stalls for packed rings	Marvin Liu	1	-4/+3
	When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can use virtqueue_enable_cb_delayed_packed to reduce the number of device interrupts. At the moment, this is the case for virtio-net when the napi_tx module parameter is set to false. In this case, the virtio driver selects an event offset and expects that the device will send a notification when rolling over the event offset in the ring. However, if this roll-over happens before the event suppression structure update, the notification won't be sent. To address this race condition the driver needs to check wether the device rolled over the offset after updating the event suppression structure. With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the flags field of the descriptor at the specified offset. Unfortunately, checking at the event offset isn't reliable: if descriptors are chained (e.g. when INDIRECT is off) not all descriptors are overwritten by the device, so it's possible that the device skipped the specific descriptor driver is checking when writing out used descriptors. If this happens, the driver won't detect the race condition and will incorrectly expect the device to send a notification. For virtio-net, the result will be a TX queue stall, with the transmission getting blocked forever. With the packed ring, it isn't easy to find a location which is guaranteed to change upon the roll-over, except the next device descriptor, as described in the spec: Writes of device and driver descriptors can generally be reordered, but each side (driver and device) are only required to poll (or test) a single location in memory: the next device descriptor after the one they processed previously, in circular order. while this might be sub-optimal, let's do exactly this for now. Cc: stable@vger.kernel.org Cc: Jason Wang <jasowang@redhat.com> Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring") Signed-off-by: Marvin Liu <yong.liu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-09	virtio_ring: fix unmap of indirect descriptors	Matthias Lange	1	-2/+6
	The function virtqueue_add_split() DMA-maps the scatterlist buffers. In case a mapping error occurs the already mapped buffers must be unmapped. This happens by jumping to the 'unmap_release' label. In case of indirect descriptors the release is wrong and may leak kernel memory. Because the implementation assumes that the head descriptor is already mapped it starts iterating over the descriptor list starting from the head descriptor. However for indirect descriptors the head descriptor is never mapped in case of an error. The fix is to initialize the start index with zero in case of indirect descriptors and use the 'desc' pointer directly for iterating over the descriptor chain. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-07-19	Merge branch 'work.mount0' of ↵	Linus Torvalds	1	-9/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-18	Merge tag 'libnvdimm-for-5.3' of ↵	Linus Torvalds	1	-0/+11
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Primarily just the virtio_pmem driver: - virtio_pmem The new virtio_pmem facility introduces a paravirtualized persistent memory device that allows a guest VM to use DAX mechanisms to access a host-file with host-page-cache. It arranges for MAP_SYNC to be disabled and instead triggers a host fsync() when a 'write-cache flush' command is sent to the virtual disk device. - Miscellaneous small fixups" * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: virtio_pmem: fix sparse warning xfs: disable map_sync for async flush ext4: disable map_sync for async flush dax: check synchronous mapping is supported dm: enable synchronous dax libnvdimm: add dax_dev sync flag virtio-pmem: Add virtio pmem driver libnvdimm: nd_region flush callback support libnvdimm, namespace: Drop uuid_t implementation detail
2019-07-11	virtio-mmio: add error check for platform_get_irq	Ihor Matushchak	1	-1/+6
	in vm_find_vqs() irq has a wrong type so, in case of no IRQ resource defined, wrong parameter will be passed to request_irq() Signed-off-by: Ihor Matushchak <ihor.matushchak@foobox.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ivan T. Ivanov <iivanov.xz@gmail.com>
2019-07-06	virtio-pmem: Add virtio pmem driver	Pankaj Gupta	1	-0/+11
	This patch adds virtio-pmem driver for KVM guest. Guest reads the persistent memory range information from Qemu over VIRTIO and registers it on nvdimm_bus. It also creates a nd_region object with the persistent memory range information so that existing 'nvdimm/pmem' driver can reserve this into system memory map. This way 'virtio-pmem' driver uses existing functionality of pmem driver to register persistent memory compatible for DAX capable filesystems. This also provides function to perform guest flush over VIRTIO from 'pmem' driver when userspace performs flush on DAX memory range. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jakub Staron <jstaron@google.com> Tested-by: Jakub Staron <jstaron@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-05-27	virtio: Fix indentation of VIRTIO_MMIO	Fabrizio Castro	1	-4/+4
	VIRTIO_MMIO config option block starts with a space, fix that. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-26	vfs: Convert virtio_balloon to use the new mount API	David Howells	1	-4/+4
	Convert the virtio_balloon filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: "Michael S. Tsirkin" <mst@redhat.com> cc: Jason Wang <jasowang@redhat.com> cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-26	mount_pseudo(): drop 'name' argument, switch to d_make_root()	Al Viro	1	-2/+1
	Once upon a time we used to set ->d_name of e.g. pipefs root so that d_path() on pipes would work. These days it's completely pointless - dentries of pipes are not even connected to pipefs root. However, mount_pseudo() had set the root dentry name (passed as the second argument) and callers kept inventing names to pass to it. Including those that didn't have any non-root dentries to start with... All of that had been pointless for about 8 years now; it's time to get rid of that cargo-culting... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-24	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102	Thomas Gleixner	2	-28/+2
	Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin st fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 50 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091649.499889647@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 78	Thomas Gleixner	5	-21/+5
	Based on 1 normalized pattern(s): this work is licensed under the terms of the gnu gpl version 2 or later see the copying file in the top level directory extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 6 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075210.858783702@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21	treewide: Add SPDX license identifier - Makefile/Kconfig	Thomas Gleixner	1	-0/+1
	Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21	treewide: Add SPDX license identifier for more missed files	Thomas Gleixner	2	-0/+2
	Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-20	balloon: don't bother with dentry_operations	Al Viro	1	-5/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-12	virtio/virtio_ring: do some comment fixes	Jiang Biao	1	-13/+14
	There are lots of mismatches between comments and codes, this patch do these comment fixes. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-12	virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed	YueHaibing	1	-0/+1
	'desc' should be freed before leaving from err handing path. Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> stable@vger.kernel.org
2019-04-09	virtio: Honour 'may_reduce_num' in vring_create_virtqueue	Cornelia Huck	1	-0/+2
	vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified. However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.) Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified. While at it, fix a typo in the usage instructions. Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>