<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/vfio, branch v4.14.85</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.85</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.85'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-08-03T05:50:23+00:00</updated>
<entry>
<title>vfio/type1: Fix task tracking for QEMU vCPU hotplug</title>
<updated>2018-08-03T05:50:23+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-05-11T15:05:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=827faa4eb5668e25baf7f3752dc7ae7fd46894c2'/>
<id>urn:sha1:827faa4eb5668e25baf7f3752dc7ae7fd46894c2</id>
<content type='text'>
[ Upstream commit 48d8476b41eed63567dd2f0ad125c895b9ac648a ]

MAP_DMA ioctls might be called from various threads within a process,
for example when using QEMU, the vCPU threads are often generating
these calls and we therefore take a reference to that vCPU task.
However, QEMU also supports vCPU hotplug on some machines and the task
that called MAP_DMA may have exited by the time UNMAP_DMA is called,
resulting in the mm_struct pointer being NULL and thus a failure to
match against the existing mapping.

To resolve this, we instead take a reference to the thread
group_leader, which has the same mm_struct and resource limits, but
is less likely exit, at least in the QEMU case.  A difficulty here is
guaranteeing that the capabilities of the group_leader match that of
the calling thread, which we resolve by tracking CAP_IPC_LOCK at the
time of calling rather than at an indeterminate time in the future.
Potentially this also results in better efficiency as this is now
recorded once per MAP_DMA ioctl.

Reported-by: Xu Yandong &lt;xuyandong2@huawei.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>vfio/mdev: Check globally for duplicate devices</title>
<updated>2018-08-03T05:50:22+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-05-15T19:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f38152f2ae22b09589a6160086f5d71881ae30b'/>
<id>urn:sha1:8f38152f2ae22b09589a6160086f5d71881ae30b</id>
<content type='text'>
[ Upstream commit 002fe996f67f4f46d8917b14cfb6e4313c20685a ]

When we create an mdev device, we check for duplicates against the
parent device and return -EEXIST if found, but the mdev device
namespace is global since we'll link all devices from the bus.  We do
catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
with it comes a kernel warning and stack trace for trying to create
duplicate sysfs links, which makes it an undesirable response.

Therefore we should really be looking for duplicates across all mdev
parent devices, or as implemented here, against our mdev device list.
Using mdev_list to prevent duplicates means that we can remove
mdev_parent.lock, but in order not to serialize mdev device creation
and removal globally, we add mdev_device.active which allows UUIDs to
be reserved such that we can drop the mdev_list_lock before the mdev
device is fully in place.

Two behavioral notes; first, mdev_parent.lock had the side-effect of
serializing mdev create and remove ops per parent device.  This was
an implementation detail, not an intentional guarantee provided to
the mdev vendor drivers.  Vendor drivers can trivially provide this
serialization internally if necessary.  Second, review comments note
the new -EAGAIN behavior when the device, and in particular the remove
attribute, becomes visible in sysfs.  If a remove is triggered prior
to completion of mdev_device_create() the user will see a -EAGAIN
error.  While the errno is different, receiving an error during this
period is not, the previous implementation returned -ENODEV for the
same condition.  Furthermore, the consistency to the user is improved
in the case where mdev_device_remove_ops() returns error.  Previously
concurrent calls to mdev_device_remove() could see the device
disappear with -ENODEV and return in the case of error.  Now a user
would see -EAGAIN while the device is in this transitory state.

Reviewed-by: Kirti Wankhede &lt;kwankhede@nvidia.com&gt;
Reviewed-by: Cornelia Huck &lt;cohuck@redhat.com&gt;
Acked-by: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Acked-by: Zhenyu Wang &lt;zhenyuw@linux.intel.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>vfio: platform: Fix reset module leak in error path</title>
<updated>2018-08-03T05:50:22+00:00</updated>
<author>
<name>Geert Uytterhoeven</name>
<email>geert+renesas@glider.be</email>
</author>
<published>2018-04-11T09:15:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca014df110e97a4765f7e581376b414f26496db7'/>
<id>urn:sha1:ca014df110e97a4765f7e581376b414f26496db7</id>
<content type='text'>
[ Upstream commit 28a68387888997e8a7fa57940ea5d55f2e16b594 ]

If the IOMMU group setup fails, the reset module is not released.

Fixes: b5add544d677d363 ("vfio, platform: make reset driver a requirement by default")
Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
Reviewed-by: Simon Horman &lt;horms+renesas@verge.net.au&gt;
Acked-by: Eric Auger &lt;eric.auger@redhat.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: PPC: Check if IOMMU page is contained in the pinned physical page</title>
<updated>2018-07-28T05:55:41+00:00</updated>
<author>
<name>Alexey Kardashevskiy</name>
<email>aik@ozlabs.ru</email>
</author>
<published>2018-07-17T07:19:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=58113603a4ea409a2b77daa1e1caa663c648c748'/>
<id>urn:sha1:58113603a4ea409a2b77daa1e1caa663c648c748</id>
<content type='text'>
commit 76fa4975f3ed12d15762bc979ca44078598ed8ee upstream.

A VM which has:
 - a DMA capable device passed through to it (eg. network card);
 - running a malicious kernel that ignores H_PUT_TCE failure;
 - capability of using IOMMU pages bigger that physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.

The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.

The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.

We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet as the very first time when the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
the userspace which in turn calls VFIO IOMMU driver, this fails and
the guest does not retry,

This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.

This calculates maximum page size as a minimum of the natural region
alignment and compound page size. For the page shift this uses the shift
returned by find_linux_pte() which indicates how the page is mapped to
the current userspace - if the page is huge and this is not a zero, then
it is a leaf pte and the page is mapped within the range.

Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>vfio/spapr: Use IOMMU pageshift rather than pagesize</title>
<updated>2018-07-25T09:25:08+00:00</updated>
<author>
<name>Alexey Kardashevskiy</name>
<email>aik@ozlabs.ru</email>
</author>
<published>2018-07-17T07:19:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9a2e4a01ded2c88e11ef6640443462862c990a9b'/>
<id>urn:sha1:9a2e4a01ded2c88e11ef6640443462862c990a9b</id>
<content type='text'>
commit 1463edca6734d42ab4406fa2896e20b45478ea36 upstream.

The size is always equal to 1 page so let's use this. Later on this will
be used for other checks which use page shifts to check the granularity
of access.

This should cause no behavioral change.

Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Acked-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vfio/pci: Fix potential Spectre v1</title>
<updated>2018-07-25T09:25:08+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2018-07-17T17:39:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a5b8eae536723d3377d21fe60cb2f294332bb322'/>
<id>urn:sha1:a5b8eae536723d3377d21fe60cb2f294332bb322</id>
<content type='text'>
commit 0e714d27786ce1fb3efa9aac58abc096e68b1c2a upstream.

info.index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl()
warn: potential spectre issue 'vdev-&gt;region'

Fix this by sanitizing info.index before indirectly using it to index
vdev-&gt;region

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&amp;m=152449131114778&amp;w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vfio: Use get_user_pages_longterm correctly</title>
<updated>2018-07-11T14:29:15+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@mellanox.com</email>
</author>
<published>2018-06-29T17:31:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5d8ddc819c848a6009aed5a560e8d53f7cbafa08'/>
<id>urn:sha1:5d8ddc819c848a6009aed5a560e8d53f7cbafa08</id>
<content type='text'>
commit bb94b55af3461e26b32f0e23d455abeae0cfca5d upstream.

The patch noted in the fixes below converted get_user_pages_fast() to
get_user_pages_longterm(), however the two calls differ in a few ways.

First _fast() is documented to not require the mmap_sem, while _longterm()
is documented to need it. Hold the mmap sem as required.

Second, _fast accepts an 'int write' while _longterm uses 'unsigned int
gup_flags', so the expression '!!(prot &amp; IOMMU_WRITE)' is only working by
luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE
constant instead.

Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vfio/pci: Virtualize Maximum Read Request Size</title>
<updated>2018-04-24T07:36:34+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2017-10-02T18:39:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2ccdea040e81553aaaa9f380f0ff745f71d27eff'/>
<id>urn:sha1:2ccdea040e81553aaaa9f380f0ff745f71d27eff</id>
<content type='text'>
commit cf0d53ba4947aad6e471491d5b20a567cbe92e56 upstream.

MRRS defines the maximum read request size a device is allowed to
make.  Drivers will often increase this to allow more data transfer
with a single request.  Completions to this request are bound by the
MPS setting for the bus.  Aside from device quirks (none known), it
doesn't seem to make sense to set an MRRS value less than MPS, yet
this is a likely scenario given that user drivers do not have a
system-wide view of the PCI topology.  Virtualize MRRS such that the
user can set MRRS &gt;= MPS, but use MPS as the floor value that we'll
write to hardware.

Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vfio: disable filesystem-dax page pinning</title>
<updated>2018-03-09T06:41:06+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2018-02-04T18:34:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d5168ce354349fb8e336c75cedc447921b81e6d3'/>
<id>urn:sha1:d5168ce354349fb8e336c75cedc447921b81e6d3</id>
<content type='text'>
commit 94db151dc89262bfa82922c44e8320cea2334667 upstream.

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding of filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: kvm@vger.kernel.org
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Tested-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vfio/pci: Virtualize Maximum Payload Size</title>
<updated>2017-12-25T13:26:29+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2017-10-02T18:39:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b27bbf1f5b9e62b7b0b8ce9a4f5e474c3119a9a9'/>
<id>urn:sha1:b27bbf1f5b9e62b7b0b8ce9a4f5e474c3119a9a9</id>
<content type='text'>
[ Upstream commit 523184972b282cd9ca17a76f6ca4742394856818 ]

With virtual PCI-Express chipsets, we now see userspace/guest drivers
trying to match the physical MPS setting to a virtual downstream port.
Of course a lone physical device surrounded by virtual interconnects
cannot make a correct decision for a proper MPS setting.  Instead,
let's virtualize the MPS control register so that writes through to
hardware are disallowed.  Userspace drivers like QEMU assume they can
write anything to the device and we'll filter out anything dangerous.
Since mismatched MPS can lead to AER and other faults, let's add it
to the kernel side rather than relying on userspace virtualization to
handle it.

Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
