kernel/linux.git/drivers/xen, branch v3.4.105

xen/manage: Always freeze/thaw processes when suspend/resuming

2014-12-01T10:02:25+00:00

commit 61a734d305e16944b42730ef582a7171dc733321 upstream. Always freeze processes when suspending and thaw processes when resuming to prevent a race noticeable with HVM guests. This prevents a deadlock where the khubd kthread (which is designed to be freezable) acquires a usb device lock and then tries to allocate memory which requires the disk which hasn't been resumed yet. Meanwhile, the xenwatch thread deadlocks waiting for the usb device lock. Freezing processes fixes this because the khubd thread is only thawed after the xenwatch thread finishes resuming all the devices. Signed-off-by: Ross Lagerwall Signed-off-by: David Vrabel [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li

xen/events: mask events when changing their VCPU binding

2014-03-11T23:10:07+00:00

commit 5e72fdb8d827560893642e85a251d339109a00f4 upstream. commit 4704fe4f03a5ab27e3c36184af85d5000e0f8a48 upstream. When a event is being bound to a VCPU there is a window between the EVTCHNOP_bind_vpcu call and the adjustment of the local per-cpu masks where an event may be lost. The hypervisor upcalls the new VCPU but the kernel thinks that event is still bound to the old VCPU and ignores it. There is even a problem when the event is being bound to the same VCPU as there is a small window beween the clear_bit() and set_bit() calls in bind_evtchn_to_cpu(). When scanning for pending events, the kernel may read the bit when it is momentarily clear and ignore the event. Avoid this by masking the event during the whole bind operation. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jan Beulich [bwh: Backported to 3.2: remove the BM() cast] Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

xen-gnt: prevent adding duplicate gnt callbacks

2013-09-27T00:15:30+00:00

commit 5f338d9001094a56cf87bd8a280b4e7ff953bb59 upstream. With the current implementation, the callback in the tail of the list can be added twice, because the check done in gnttab_request_free_callback is bogus, callback->next can be NULL if it is the last callback in the list. If we add the same callback twice we end up with an infinite loop, were callback == callback->next. Replace this check with a proper one that iterates over the list to see if the callback has already been added. Signed-off-by: Roger Pau Monné Cc: Konrad Rzeszutek Wilk Cc: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Acked-by: Matt Wilson Reviewed-by: David Vrabel Signed-off-by: Greg Kroah-Hartman

xen/events: initialize local per-cpu mask for all possible events

2013-08-29T16:50:12+00:00

commit 84ca7a8e45dafb49cd5ca90a343ba033e2885c17 upstream. The sizeof() argument in init_evtchn_cpu_bindings() is incorrect resulting in only the first 64 (or 32 in 32-bit guests) ports having their bindings being initialized to VCPU 0. In most cases this does not cause a problem as request_irq() will set the irq affinity which will set the correct local per-cpu mask. However, if the request_irq() is called on a VCPU other than 0, there is a window between the unmasking of the event and the affinity being set were an event may be lost because it is not locally unmasked on any VCPU. If request_irq() is called on VCPU 0 then local irqs are disabled during the window and the race does not occur. Fix this by initializing all NR_EVENT_CHANNEL bits in the local per-cpu masks. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman

xen/evtchn: avoid a deadlock when unbinding an event channel

2013-08-04T08:26:00+00:00

commit 179fbd5a45f0d4034cc6fd37b8d367a3b79663c4 upstream. Unbinding an event channel (either with the ioctl or when the evtchn device is closed) may deadlock because disable_irq() is called with port_user_lock held which is also locked by the interrupt handler. Think of the IOCTL_EVTCHN_UNBIND is being serviced, the routine has just taken the lock, and an interrupt happens. The evtchn_interrupt is invoked, tries to take the lock and spins forever. A quick glance at the code shows that the spinlock is a local IRQ variant. Unfortunately that does not help as "disable_irq() waits for the interrupt handler on all CPUs to stop running. If the irq occurs on another VCPU, it tries to take port_user_lock and can't because the unbind ioctl is holding it." (from David). Hence we cannot depend on the said spinlock to protect us. We could make it a system wide IRQ disable spinlock but there is a better way. We can piggyback on the fact that the existence of the spinlock is to make get_port_user() checks be up-to-date. And we can alter those checks to not depend on the spin lock (as it's protected by u->bind_mutex in the ioctl) and can remove the unnecessary locking (this is IOCTL_EVTCHN_UNBIND) path. In the interrupt handler we cannot use the mutex, but we do not need it. "The unbind disables the irq before making the port user stale, so when you clear it you are guaranteed that the interrupt handler that might use that port cannot be running." (from David). Hence this patch removes the spinlock usage on the teardown path and piggybacks on disable_irq happening before we muck with the get_port_user() data. This ensures that the interrupt handler will never run on stale data. Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk [v1: Expanded the commit description a bit] Signed-off-by: Jonghwan Choi Signed-off-by: Greg Kroah-Hartman

xen-pciback: rate limit error messages from xen_pcibk_enable_msi{,x}()

2013-06-13T16:45:02+00:00

commit 51ac8893a7a51b196501164e645583bf78138699 upstream. ... as being guest triggerable (e.g. by invoking XEN_PCI_OP_enable_msi{,x} on a device not being MSI/MSI-X capable). This is CVE-2013-0231 / XSA-43. Also make the two messages uniform in both their wording and severity. Signed-off-by: Jan Beulich Acked-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk [stable tree: Added two extra #include files] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman

xen/events: Handle VIRQ_TIMER before any other hardirq in event loop.

2013-06-07T19:49:34+00:00

commit bee980d9e9642e96351fa3ca9077b853ecf62f57 upstream. This avoids any other hardirq handler seeing a very stale jiffies value immediately after wakeup from a long idle period. The one observable symptom of this was a USB keyboard, with software keyboard repeat, which would always repeat a key immediately that it was pressed. This is due to the key press waking the guest, the key handler immediately runs, sees an old jiffies value, and then that jiffies value significantly updated, before the key is unpressed. Reviewed-by: David Vrabel Signed-off-by: Keir Fraser Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman

xen/pciback: Don't disable a PCI device that is already disabled.

2013-03-20T20:04:57+00:00

commit bdc5c1812cea6efe1aaefb3131fcba28cd0b2b68 upstream. While shuting down a HVM guest with pci devices passed through we get this: pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002) ------------[ cut here ]------------ WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0() Hardware name: MS-7640 Device pciback disabling already-disabled device Modules linked in: Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1 Call Trace: [] warn_slowpath_common+0x7a/0xc0 [] warn_slowpath_fmt+0x41/0x50 [] pci_disable_device+0x88/0xa0 [] xen_pcibk_reset_device+0x37/0xd0 [] ? pcistub_put_pci_dev+0x6f/0x120 [] pcistub_put_pci_dev+0x8d/0x120 [] __xen_pcibk_release_devices+0x59/0xa0 This fixes the bug. Reported-and-Tested-by: Sander Eikelenboom Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman

xen: close evtchn port if binding to irq fails

2013-02-28T14:59:00+00:00

commit e7e44e444876478d50630f57b0c31d29f6725020 upstream. Signed-off-by: Wei Liu Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman

xen/grant-table: correctly initialize grant table version 1

2013-01-21T19:45:25+00:00

commit d0b4d64aadb9f4a90669848de9ef3819050a98cd upstream. Commit 85ff6acb075a484780b3d763fdf41596d8fc0970 (xen/granttable: Grant tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from a constant to a conditional expression. The expression depends on grant_table_version being appropriately set. Unfortunately, at init time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME conditional expression checks for "grant_table_version == 1", and therefore returns the number of grant references per frame for v2. This causes gnttab_init() to allocate fewer pages for gnttab_list, as a frame can old half the number of v2 entries than v1 entries. After gnttab_resume() is called, grant_table_version is appropriately set. nr_init_grefs will then be miscalculated and gnttab_free_count will hold a value larger than the actual number of free gref entries. If a guest is heavily utilizing improperly initialized v1 grant tables, memory corruption can occur. One common manifestation is corruption of the vmalloc list, resulting in a poisoned pointer derefrence when accessing /proc/meminfo or /proc/vmallocinfo: [ 40.770064] BUG: unable to handle kernel paging request at 0000200200001407 [ 40.770083] IP: [] get_vmalloc_info+0x70/0x110 [ 40.770102] PGD 0 [ 40.770107] Oops: 0000 [#1] SMP [ 40.770114] CPU 10 This patch introduces a static variable, grefs_per_grant_frame, to cache the calculated value. gnttab_init() now calls gnttab_request_version() early so that grant_table_version and grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have been added to prevent this type of bug from reoccurring in the future. Signed-off-by: Matt Wilson Reviewed-and-Tested-by: Steven Noonan Acked-by: Ian Campbell Cc: Konrad Rzeszutek Wilk Cc: Annie Li Cc: xen-devel@lists.xen.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman