blob: 69ee62d3d4dccce1a04f5afb6d4ae4358d7dd83e (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
|
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
==========================================================
Device types supported:
KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1
This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.
Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred as XICS (POWER7/8).
* Device Mappings
The KVM device exposes different MMIO ranges of the XIVE HW which
are required for interrupt management. These are exposed to the
guest in VMAs populated with a custom VM fault handler.
1. Thread Interrupt Management Area (TIMA)
Each thread has an associated Thread Interrupt Management context
composed of a set of registers. These registers let the thread
handle priority management and interrupt acknowledgment. The most
important are :
- Interrupt Pending Buffer (IPB)
- Current Processor Priority (CPPR)
- Notification Source Register (NSR)
They are exposed to software in four different pages each proposing
a view with a different privilege. The first page is for the
physical thread context and the second for the hypervisor. Only the
third (operating system) and the fourth (user level) are exposed the
guest.
2. Event State Buffer (ESB)
Each source is associated with an Event State Buffer (ESB) with
either a pair of even/odd pair of pages which provides commands to
manage the source: to trigger, to EOI, to turn off the source for
instance.
* Groups:
1. KVM_DEV_XIVE_GRP_CTRL
Provides global controls on the device
Attributes:
1.1 KVM_DEV_XIVE_RESET (write only)
Resets the interrupt controller configuration for sources and event
queues. To be used by kexec and kdump.
Errors: none
1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
Sync all the sources and queues and mark the EQ pages dirty. This
to make sure that a consistent memory state is captured when
migrating the VM.
Errors: none
2. KVM_DEV_XIVE_GRP_SOURCE (write only)
Initializes a new source in the XIVE device and mask it.
Attributes:
Interrupt source number (64-bit)
The kvm_device_attr.addr points to a __u64 value:
bits: | 63 .... 2 | 1 | 0
values: | unused | level | type
- type: 0:MSI 1:LSI
- level: assertion level in case of an LSI.
Errors:
-E2BIG: Interrupt source number is out of range
-ENOMEM: Could not create a new source block
-EFAULT: Invalid user pointer for attr->addr.
-ENXIO: Could not allocate underlying HW interrupt
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
Configures source targeting
Attributes:
Interrupt source number (64-bit)
The kvm_device_attr.addr points to a __u64 value:
bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0
values: | eisn | mask | server | priority
- priority: 0-7 interrupt priority level
- server: CPU number chosen to handle the interrupt
- mask: mask flag (unused)
- eisn: Effective Interrupt Source Number
Errors:
-ENOENT: Unknown source number
-EINVAL: Not initialized source number
-EINVAL: Invalid priority
-EINVAL: Invalid CPU number.
-EFAULT: Invalid user pointer for attr->addr.
-ENXIO: CPU event queues not configured or configuration of the
underlying HW interrupt failed
-EBUSY: No CPU available to serve interrupt
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
Configures an event queue of a CPU
Attributes:
EQ descriptor identifier (64-bit)
The EQ descriptor identifier is a tuple (server, priority) :
bits: | 63 .... 32 | 31 .. 3 | 2 .. 0
values: | unused | server | priority
The kvm_device_attr.addr points to :
struct kvm_ppc_xive_eq {
__u32 flags;
__u32 qshift;
__u64 qaddr;
__u32 qtoggle;
__u32 qindex;
__u8 pad[40];
};
- flags: queue flags
KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
forces notification without using the coalescing mechanism
provided by the XIVE END ESBs.
- qshift: queue size (power of 2)
- qaddr: real address of queue
- qtoggle: current queue toggle bit
- qindex: current queue index
- pad: reserved for future use
Errors:
-ENOENT: Invalid CPU number
-EINVAL: Invalid priority
-EINVAL: Invalid flags
-EINVAL: Invalid queue size
-EINVAL: Invalid queue address
-EFAULT: Invalid user pointer for attr->addr.
-EIO: Configuration of the underlying HW failed
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
Synchronize the source to flush event notifications
Attributes:
Interrupt source number (64-bit)
Errors:
-ENOENT: Unknown source number
-EINVAL: Not initialized source number
* VCPU state
The XIVE IC maintains VP interrupt state in an internal structure
called the NVT. When a VP is not dispatched on a HW processor
thread, this structure can be updated by HW if the VP is the target
of an event notification.
It is important for migration to capture the cached IPB from the NVT
as it synthesizes the priorities of the pending interrupts. We
capture a bit more to report debug information.
KVM_REG_PPC_VP_STATE (2 * 64bits)
bits: | 63 .... 32 | 31 .... 0 |
values: | TIMA word0 | TIMA word1 |
bits: | 127 .......... 64 |
values: | unused |
* Migration:
Saving the state of a VM using the XIVE native exploitation mode
should follow a specific sequence. When the VM is stopped :
1. Mask all sources (PQ=01) to stop the flow of events.
2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
flush any in-flight event notification and to stabilize the EQs. At
this stage, the EQ pages are marked dirty to make sure they are
transferred in the migration sequence.
3. Capture the state of the source targeting, the EQs configuration
and the state of thread interrupt context registers.
Restore is similar :
1. Restore the EQ configuration. As targeting depends on it.
2. Restore targeting
3. Restore the thread interrupt contexts
4. Restore the source states
5. Let the vCPU run
|