author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-11-14 05:29:23 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-11-14 05:29:23 +0300 |
commit | b18d62891aaff49d0ee8367d4b6bb9452469f807 (patch) | |
tree | 02508b3602ff667a20cd107a5125ca5e57ce6806 /drivers | |
parent | 7d58e1c9059eefe0066c5acf2ffa582f6f0180e3 (diff) | |
parent | 141d3b1daacd11bdbd6fa74c2b163093e10d17ee (diff) | |
Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
"This update provides a major overhaul of the APIC initialization and
vector allocation code:
- Unification of the APIC and interrupt mode setup which was
scattered all over the place and was hard to follow. This also
disentangles the timer setup from the APIC initialization which
brings a clear separation of functionality.
Great detective work from Dou Liyang!
- Refactoring of the x86 vector allocation mechanism. The existing
code was based on nested loops and rather convoluted APIC callbacks
which had a horrible worst case behaviour and tried to serve all
different use cases in one go. This led to quite odd hacks when
supporting the new managed interrupt facility for multiqueue devices
and made it more or less impossible to deal with the vector space
exhaustion which was a major roadblock for server hibernation.
Aside from that the code dealing with cpu hotplug and the system
vectors was disconnected from the actual vector management and
allocation code, which made it hard to follow and maintain.
Utilizing the new bitmap matrix allocator core mechanism, the new
allocator and management code consolidates the handling of system
vectors, legacy vectors, cpu hotplug mechanisms and the actual
allocation which needs to be aware of system and legacy vectors and
hotplug constraints into a single consistent entity.
This has one visible change: The support for multi CPU targets of
interrupts, which is only available on a certain subset of
CPUs/APIC variants has been removed in favour of single interrupt
targets. A proper analysis of the multi CPU target feature revealed
that there is no real advantage as the vast majority of interrupts
end up on the CPU with the lowest APIC id in the set of target CPUs
anyway. That change was agreed on by the relevant folks and allowed
the implementation to be simplified significantly and rather
fragile constructs like the vector cleanup IPI to be replaced with
straightforward and solid code.
Furthermore this allowed a clean separation of the allocation details
for legacy, normal and managed interrupts:
* Legacy interrupts are no longer wasting 16 vectors
unconditionally
* Managed interrupts now have a guaranteed vector reservation, but
the actual vector assignment happens when the interrupt is
requested. It's guaranteed not to fail.
* Normal interrupts no longer allocate vectors unconditionally
when the interrupt is set up (IO/APIC init or MSI(X) enable).
The mechanism has been switched to a best effort reservation
mode. The actual allocation happens when the interrupt is
requested. Contrary to managed interrupts the request can fail
due to vector space exhaustion, but drivers must handle a failure
of request_irq() anyway. When the interrupt is freed, the vector
is handed back as well.
This solves a long standing problem with large unconditional
vector allocations for a certain class of enterprise devices
which prevented server hibernation due to vector space
exhaustion when the unused allocated vectors had to be migrated
to CPU0 while unplugging all non boot CPUs.
The code has been equipped with trace points and detailed debugfs
information to aid analysis of the vector space"
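The difference between the guaranteed reservation for managed interrupts and the best-effort reservation mode for normal interrupts can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel's matrix allocator: every `toy_*` name, the struct layout and the four-vector space are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, deliberately tiny vector space; real CPUs have far more. */
#define TOY_NR_VECTORS 4u

/* Toy bookkeeping for a single CPU. Not a kernel data structure. */
struct toy_vector_space {
	unsigned int allocated;        /* vectors currently assigned */
	unsigned int managed_reserved; /* guaranteed, not yet assigned */
};

/* Vectors that are neither assigned nor promised to a managed interrupt. */
static unsigned int toy_available(const struct toy_vector_space *vs)
{
	return TOY_NR_VECTORS - vs->allocated - vs->managed_reserved;
}

/* Managed setup: claim a guaranteed slot up front; only this step can fail. */
static int toy_reserve_managed(struct toy_vector_space *vs)
{
	if (toy_available(vs) == 0)
		return -ENOSPC;
	vs->managed_reserved++;
	return 0;
}

/* Managed request: turn the reservation into a vector; cannot fail. */
static void toy_request_managed(struct toy_vector_space *vs)
{
	assert(vs->managed_reserved > 0);
	vs->managed_reserved--;
	vs->allocated++;
}

/* Normal setup in reservation mode: no vector is consumed at this point. */
static void toy_setup_normal(struct toy_vector_space *vs)
{
	(void)vs;
}

/* Normal request: the vector is allocated only now, so this may fail. */
static int toy_request_normal(struct toy_vector_space *vs)
{
	if (toy_available(vs) == 0)
		return -ENOSPC;
	vs->allocated++;
	return 0;
}

/* Freeing an interrupt hands its vector back to the pool. */
static void toy_free(struct toy_vector_space *vs)
{
	assert(vs->allocated > 0);
	vs->allocated--;
}
```

In this model any number of `toy_setup_normal()` calls leaves the vector space untouched, which mirrors why unused interrupts no longer hold vectors that would have to be migrated to CPU0 during hibernation. Only `toy_request_normal()` can return `-ENOSPC`, matching the point that drivers must handle a failing request_irq() anyway.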
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE
PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code
genirq: Add config option for reservation mode
x86/vector: Use correct per cpu variable in free_moved_vector()
x86/apic/vector: Ignore set_affinity call for inactive interrupts
x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"
x86/apic: Use dead_cpu instead of current CPU when cleaning up
ACPI/init: Invoke early ACPI initialization earlier
x86/vector: Respect affinity mask in irq descriptor
x86/irq: Simplify hotplug vector accounting
x86/vector: Switch IOAPIC to global reservation mode
x86/vector/msi: Switch to global reservation mode
x86/vector: Handle managed interrupts proper
x86/io_apic: Reevaluate vector configuration on activate()
iommu/amd: Reevaluate vector configuration on activate()
iommu/vt-d: Reevaluate vector configuration on activate()
x86/apic/msi: Force reactivation of interrupts at startup time
x86/vector: Untangle internal state from irq_cfg
x86/vector: Compile SMP only code conditionally
x86/apic: Remove unused callbacks
...
Diffstat (limited to 'drivers')
-rw-r--r-- | drivers/iommu/amd_iommu.c | 39 |
-rw-r--r-- | drivers/iommu/intel_irq_remapping.c | 38 |
-rw-r--r-- | drivers/pci/msi.c | 2 |
3 files changed, 52 insertions, 27 deletions
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 330856803e90..9c848e36f209 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4173,16 +4173,25 @@ static void irq_remapping_free(struct irq_domain *domain, unsigned int virq,
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }

+static void amd_ir_update_irte(struct irq_data *irqd, struct amd_iommu *iommu,
+			       struct amd_ir_data *ir_data,
+			       struct irq_2_irte *irte_info,
+			       struct irq_cfg *cfg);
+
 static int irq_remapping_activate(struct irq_domain *domain,
 				  struct irq_data *irq_data, bool early)
 {
 	struct amd_ir_data *data = irq_data->chip_data;
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+	struct irq_cfg *cfg = irqd_cfg(irq_data);

-	if (iommu)
-		iommu->irte_ops->activate(data->entry, irte_info->devid,
-					  irte_info->index);
+	if (!iommu)
+		return 0;
+
+	iommu->irte_ops->activate(data->entry, irte_info->devid,
+				  irte_info->index);
+	amd_ir_update_irte(irq_data, iommu, data, irte_info, cfg);
 	return 0;
 }

@@ -4270,6 +4279,22 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 	return modify_irte_ga(irte_info->devid, irte_info->index, irte, ir_data);
 }

+
+static void amd_ir_update_irte(struct irq_data *irqd, struct amd_iommu *iommu,
+			       struct amd_ir_data *ir_data,
+			       struct irq_2_irte *irte_info,
+			       struct irq_cfg *cfg)
+{
+
+	/*
+	 * Atomically updates the IRTE with the new destination, vector
+	 * and flushes the interrupt entry cache.
+	 */
+	iommu->irte_ops->set_affinity(ir_data->entry, irte_info->devid,
+				      irte_info->index, cfg->vector,
+				      cfg->dest_apicid);
+}
+
 static int amd_ir_set_affinity(struct irq_data *data,
 			       const struct cpumask *mask, bool force)
 {
@@ -4287,13 +4312,7 @@ static int amd_ir_set_affinity(struct irq_data *data,
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;

-	/*
-	 * Atomically updates the IRTE with the new destination, vector
-	 * and flushes the interrupt entry cache.
-	 */
-	iommu->irte_ops->set_affinity(ir_data->entry, irte_info->devid,
-			    irte_info->index, cfg->vector, cfg->dest_apicid);
-
+	amd_ir_update_irte(data, iommu, ir_data, irte_info, cfg);
 	/*
 	 * After this point, all the interrupts will start arriving
 	 * at the new destination. So, time to cleanup the previous
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 324163330eaa..76a193c7fcfc 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1122,6 +1122,24 @@ struct irq_remap_ops intel_irq_remap_ops = {
 	.get_irq_domain = intel_get_irq_domain,
 };

+static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
+{
+	struct intel_ir_data *ir_data = irqd->chip_data;
+	struct irte *irte = &ir_data->irte_entry;
+	struct irq_cfg *cfg = irqd_cfg(irqd);
+
+	/*
+	 * Atomically updates the IRTE with the new destination, vector
+	 * and flushes the interrupt entry cache.
+	 */
+	irte->vector = cfg->vector;
+	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
+
+	/* Update the hardware only if the interrupt is in remapped mode. */
+	if (!force || ir_data->irq_2_iommu.mode == IRQ_REMAPPING)
+		modify_irte(&ir_data->irq_2_iommu, irte);
+}
+
 /*
  * Migrate the IO-APIC irq in the presence of intr-remapping.
  *
@@ -1140,27 +1158,15 @@ static int
 intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		      bool force)
 {
-	struct intel_ir_data *ir_data = data->chip_data;
-	struct irte *irte = &ir_data->irte_entry;
-	struct irq_cfg *cfg = irqd_cfg(data);
 	struct irq_data *parent = data->parent_data;
+	struct irq_cfg *cfg = irqd_cfg(data);
 	int ret;

 	ret = parent->chip->irq_set_affinity(parent, mask, force);
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
 		return ret;

-	/*
-	 * Atomically updates the IRTE with the new destination, vector
-	 * and flushes the interrupt entry cache.
-	 */
-	irte->vector = cfg->vector;
-	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
-
-	/* Update the hardware only if the interrupt is in remapped mode. */
-	if (ir_data->irq_2_iommu.mode == IRQ_REMAPPING)
-		modify_irte(&ir_data->irq_2_iommu, irte);
-
+	intel_ir_reconfigure_irte(data, false);
 	/*
 	 * After this point, all the interrupts will start arriving
 	 * at the new destination. So, time to cleanup the previous
@@ -1393,9 +1399,7 @@ static void intel_irq_remapping_free(struct irq_domain *domain,
 static int intel_irq_remapping_activate(struct irq_domain *domain,
 					struct irq_data *irq_data, bool early)
 {
-	struct intel_ir_data *data = irq_data->chip_data;
-
-	modify_irte(&data->irq_2_iommu, &data->irte_entry);
+	intel_ir_reconfigure_irte(irq_data, true);
 	return 0;
 }

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 496ed9130600..e06607167858 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1441,6 +1441,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 		pci_msi_domain_update_chip_ops(info);

 	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
+		info->flags |= MSI_FLAG_MUST_REACTIVATE;

 	domain = msi_create_irq_domain(fwnode, info, parent);
 	if (!domain)