From e00b0ab86c79c4e82eb821ac6d6a3daef2e3e600 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 1 May 2020 17:37:51 +0200
Subject: docs: add IRQ documentation at the core-api book

There are 4 IRQ documentation files under Documentation/*.txt.
Move them into a new directory (core-api/irq) and add a new
index file for it.

While here, use a title markup for the Debugging section of the
irq-domain.rst file.

Signed-off-by: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/2da7485c3718e1442e6b4c2dd66857b776e8899b.1588345503.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet
---
 Documentation/IRQ-affinity.txt                  |  70 ------
 Documentation/IRQ-domain.txt                    | 269 --------------------
 Documentation/IRQ.txt                           |  24 --
 Documentation/admin-guide/hw-vuln/l1tf.rst      |   2 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst     |   2 +-
 Documentation/core-api/index.rst                |   1 +
 Documentation/core-api/irq/concepts.rst         |  24 ++
 Documentation/core-api/irq/index.rst            |  11 +
 Documentation/core-api/irq/irq-affinity.rst     |  70 ++++++
 Documentation/core-api/irq/irq-domain.rst       | 270 +++++++++++++++++++++
 Documentation/core-api/irq/irqflags-tracing.rst |  52 ++++
 Documentation/ia64/irq-redir.rst                |   2 +-
 Documentation/irqflags-tracing.txt              |  52 ----
 Documentation/networking/scaling.rst            |   4 +-
 Documentation/translations/zh_CN/IRQ.txt        |   4 +-
 15 files changed, 435 insertions(+), 422 deletions(-)
 delete mode 100644 Documentation/IRQ-affinity.txt
 delete mode 100644 Documentation/IRQ-domain.txt
 delete mode 100644 Documentation/IRQ.txt
 create mode 100644 Documentation/core-api/irq/concepts.rst
 create mode 100644 Documentation/core-api/irq/index.rst
 create mode 100644 Documentation/core-api/irq/irq-affinity.rst
 create mode 100644 Documentation/core-api/irq/irq-domain.rst
 create mode 100644 Documentation/core-api/irq/irqflags-tracing.rst
 delete mode 100644 Documentation/irqflags-tracing.txt

(limited to 'Documentation')

diff --git a/Documentation/IRQ-affinity.txt b/Documentation/IRQ-affinity.txt
deleted file mode 100644
index 29da5000836a..000000000000
--- a/Documentation/IRQ-affinity.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-================
-SMP IRQ affinity
-================
-
-ChangeLog:
-        - Started by Ingo Molnar
-        - Update by Max Krasnyansky
-
-
-/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
-which target CPUs are permitted for a given IRQ source. It's a bitmask
-(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not
-allowed to turn off all CPUs, and if an IRQ controller does not support
-IRQ affinity then the value will not change from the default of all cpus.
-
-/proc/irq/default_smp_affinity specifies default affinity mask that applies
-to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
-will be set to the default mask. It can then be changed as described above.
-Default mask is 0xffffffff.
-
-Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
-it to CPU4-7 (this is an 8-CPU SMP box)::
-
-        [root@moon 44]# cd /proc/irq/44
-        [root@moon 44]# cat smp_affinity
-        ffffffff
-
-        [root@moon 44]# echo 0f > smp_affinity
-        [root@moon 44]# cat smp_affinity
-        0000000f
-        [root@moon 44]# ping -f h
-        PING hell (195.4.7.3): 56 data bytes
-        ...
-        --- hell ping statistics ---
-        6029 packets transmitted, 6027 packets received, 0% packet loss
-        round-trip  min/avg/max = 0.1/0.1/0.4 ms
-        [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
-                   CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
-         44:       1068       1785       1785       1783          0          0          0          0   IO-APIC-level  eth1
-
-As can be seen from the line above IRQ44 was delivered only to the first four
-processors (0-3).
-Now lets restrict that IRQ to CPU(4-7).
-
-::
-
-        [root@moon 44]# echo f0 > smp_affinity
-        [root@moon 44]# cat smp_affinity
-        000000f0
-        [root@moon 44]# ping -f h
-        PING hell (195.4.7.3): 56 data bytes
-        ..
-        --- hell ping statistics ---
-        2779 packets transmitted, 2777 packets received, 0% packet loss
-        round-trip  min/avg/max = 0.1/0.5/585.4 ms
-        [root@moon 44]# cat /proc/interrupts |  'CPU\|44:'
-                   CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
-         44:       1068       1785       1785       1783       1784       1069       1070       1069   IO-APIC-level  eth1
-
-This time around IRQ44 was delivered only to the last four processors.
-i.e counters for the CPU0-3 did not change.
-
-Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
-
-        [root@moon 44]# echo 1024-1031 > smp_affinity_list
-        [root@moon 44]# cat smp_affinity_list
-        1024-1031
-
-Note that to do this with a bitmask would require 32 bitmasks of zero
-to follow the pertinent one.
diff --git a/Documentation/IRQ-domain.txt b/Documentation/IRQ-domain.txt
deleted file mode 100644
index 507775cce753..000000000000
--- a/Documentation/IRQ-domain.txt
+++ /dev/null
@@ -1,269 +0,0 @@
-===============================================
-The irq_domain interrupt number mapping library
-===============================================
-
-The current design of the Linux kernel uses a single large number
-space where each separate IRQ source is assigned a different number.
-This is simple when there is only one interrupt controller, but in
-systems with multiple interrupt controllers the kernel must ensure
-that each one gets assigned non-overlapping allocations of Linux
-IRQ numbers.
-
-The number of interrupt controllers registered as unique irqchips
-show a rising tendency: for example subdrivers of different kinds
-such as GPIO controllers avoid reimplementing identical callback
-mechanisms as the IRQ core system by modelling their interrupt
-handlers as irqchips, i.e. in effect cascading interrupt controllers.
-
-Here the interrupt number loose all kind of correspondence to
-hardware interrupt numbers: whereas in the past, IRQ numbers could
-be chosen so they matched the hardware IRQ line into the root
-interrupt controller (i.e. the component actually fireing the
-interrupt line to the CPU) nowadays this number is just a number.
-
-For this reason we need a mechanism to separate controller-local
-interrupt numbers, called hardware irq's, from Linux IRQ numbers.
-
-The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
-irq numbers, but they don't provide any support for reverse mapping of
-the controller-local IRQ (hwirq) number into the Linux IRQ number
-space.
-
-The irq_domain library adds mapping between hwirq and IRQ numbers on
-top of the irq_alloc_desc*() API. An irq_domain to manage mapping is
-preferred over interrupt controller drivers open coding their own
-reverse mapping scheme.
-
-irq_domain also implements translation from an abstract irq_fwspec
-structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
-be easily extended to support other IRQ topology data sources.
-
-irq_domain usage
-================
-
-An interrupt controller driver creates and registers an irq_domain by
-calling one of the irq_domain_add_*() functions (each mapping method
-has a different allocator function, more on that later). The function
-will return a pointer to the irq_domain on success. The caller must
-provide the allocator function with an irq_domain_ops structure.
-
-In most cases, the irq_domain will begin empty without any mappings
-between hwirq and IRQ numbers. Mappings are added to the irq_domain
-by calling irq_create_mapping() which accepts the irq_domain and a
-hwirq number as arguments. If a mapping for the hwirq doesn't already
-exist then it will allocate a new Linux irq_desc, associate it with
-the hwirq, and call the .map() callback so the driver can perform any
-required hardware setup.
-
-When an interrupt is received, irq_find_mapping() function should
-be used to find the Linux IRQ number from the hwirq number.
-
-The irq_create_mapping() function must be called *atleast once*
-before any call to irq_find_mapping(), lest the descriptor will not
-be allocated.
-
-If the driver has the Linux IRQ number or the irq_data pointer, and
-needs to know the associated hwirq number (such as in the irq_chip
-callbacks) then it can be directly obtained from irq_data->hwirq.
-
-Types of irq_domain mappings
-============================
-
-There are several mechanisms available for reverse mapping from hwirq
-to Linux irq, and each mechanism uses a different allocation function.
-Which reverse map type should be used depends on the use case. Each
-of the reverse map types are described below:
-
-Linear
-------
-
-::
-
-  irq_domain_add_linear()
-  irq_domain_create_linear()
-
-The linear reverse map maintains a fixed size table indexed by the
-hwirq number. When a hwirq is mapped, an irq_desc is allocated for
-the hwirq, and the IRQ number is stored in the table.
-
-The Linear map is a good choice when the maximum number of hwirqs is
-fixed and a relatively small number (~ < 256). The advantages of this
-map are fixed time lookup for IRQ numbers, and irq_descs are only
-allocated for in-use IRQs. The disadvantage is that the table must be
-as large as the largest possible hwirq number.
-
-irq_domain_add_linear() and irq_domain_create_linear() are functionally
-equivalent, except for the first argument is different - the former
-accepts an Open Firmware specific 'struct device_node', while the latter
-accepts a more general abstraction 'struct fwnode_handle'.
-
-The majority of drivers should use the linear map.
-
-Tree
-----
-
-::
-
-  irq_domain_add_tree()
-  irq_domain_create_tree()
-
-The irq_domain maintains a radix tree map from hwirq numbers to Linux
-IRQs. When an hwirq is mapped, an irq_desc is allocated and the
-hwirq is used as the lookup key for the radix tree.
-
-The tree map is a good choice if the hwirq number can be very large
-since it doesn't need to allocate a table as large as the largest
-hwirq number. The disadvantage is that hwirq to IRQ number lookup is
-dependent on how many entries are in the table.
-
-irq_domain_add_tree() and irq_domain_create_tree() are functionally
-equivalent, except for the first argument is different - the former
-accepts an Open Firmware specific 'struct device_node', while the latter
-accepts a more general abstraction 'struct fwnode_handle'.
-
-Very few drivers should need this mapping.
-
-No Map
-------
-
-::
-
-  irq_domain_add_nomap()
-
-The No Map mapping is to be used when the hwirq number is
-programmable in the hardware. In this case it is best to program the
-Linux IRQ number into the hardware itself so that no mapping is
-required. Calling irq_create_direct_mapping() will allocate a Linux
-IRQ number and call the .map() callback so that driver can program the
-Linux IRQ number into the hardware.
-
-Most drivers cannot use this mapping.
-
-Legacy
-------
-
-::
-
-  irq_domain_add_simple()
-  irq_domain_add_legacy()
-  irq_domain_add_legacy_isa()
-
-The Legacy mapping is a special case for drivers that already have a
-range of irq_descs allocated for the hwirqs. It is used when the
-driver cannot be immediately converted to use the linear mapping. For
-example, many embedded system board support files use a set of #defines
-for IRQ numbers that are passed to struct device registrations. In that
-case the Linux IRQ numbers cannot be dynamically assigned and the legacy
-mapping should be used.
-
-The legacy map assumes a contiguous range of IRQ numbers has already
-been allocated for the controller and that the IRQ number can be
-calculated by adding a fixed offset to the hwirq number, and
-visa-versa. The disadvantage is that it requires the interrupt
-controller to manage IRQ allocations and it requires an irq_desc to be
-allocated for every hwirq, even if it is unused.
-
-The legacy map should only be used if fixed IRQ mappings must be
-supported. For example, ISA controllers would use the legacy map for
-mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
-numbers.
-
-Most users of legacy mappings should use irq_domain_add_simple() which
-will use a legacy domain only if an IRQ range is supplied by the
-system and will otherwise use a linear domain mapping. The semantics
-of this call are such that if an IRQ range is specified then
-descriptors will be allocated on-the-fly for it, and if no range is
-specified it will fall through to irq_domain_add_linear() which means
-*no* irq descriptors will be allocated.
-
-A typical use case for simple domains is where an irqchip provider
-is supporting both dynamic and static IRQ assignments.
-
-In order to avoid ending up in a situation where a linear domain is
-used and no descriptor gets allocated it is very important to make sure
-that the driver using the simple domain call irq_create_mapping()
-before any irq_find_mapping() since the latter will actually work
-for the static IRQ assignment case.
-
-Hierarchy IRQ domain
---------------------
-
-On some architectures, there may be multiple interrupt controllers
-involved in delivering an interrupt from the device to the target CPU.
-Let's look at a typical interrupt delivering path on x86 platforms::
-
-  Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
-
-There are three interrupt controllers involved:
-
-1) IOAPIC controller
-2) Interrupt remapping controller
-3) Local APIC controller
-
-To support such a hardware topology and make software architecture match
-hardware architecture, an irq_domain data structure is built for each
-interrupt controller and those irq_domains are organized into hierarchy.
-When building irq_domain hierarchy, the irq_domain near to the device is
-child and the irq_domain near to CPU is parent. So a hierarchy structure
-as below will be built for the example above::
-
-  CPU Vector irq_domain (root irq_domain to manage CPU vectors)
-          ^
-          |
-  Interrupt Remapping irq_domain (manage irq_remapping entries)
-          ^
-          |
-  IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
-
-There are four major interfaces to use hierarchy irq_domain:
-
-1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
-   controller related resources to deliver these interrupts.
-2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
-   related resources associated with these interrupts.
-3) irq_domain_activate_irq(): activate interrupt controller hardware to
-   deliver the interrupt.
-4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
-   to stop delivering the interrupt.
-
-Following changes are needed to support hierarchy irq_domain:
-
-1) a new field 'parent' is added to struct irq_domain; it's used to
-   maintain irq_domain hierarchy information.
-2) a new field 'parent_data' is added to struct irq_data; it's used to
-   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
-   is used to store irq_domain pointer and hardware irq number.
-3) new callbacks are added to struct irq_domain_ops to support hierarchy
-   irq_domain operations.
-
-With support of hierarchy irq_domain and hierarchy irq_data ready, an
-irq_domain structure is built for each interrupt controller, and an
-irq_data structure is allocated for each irq_domain associated with an
-IRQ. Now we could go one step further to support stacked(hierarchy)
-irq_chip. That is, an irq_chip is associated with each irq_data along
-the hierarchy. A child irq_chip may implement a required action by
-itself or by cooperating with its parent irq_chip.
-
-With stacked irq_chip, interrupt controller driver only needs to deal
-with the hardware managed by itself and may ask for services from its
-parent irq_chip when needed. So we could achieve a much cleaner
-software architecture.
-
-For an interrupt controller driver to support hierarchy irq_domain, it
-needs to:
-
-1) Implement irq_domain_ops.alloc and irq_domain_ops.free
-2) Optionally implement irq_domain_ops.activate and
-   irq_domain_ops.deactivate.
-3) Optionally implement an irq_chip to manage the interrupt controller
-   hardware.
-4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
-   they are unused with hierarchy irq_domain.
-
-Hierarchy irq_domain is in no way x86 specific, and is heavily used to
-support other architectures, such as ARM, ARM64 etc.
-
-=== Debugging ===
-
-Most of the internals of the IRQ subsystem are exposed in debugfs by
-turning CONFIG_GENERIC_IRQ_DEBUGFS on.
diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt
deleted file mode 100644
index 4273806a606b..000000000000
--- a/Documentation/IRQ.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-===============
-What is an IRQ?
-===============
-
-An IRQ is an interrupt request from a device.
-Currently they can come in over a pin, or over a packet.
-Several devices may be connected to the same pin thus
-sharing an IRQ.
-
-An IRQ number is a kernel identifier used to talk about a hardware
-interrupt source. Typically this is an index into the global irq_desc
-array, but except for what linux/interrupt.h implements the details
-are architecture specific.
-
-An IRQ number is an enumeration of the possible interrupt sources on a
-machine. Typically what is enumerated is the number of input pins on
-all of the interrupt controller in the system. In the case of ISA
-what is enumerated are the 16 input pins on the two i8259 interrupt
-controllers.
-
-Architectures can assign additional meaning to the IRQ numbers, and
-are encouraged to in the case where there is any manual configuration
-of the hardware involved. The ISA IRQs are a classic example of
-assigning this kind of additional meaning.
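The "What is an IRQ?" text above (moved unchanged to core-api/irq/concepts.rst by this patch) says that on ISA the IRQ numbers enumerate the 16 input pins of the two i8259 interrupt controllers. The C sketch below illustrates that enumeration in user space. It assumes the classic PC/AT wiring (IRQs 0-7 on the master controller, IRQs 8-15 on the slave, with the slave's output cascaded into the master's pin 2). The type and function names (struct i8259_pin, isa_irq_to_pin(), isa_irq_is_cascade()) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the PC/AT pair of cascaded i8259 controllers. */
enum i8259_chip { I8259_MASTER, I8259_SLAVE };

/* Master input pin that receives the slave controller's output. */
#define I8259_CASCADE_PIN 2u

struct i8259_pin {
	enum i8259_chip chip;
	unsigned int pin;	/* input pin 0-7 on that controller */
};

/*
 * Translate an ISA IRQ number into the controller input it enumerates.
 * Returns 0 on success, -1 if irq is outside the 16 ISA IRQs.
 */
static int isa_irq_to_pin(unsigned int irq, struct i8259_pin *out)
{
	assert(out != NULL);
	if (irq > 15)
		return -1;
	out->chip = irq < 8 ? I8259_MASTER : I8259_SLAVE;
	out->pin = irq % 8;
	return 0;
}

/* IRQ 2 names the master input that is used for the cascade. */
static int isa_irq_is_cascade(unsigned int irq)
{
	return irq == I8259_CASCADE_PIN;
}
```

For example, isa_irq_to_pin(9, &p) reports pin 1 on the slave controller. The sketch only captures the numbering convention described in the text, not how the kernel programs the i8259 hardware.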
diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst
index f83212fae4d5..3eeeb488d955 100644
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -268,7 +268,7 @@ Guest mitigation mechanisms
      /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
      available at:
 
-     https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
+     https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
 
 .. _smt_control:
 
diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
index 21818aca4708..dc36aeb65d0a 100644
--- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
 References
 ==========
 
-- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
+- Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
 
 - Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
 
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 2cfd07a34173..0caaed576225 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -52,6 +52,7 @@ How Linux keeps everything from happening at the same time. See
 
    atomic_ops
    refcount-vs-atomic
+   irq/index
    local_ops
    padata
    ../RCU/index
diff --git a/Documentation/core-api/irq/concepts.rst b/Documentation/core-api/irq/concepts.rst
new file mode 100644
index 000000000000..4273806a606b
--- /dev/null
+++ b/Documentation/core-api/irq/concepts.rst
@@ -0,0 +1,24 @@
+===============
+What is an IRQ?
+===============
+
+An IRQ is an interrupt request from a device.
+Currently they can come in over a pin, or over a packet.
+Several devices may be connected to the same pin thus
+sharing an IRQ.
+
+An IRQ number is a kernel identifier used to talk about a hardware
+interrupt source. Typically this is an index into the global irq_desc
+array, but except for what linux/interrupt.h implements the details
+are architecture specific.
+
+An IRQ number is an enumeration of the possible interrupt sources on a
+machine. Typically what is enumerated is the number of input pins on
+all of the interrupt controller in the system. In the case of ISA
+what is enumerated are the 16 input pins on the two i8259 interrupt
+controllers.
+
+Architectures can assign additional meaning to the IRQ numbers, and
+are encouraged to in the case where there is any manual configuration
+of the hardware involved. The ISA IRQs are a classic example of
+assigning this kind of additional meaning.
diff --git a/Documentation/core-api/irq/index.rst b/Documentation/core-api/irq/index.rst
new file mode 100644
index 000000000000..0d65d11e5420
--- /dev/null
+++ b/Documentation/core-api/irq/index.rst
@@ -0,0 +1,11 @@
+====
+IRQs
+====
+
+.. toctree::
+   :maxdepth: 1
+
+   concepts
+   irq-affinity
+   irq-domain
+   irqflags-tracing
diff --git a/Documentation/core-api/irq/irq-affinity.rst b/Documentation/core-api/irq/irq-affinity.rst
new file mode 100644
index 000000000000..29da5000836a
--- /dev/null
+++ b/Documentation/core-api/irq/irq-affinity.rst
@@ -0,0 +1,70 @@
+================
+SMP IRQ affinity
+================
+
+ChangeLog:
+        - Started by Ingo Molnar
+        - Update by Max Krasnyansky
+
+
+/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
+which target CPUs are permitted for a given IRQ source. It's a bitmask
+(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not
+allowed to turn off all CPUs, and if an IRQ controller does not support
+IRQ affinity then the value will not change from the default of all cpus.
+
+/proc/irq/default_smp_affinity specifies default affinity mask that applies
+to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
+will be set to the default mask. It can then be changed as described above.
+Default mask is 0xffffffff.
+
+Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
+it to CPU4-7 (this is an 8-CPU SMP box)::
+
+        [root@moon 44]# cd /proc/irq/44
+        [root@moon 44]# cat smp_affinity
+        ffffffff
+
+        [root@moon 44]# echo 0f > smp_affinity
+        [root@moon 44]# cat smp_affinity
+        0000000f
+        [root@moon 44]# ping -f h
+        PING hell (195.4.7.3): 56 data bytes
+        ...
+        --- hell ping statistics ---
+        6029 packets transmitted, 6027 packets received, 0% packet loss
+        round-trip  min/avg/max = 0.1/0.1/0.4 ms
+        [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
+                   CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
+         44:       1068       1785       1785       1783          0          0          0          0   IO-APIC-level  eth1
+
+As can be seen from the line above IRQ44 was delivered only to the first four
+processors (0-3).
+Now lets restrict that IRQ to CPU(4-7).
+
+::
+
+        [root@moon 44]# echo f0 > smp_affinity
+        [root@moon 44]# cat smp_affinity
+        000000f0
+        [root@moon 44]# ping -f h
+        PING hell (195.4.7.3): 56 data bytes
+        ..
+        --- hell ping statistics ---
+        2779 packets transmitted, 2777 packets received, 0% packet loss
+        round-trip  min/avg/max = 0.1/0.5/585.4 ms
+        [root@moon 44]# cat /proc/interrupts |  'CPU\|44:'
+                   CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
+         44:       1068       1785       1785       1783       1784       1069       1070       1069   IO-APIC-level  eth1
+
+This time around IRQ44 was delivered only to the last four processors.
+i.e counters for the CPU0-3 did not change.
+
+Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
+
+        [root@moon 44]# echo 1024-1031 > smp_affinity_list
+        [root@moon 44]# cat smp_affinity_list
+        1024-1031
+
+Note that to do this with a bitmask would require 32 bitmasks of zero
+to follow the pertinent one.
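The irq-affinity.rst text above says that smp_affinity takes a hexadecimal bitmask while smp_affinity_list takes a CPU list, and that restricting an IRQ to CPUs 1024-1031 with a bitmask "would require 32 bitmasks of zero to follow the pertinent one". The C sketch below makes that arithmetic concrete. It assumes the mask is written as comma-separated 32-bit hexadecimal words, most significant word first, which matches the 0000000f and 000000f0 values in the example session. The width a particular kernel prints may differ, and cpu_range_to_mask() and zero_words_after_first() are hypothetical user-space helpers, not kernel interfaces.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write CPUs first..last (inclusive) as a hex
 * bitmask of comma-separated 32-bit words, most significant word first,
 * for a system with nr_cpus possible CPUs. Returns the string length, or
 * -1 if the range is invalid or buf is too small.
 */
static int cpu_range_to_mask(unsigned int first, unsigned int last,
			     unsigned int nr_cpus, char *buf, size_t len)
{
	unsigned int nr_words = (nr_cpus + 31) / 32;
	size_t pos = 0;

	assert(buf != NULL);
	if (first > last || last >= nr_cpus)
		return -1;

	/* Walk the words from the highest-numbered CPUs down to CPU 0. */
	for (unsigned int w = nr_words; w-- > 0; ) {
		uint32_t word = 0;

		for (unsigned int bit = 0; bit < 32; bit++) {
			unsigned int cpu = w * 32 + bit;

			if (cpu >= first && cpu <= last)
				word |= UINT32_C(1) << bit;
		}

		int n = snprintf(buf + pos, len - pos, "%s%08" PRIx32,
				 w == nr_words - 1 ? "" : ",", word);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;	/* output would be truncated */
		pos += (size_t)n;
	}
	return (int)pos;
}

/* Count the all-zero 32-bit words that follow the first (leftmost) one. */
static unsigned int zero_words_after_first(const char *mask)
{
	unsigned int count = 0;
	const char *p = strchr(mask, ',');

	while (p != NULL) {
		if (strncmp(p + 1, "00000000", 8) == 0)
			count++;
		p = strchr(p + 1, ',');
	}
	return count;
}
```

With nr_cpus = 8, the ranges 0-3 and 4-7 produce 0000000f and 000000f0, as in the session above. With nr_cpus = 1056, the range 1024-1031 produces 000000ff followed by 32 words of 00000000. Those are the 32 zero bitmasks the text mentions, which is why the CPU list form is more convenient.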
diff --git a/Documentation/core-api/irq/irq-domain.rst b/Documentation/core-api/irq/irq-domain.rst new file mode 100644 index 000000000000..096db12f32d5 --- /dev/null +++ b/Documentation/core-api/irq/irq-domain.rst @@ -0,0 +1,270 @@ +=============================================== +The irq_domain interrupt number mapping library +=============================================== + +The current design of the Linux kernel uses a single large number +space where each separate IRQ source is assigned a different number. +This is simple when there is only one interrupt controller, but in +systems with multiple interrupt controllers the kernel must ensure +that each one gets assigned non-overlapping allocations of Linux +IRQ numbers. + +The number of interrupt controllers registered as unique irqchips +show a rising tendency: for example subdrivers of different kinds +such as GPIO controllers avoid reimplementing identical callback +mechanisms as the IRQ core system by modelling their interrupt +handlers as irqchips, i.e. in effect cascading interrupt controllers. + +Here the interrupt number loose all kind of correspondence to +hardware interrupt numbers: whereas in the past, IRQ numbers could +be chosen so they matched the hardware IRQ line into the root +interrupt controller (i.e. the component actually fireing the +interrupt line to the CPU) nowadays this number is just a number. + +For this reason we need a mechanism to separate controller-local +interrupt numbers, called hardware irq's, from Linux IRQ numbers. + +The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of +irq numbers, but they don't provide any support for reverse mapping of +the controller-local IRQ (hwirq) number into the Linux IRQ number +space. + +The irq_domain library adds mapping between hwirq and IRQ numbers on +top of the irq_alloc_desc*() API. An irq_domain to manage mapping is +preferred over interrupt controller drivers open coding their own +reverse mapping scheme. 
+ +irq_domain also implements translation from an abstract irq_fwspec +structure to hwirq numbers (Device Tree and ACPI GSI so far), and can +be easily extended to support other IRQ topology data sources. + +irq_domain usage +================ + +An interrupt controller driver creates and registers an irq_domain by +calling one of the irq_domain_add_*() functions (each mapping method +has a different allocator function, more on that later). The function +will return a pointer to the irq_domain on success. The caller must +provide the allocator function with an irq_domain_ops structure. + +In most cases, the irq_domain will begin empty without any mappings +between hwirq and IRQ numbers. Mappings are added to the irq_domain +by calling irq_create_mapping() which accepts the irq_domain and a +hwirq number as arguments. If a mapping for the hwirq doesn't already +exist then it will allocate a new Linux irq_desc, associate it with +the hwirq, and call the .map() callback so the driver can perform any +required hardware setup. + +When an interrupt is received, irq_find_mapping() function should +be used to find the Linux IRQ number from the hwirq number. + +The irq_create_mapping() function must be called *atleast once* +before any call to irq_find_mapping(), lest the descriptor will not +be allocated. + +If the driver has the Linux IRQ number or the irq_data pointer, and +needs to know the associated hwirq number (such as in the irq_chip +callbacks) then it can be directly obtained from irq_data->hwirq. + +Types of irq_domain mappings +============================ + +There are several mechanisms available for reverse mapping from hwirq +to Linux irq, and each mechanism uses a different allocation function. +Which reverse map type should be used depends on the use case. 
Each +of the reverse map types are described below: + +Linear +------ + +:: + + irq_domain_add_linear() + irq_domain_create_linear() + +The linear reverse map maintains a fixed size table indexed by the +hwirq number. When a hwirq is mapped, an irq_desc is allocated for +the hwirq, and the IRQ number is stored in the table. + +The Linear map is a good choice when the maximum number of hwirqs is +fixed and a relatively small number (~ < 256). The advantages of this +map are fixed time lookup for IRQ numbers, and irq_descs are only +allocated for in-use IRQs. The disadvantage is that the table must be +as large as the largest possible hwirq number. + +irq_domain_add_linear() and irq_domain_create_linear() are functionally +equivalent, except for the first argument is different - the former +accepts an Open Firmware specific 'struct device_node', while the latter +accepts a more general abstraction 'struct fwnode_handle'. + +The majority of drivers should use the linear map. + +Tree +---- + +:: + + irq_domain_add_tree() + irq_domain_create_tree() + +The irq_domain maintains a radix tree map from hwirq numbers to Linux +IRQs. When an hwirq is mapped, an irq_desc is allocated and the +hwirq is used as the lookup key for the radix tree. + +The tree map is a good choice if the hwirq number can be very large +since it doesn't need to allocate a table as large as the largest +hwirq number. The disadvantage is that hwirq to IRQ number lookup is +dependent on how many entries are in the table. + +irq_domain_add_tree() and irq_domain_create_tree() are functionally +equivalent, except for the first argument is different - the former +accepts an Open Firmware specific 'struct device_node', while the latter +accepts a more general abstraction 'struct fwnode_handle'. + +Very few drivers should need this mapping. + +No Map +------ + +:: + + irq_domain_add_nomap() + +The No Map mapping is to be used when the hwirq number is +programmable in the hardware. 
In this case it is best to program the +Linux IRQ number into the hardware itself so that no mapping is +required. Calling irq_create_direct_mapping() will allocate a Linux +IRQ number and call the .map() callback so that driver can program the +Linux IRQ number into the hardware. + +Most drivers cannot use this mapping. + +Legacy +------ + +:: + + irq_domain_add_simple() + irq_domain_add_legacy() + irq_domain_add_legacy_isa() + +The Legacy mapping is a special case for drivers that already have a +range of irq_descs allocated for the hwirqs. It is used when the +driver cannot be immediately converted to use the linear mapping. For +example, many embedded system board support files use a set of #defines +for IRQ numbers that are passed to struct device registrations. In that +case the Linux IRQ numbers cannot be dynamically assigned and the legacy +mapping should be used. + +The legacy map assumes a contiguous range of IRQ numbers has already +been allocated for the controller and that the IRQ number can be +calculated by adding a fixed offset to the hwirq number, and +visa-versa. The disadvantage is that it requires the interrupt +controller to manage IRQ allocations and it requires an irq_desc to be +allocated for every hwirq, even if it is unused. + +The legacy map should only be used if fixed IRQ mappings must be +supported. For example, ISA controllers would use the legacy map for +mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ +numbers. + +Most users of legacy mappings should use irq_domain_add_simple() which +will use a legacy domain only if an IRQ range is supplied by the +system and will otherwise use a linear domain mapping. The semantics +of this call are such that if an IRQ range is specified then +descriptors will be allocated on-the-fly for it, and if no range is +specified it will fall through to irq_domain_add_linear() which means +*no* irq descriptors will be allocated. 
+ +A typical use case for simple domains is where an irqchip provider +is supporting both dynamic and static IRQ assignments. + +In order to avoid ending up in a situation where a linear domain is +used and no descriptor gets allocated it is very important to make sure +that the driver using the simple domain call irq_create_mapping() +before any irq_find_mapping() since the latter will actually work +for the static IRQ assignment case. + +Hierarchy IRQ domain +-------------------- + +On some architectures, there may be multiple interrupt controllers +involved in delivering an interrupt from the device to the target CPU. +Let's look at a typical interrupt delivering path on x86 platforms:: + + Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU + +There are three interrupt controllers involved: + +1) IOAPIC controller +2) Interrupt remapping controller +3) Local APIC controller + +To support such a hardware topology and make software architecture match +hardware architecture, an irq_domain data structure is built for each +interrupt controller and those irq_domains are organized into hierarchy. +When building irq_domain hierarchy, the irq_domain near to the device is +child and the irq_domain near to CPU is parent. So a hierarchy structure +as below will be built for the example above:: + + CPU Vector irq_domain (root irq_domain to manage CPU vectors) + ^ + | + Interrupt Remapping irq_domain (manage irq_remapping entries) + ^ + | + IOAPIC irq_domain (manage IOAPIC delivery entries/pins) + +There are four major interfaces to use hierarchy irq_domain: + +1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt + controller related resources to deliver these interrupts. +2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller + related resources associated with these interrupts. +3) irq_domain_activate_irq(): activate interrupt controller hardware to + deliver the interrupt. 
+4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
+   to stop delivering the interrupt.
+
+Following changes are needed to support hierarchy irq_domain:
+
+1) a new field 'parent' is added to struct irq_domain; it's used to
+   maintain irq_domain hierarchy information.
+2) a new field 'parent_data' is added to struct irq_data; it's used to
+   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
+   is used to store irq_domain pointer and hardware irq number.
+3) new callbacks are added to struct irq_domain_ops to support hierarchy
+   irq_domain operations.
+
+With support of hierarchy irq_domain and hierarchy irq_data ready, an
+irq_domain structure is built for each interrupt controller, and an
+irq_data structure is allocated for each irq_domain associated with an
+IRQ. Now we could go one step further to support stacked(hierarchy)
+irq_chip. That is, an irq_chip is associated with each irq_data along
+the hierarchy. A child irq_chip may implement a required action by
+itself or by cooperating with its parent irq_chip.
+
+With stacked irq_chip, interrupt controller driver only needs to deal
+with the hardware managed by itself and may ask for services from its
+parent irq_chip when needed. So we could achieve a much cleaner
+software architecture.
+
+For an interrupt controller driver to support hierarchy irq_domain, it
+needs to:
+
+1) Implement irq_domain_ops.alloc and irq_domain_ops.free
+2) Optionally implement irq_domain_ops.activate and
+   irq_domain_ops.deactivate.
+3) Optionally implement an irq_chip to manage the interrupt controller
+   hardware.
+4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
+   they are unused with hierarchy irq_domain.
+
+Hierarchy irq_domain is in no way x86 specific, and is heavily used to
+support other architectures, such as ARM, ARM64 etc.
+
+Debugging
+=========
+
+Most of the internals of the IRQ subsystem are exposed in debugfs by
+turning CONFIG_GENERIC_IRQ_DEBUGFS on.
diff --git a/Documentation/core-api/irq/irqflags-tracing.rst b/Documentation/core-api/irq/irqflags-tracing.rst
new file mode 100644
index 000000000000..bdd208259fb3
--- /dev/null
+++ b/Documentation/core-api/irq/irqflags-tracing.rst
@@ -0,0 +1,52 @@
+=======================
+IRQ-flags state tracing
+=======================
+
+:Author: started by Ingo Molnar
+
+The "irq-flags tracing" feature "traces" hardirq and softirq state, in
+that it gives interested subsystems an opportunity to be notified of
+every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
+happens in the kernel.
+
+CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
+and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
+code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
+CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
+are locking APIs that are not used in IRQ context. (the one exception
+for rwsems is worked around)
+
+Architecture support for this is certainly not in the "trivial"
+category, because lots of lowlevel assembly code deal with irq-flags
+state changes. But an architecture can be irq-flags-tracing enabled in a
+rather straightforward and risk-free manner.
+
+Architectures that want to support this need to do a couple of
+code-organizational changes first:
+
+- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
+
+and then a couple of functional changes are needed as well to implement
+irq-flags-tracing support:
+
+- in lowlevel entry code add (build-conditional) calls to the
+  trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
+  closely guards whether the 'real' irq-flags matches the 'virtual'
+  irq-flags state, and complains loudly (and turns itself off) if the
+  two do not match. Usually most of the time for arch support for
+  irq-flags-tracing is spent in this state: look at the lockdep
+  complaint, try to figure out the assembly code we did not cover yet,
+  fix and repeat. Once the system has booted up and works without a
+  lockdep complaint in the irq-flags-tracing functions arch support is
+  complete.
+- if the architecture has non-maskable interrupts then those need to be
+  excluded from the irq-tracing [and lock validation] mechanism via
+  lockdep_off()/lockdep_on().
+
+In general there is no risk from having an incomplete irq-flags-tracing
+implementation in an architecture: lockdep will detect that and will
+turn itself off. I.e. the lock validator will still be reliable. There
+should be no crashes due to irq-tracing bugs. (except if the assembly
+changes break other code by modifying conditions or registers that
+shouldn't be)
+
diff --git a/Documentation/ia64/irq-redir.rst b/Documentation/ia64/irq-redir.rst
index 39bf94484a15..6bbbbe4f73ef 100644
--- a/Documentation/ia64/irq-redir.rst
+++ b/Documentation/ia64/irq-redir.rst
@@ -7,7 +7,7 @@ IRQ affinity on IA64 platforms
 
 By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
 controlled. The behavior on IA64 platforms is slightly different from
-that described in Documentation/IRQ-affinity.txt for i386 systems.
+that described in Documentation/core-api/irq/irq-affinity.rst for i386 systems.
 
 Because of the usage of SAPIC mode and physical destination mode the
 IRQ target is one particular CPU and cannot be a mask of several
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt
deleted file mode 100644
index bdd208259fb3..000000000000
--- a/Documentation/irqflags-tracing.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-=======================
-IRQ-flags state tracing
-=======================
-
-:Author: started by Ingo Molnar
-
-The "irq-flags tracing" feature "traces" hardirq and softirq state, in
-that it gives interested subsystems an opportunity to be notified of
-every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
-happens in the kernel.
-
-CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
-and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
-code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
-CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
-are locking APIs that are not used in IRQ context. (the one exception
-for rwsems is worked around)
-
-Architecture support for this is certainly not in the "trivial"
-category, because lots of lowlevel assembly code deal with irq-flags
-state changes. But an architecture can be irq-flags-tracing enabled in a
-rather straightforward and risk-free manner.
-
-Architectures that want to support this need to do a couple of
-code-organizational changes first:
-
-- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
-
-and then a couple of functional changes are needed as well to implement
-irq-flags-tracing support:
-
-- in lowlevel entry code add (build-conditional) calls to the
-  trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
-  closely guards whether the 'real' irq-flags matches the 'virtual'
-  irq-flags state, and complains loudly (and turns itself off) if the
-  two do not match. Usually most of the time for arch support for
-  irq-flags-tracing is spent in this state: look at the lockdep
-  complaint, try to figure out the assembly code we did not cover yet,
-  fix and repeat. Once the system has booted up and works without a
-  lockdep complaint in the irq-flags-tracing functions arch support is
-  complete.
-- if the architecture has non-maskable interrupts then those need to be
-  excluded from the irq-tracing [and lock validation] mechanism via
-  lockdep_off()/lockdep_on().
-
-In general there is no risk from having an incomplete irq-flags-tracing
-implementation in an architecture: lockdep will detect that and will
-turn itself off. I.e. the lock validator will still be reliable. There
-should be no crashes due to irq-tracing bugs. (except if the assembly
-changes break other code by modifying conditions or registers that
-shouldn't be)
-
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index f78d7bf27ff5..8f0347b9fb3d 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -81,7 +81,7 @@ of queues to IRQs can be determined from /proc/interrupts.
 By default, an IRQ may be handled on any CPU. Because a non-negligible
 part of packet processing takes place in receive interrupt handling, it is
 advantageous to spread receive interrupts between CPUs. To manually adjust the IRQ
-affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems
+affinity of each interrupt see Documentation/core-api/irq/irq-affinity.rst. Some systems
 will be running irqbalance, a daemon that dynamically optimizes IRQ
 assignments and as a result may override any manual settings.
 
@@ -160,7 +160,7 @@ can be configured for each receive queue using a sysfs file entry::
 
 This file implements a bitmap of CPUs. RPS is disabled when it is zero
 (the default), in which case packets are processed on the interrupting
-CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
Documentation/IRQ-affinity.txt explains how CPUs are assigned to +CPU. Documentation/core-api/irq/irq-affinity.rst explains how CPUs are assigned to the bitmap. diff --git a/Documentation/translations/zh_CN/IRQ.txt b/Documentation/translations/zh_CN/IRQ.txt index 956026d5cf82..9aec8dca4fcf 100644 --- a/Documentation/translations/zh_CN/IRQ.txt +++ b/Documentation/translations/zh_CN/IRQ.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/IRQ.txt +Chinese translated version of Documentation/core-api/irq/index.rst If you have any comment or update to the content, please contact the original document maintainer directly. However, if you have a problem @@ -9,7 +9,7 @@ or if there is a problem with the translation. Maintainer: Eric W. Biederman Chinese maintainer: Fu Wei --------------------------------------------------------------------- -Documentation/IRQ.txt 的中文翻译 +Documentation/core-api/irq/index.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -- cgit v1.2.3