Diffstat (limited to 'Documentation/arch')
-rw-r--r-- | Documentation/arch/arm64/asymmetric-32bit.rst | 8
-rw-r--r-- | Documentation/arch/arm64/booting.rst | 34
-rw-r--r-- | Documentation/arch/arm64/elf_hwcaps.rst | 50
-rw-r--r-- | Documentation/arch/arm64/gcs.rst | 2
-rw-r--r-- | Documentation/arch/arm64/memory.rst | 65
-rw-r--r-- | Documentation/arch/arm64/silicon-errata.rst | 5
-rw-r--r-- | Documentation/arch/powerpc/cxl.rst | 3
-rw-r--r-- | Documentation/arch/riscv/hwprobe.rst | 10
-rw-r--r-- | Documentation/arch/x86/amd-memory-encryption.rst | 118
-rw-r--r-- | Documentation/arch/x86/boot.rst | 369
-rw-r--r-- | Documentation/arch/x86/resctrl.rst | 10
-rw-r--r-- | Documentation/arch/x86/sva.rst | 4
-rw-r--r-- | Documentation/arch/x86/topology.rst | 4
-rw-r--r-- | Documentation/arch/x86/x86_64/boot-options.rst | 312
-rw-r--r-- | Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst | 2
-rw-r--r-- | Documentation/arch/x86/x86_64/index.rst | 1
-rw-r--r-- | Documentation/arch/x86/x86_64/uefi.rst | 37
17 files changed, 455 insertions, 579 deletions
diff --git a/Documentation/arch/arm64/asymmetric-32bit.rst b/Documentation/arch/arm64/asymmetric-32bit.rst index 64a0b505da7d..1ca2b359a907 100644 --- a/Documentation/arch/arm64/asymmetric-32bit.rst +++ b/Documentation/arch/arm64/asymmetric-32bit.rst @@ -153,3 +153,11 @@ asymmetric system, a broken guest at EL1 could still attempt to execute mode will return to host userspace with an ``exit_reason`` of ``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation. + +NOHZ FULL +--------- + +To avoid perturbing an adaptive-ticks CPU (specified using +``nohz_full=``) when a 32-bit task is forcefully migrated, these CPUs +are treated as 64-bit-only when support for asymmetric 32-bit systems +is enabled. diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst index 3278fb4bf219..dee7b6de864f 100644 --- a/Documentation/arch/arm64/booting.rst +++ b/Documentation/arch/arm64/booting.rst @@ -288,6 +288,12 @@ Before jumping into the kernel, the following conditions must be met: - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. + For CPUs with the Fine Grained Traps 2 (FEAT_FGT2) extension present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.FGTEn2 (bit 59) must be initialised to 0b1. + For CPUs with support for HCRX_EL2 (FEAT_HCX) present: - If EL3 is present and the kernel is entered at EL2: @@ -382,6 +388,22 @@ Before jumping into the kernel, the following conditions must be met: - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1. + For CPUs with the Performance Monitors Extension (FEAT_PMUv3p9): + + - If EL3 is present: + + - MDCR_EL3.EnPM2 (bit 7) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HDFGRTR2_EL2.nPMICNTR_EL0 (bit 2) must be initialised to 0b1. + - HDFGRTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1. + - HDFGRTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1. 
+ + - HDFGWTR2_EL2.nPMICNTR_EL0 (bit 2) must be initialised to 0b1. + - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1. + - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1. + For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS): - If the kernel is entered at EL1 and EL2 is present: @@ -449,6 +471,18 @@ Before jumping into the kernel, the following conditions must be met: - HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1. + - For CPUs with debug architecture i.e FEAT_Debugv8pN (all versions): + + - If EL3 is present: + + - MDCR_EL3.TDA (bit 9) must be initialized to 0b0 + + - For CPUs with FEAT_PMUv3: + + - If EL3 is present: + + - MDCR_EL3.TPM (bit 6) must be initialized to 0b0 + The requirements described above for CPU mode, caches, MMUs, architected timers, coherency and system registers apply to all CPUs. All CPUs must enter the kernel in the same exception level. Where the values documented diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst index 1a31723e79fd..69d7afe56853 100644 --- a/Documentation/arch/arm64/elf_hwcaps.rst +++ b/Documentation/arch/arm64/elf_hwcaps.rst @@ -174,6 +174,56 @@ HWCAP_GCS Functionality implied by ID_AA64PFR1_EL1.GCS == 0b1, as described by Documentation/arch/arm64/gcs.rst. +HWCAP_CMPBR + Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0010. + +HWCAP_FPRCVT + Functionality implied by ID_AA64ISAR3_EL1.FPRCVT == 0b0001. + +HWCAP_F8MM8 + Functionality implied by ID_AA64FPFR0_EL1.F8MM8 == 0b0001. + +HWCAP_F8MM4 + Functionality implied by ID_AA64FPFR0_EL1.F8MM4 == 0b0001. + +HWCAP_SVE_F16MM + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and + ID_AA64ZFR0_EL1.F16MM == 0b0001. + +HWCAP_SVE_ELTPERM + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and + ID_AA64ZFR0_EL1.ELTPERM == 0b0001. + +HWCAP_SVE_AES2 + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and + ID_AA64ZFR0_EL1.AES == 0b0011. 
+ +HWCAP_SVE_BFSCALE + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and + ID_AA64ZFR0_EL1.B16B16 == 0b0010. + +HWCAP_SVE2P2 + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001 and + ID_AA64ZFR0_EL1.SVEver == 0b0011. + +HWCAP_SME2P2 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0011. + +HWCAP_SME_SBITPERM + Functionality implied by ID_AA64SMFR0_EL1.SBitPerm == 0b1. + +HWCAP_SME_AES + Functionality implied by ID_AA64SMFR0_EL1.AES == 0b1. + +HWCAP_SME_SFEXPA + Functionality implied by ID_AA64SMFR0_EL1.SFEXPA == 0b1. + +HWCAP_SME_STMOP + Functionality implied by ID_AA64SMFR0_EL1.STMOP == 0b1. + +HWCAP_SME_SMOP4 + Functionality implied by ID_AA64SMFR0_EL1.SMOP4 == 0b1. + HWCAP2_DCPODP Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. diff --git a/Documentation/arch/arm64/gcs.rst b/Documentation/arch/arm64/gcs.rst index 1f65a3193e77..226c0b008456 100644 --- a/Documentation/arch/arm64/gcs.rst +++ b/Documentation/arch/arm64/gcs.rst @@ -37,7 +37,7 @@ intended to be exhaustive. shadow stacks rather than GCS. * Support for GCS is reported to userspace via HWCAP_GCS in the aux vector - AT_HWCAP2 entry. + AT_HWCAP entry. * GCS is enabled per thread. While there is support for disabling GCS at runtime this should be done with great care. diff --git a/Documentation/arch/arm64/memory.rst b/Documentation/arch/arm64/memory.rst index 8a658984b8bb..678fbb418c3a 100644 --- a/Documentation/arch/arm64/memory.rst +++ b/Documentation/arch/arm64/memory.rst @@ -23,71 +23,6 @@ swapper_pg_dir contains only kernel (global) mappings while the user pgd contains only user (non-global) mappings. The swapper_pg_dir address is written to TTBR1 and never written to TTBR0. 
- -AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 0000ffffffffffff 256TB user - ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map - [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] - ffff800000000000 ffff80007fffffff 2GB modules - ffff800080000000 fffffbffefffffff 124TB vmalloc - fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) - fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] - fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space - fffffbffff800000 fffffbffffffffff 8MB [guard region] - fffffc0000000000 fffffdffffffffff 2TB vmemmap - fffffe0000000000 ffffffffffffffff 2TB [guard region] - - -AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 000fffffffffffff 4PB user - fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map - [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] - ffff800000000000 ffff80007fffffff 2GB modules - ffff800080000000 fffffbffefffffff 124TB vmalloc - fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) - fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] - fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space - fffffbffff800000 fffffbffffffffff 8MB [guard region] - fffffc0000000000 ffffffdfffffffff ~4TB vmemmap - ffffffe000000000 ffffffffffffffff 128GB [guard region] - - -Translation table lookup with 4KB pages:: - - +--------+--------+--------+--------+--------+--------+--------+--------+ - |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| - +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] in-page offset - | | | | +-> [20:12] L3 index - | | | +-----------> [29:21] L2 index - | | +---------------------> [38:30] 
L1 index - | +-------------------------------> [47:39] L0 index - +----------------------------------------> [55] TTBR0/1 - - -Translation table lookup with 64KB pages:: - - +--------+--------+--------+--------+--------+--------+--------+--------+ - |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| - +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] in-page offset - | | | +----------> [28:16] L3 index - | | +--------------------------> [41:29] L2 index - | +-------------------------------> [47:42] L1 index (48-bit) - | [51:42] L1 index (52-bit) - +----------------------------------------> [55] TTBR0/1 - - When using KVM without the Virtualization Host Extensions, the hypervisor maps kernel pages in EL2 at a fixed (and potentially random) offset from the linear mapping. See the kern_hyp_va macro and diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index b42fea07c5ce..f968c13b46a7 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -198,7 +198,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-V3 | #3312417 | ARM64_ERRATUM_3194386 | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | MMU-500 | #841119,826419 | N/A | +| ARM | MMU-500 | #841119,826419 | ARM_SMMU_MMU_500_CPRE_ERRATA| +| | | #562869,1047329 | | +----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-600 | #1076982,1209401| N/A | +----------------+-----------------+-----------------+-----------------------------+ @@ -283,6 +284,8 @@ stable kernels. 
+----------------+-----------------+-----------------+-----------------------------+ | Rockchip | RK3588 | #3588001 | ROCKCHIP_ERRATUM_3588001 | +----------------+-----------------+-----------------+-----------------------------+ +| Rockchip | RK3568 | #3568002 | ROCKCHIP_ERRATUM_3568002 | ++----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ | Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/powerpc/cxl.rst b/Documentation/arch/powerpc/cxl.rst index d2d77057610e..778adda740d2 100644 --- a/Documentation/arch/powerpc/cxl.rst +++ b/Documentation/arch/powerpc/cxl.rst @@ -18,6 +18,7 @@ Introduction both access system memory directly and with the same effective addresses. + **This driver is deprecated and will be removed in a future release.** Hardware overview ================= @@ -453,7 +454,7 @@ Sysfs Class A cxl sysfs class is added under /sys/class/cxl to facilitate enumeration and tuning of the accelerators. Its layout is - described in Documentation/ABI/testing/sysfs-class-cxl + described in Documentation/ABI/obsolete/sysfs-class-cxl Udev rules diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst index 955fbcd19ce9..f273ea15a8e8 100644 --- a/Documentation/arch/riscv/hwprobe.rst +++ b/Documentation/arch/riscv/hwprobe.rst @@ -293,3 +293,13 @@ The following keys are defined: * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are not supported at all and will generate a misaligned address fault. + +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the + thead vendor extensions that are compatible with the + :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. 
+ + * T-HEAD + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR`: The xtheadvector vendor + extension is supported in the T-Head ISA extensions spec starting from + commit a18c801634 ("Add T-Head VECTOR vendor extension. "). diff --git a/Documentation/arch/x86/amd-memory-encryption.rst b/Documentation/arch/x86/amd-memory-encryption.rst index 6df3264f23b9..bd840df708ea 100644 --- a/Documentation/arch/x86/amd-memory-encryption.rst +++ b/Documentation/arch/x86/amd-memory-encryption.rst @@ -130,8 +130,126 @@ SNP feature support. More details in AMD64 APM[1] Vol 2: 15.34.10 SEV_STATUS MSR +Reverse Map Table (RMP) +======================= + +The RMP is a structure in system memory that is used to ensure a one-to-one +mapping between system physical addresses and guest physical addresses. Each +page of memory that is potentially assignable to guests has one entry within +the RMP. + +The RMP table can be either contiguous in memory or a collection of segments +in memory. + +Contiguous RMP +-------------- + +Support for this form of the RMP is present when support for SEV-SNP is +present, which can be determined using the CPUID instruction:: + + 0x8000001f[eax]: + Bit[4] indicates support for SEV-SNP + +The location of the RMP is identified to the hardware through two MSRs:: + + 0xc0010132 (RMP_BASE): + System physical address of the first byte of the RMP + + 0xc0010133 (RMP_END): + System physical address of the last byte of the RMP + +Hardware requires that RMP_BASE and (RPM_END + 1) be 8KB aligned, but SEV +firmware increases the alignment requirement to require a 1MB alignment. + +The RMP consists of a 16KB region used for processor bookkeeping followed +by the RMP entries, which are 16 bytes in size. The size of the RMP +determines the range of physical memory that the hypervisor can assign to +SEV-SNP guests. The RMP covers the system physical address from:: + + 0 to ((RMP_END + 1 - RMP_BASE - 16KB) / 16B) x 4KB. 
+ +The current Linux support relies on BIOS to allocate/reserve the memory for +the RMP and to set RMP_BASE and RMP_END appropriately. Linux uses the MSR +values to locate the RMP and determine the size of the RMP. The RMP must +cover all of system memory in order for Linux to enable SEV-SNP. + +Segmented RMP +------------- + +Segmented RMP support is a new way of representing the layout of an RMP. +Initial RMP support required the RMP table to be contiguous in memory. +RMP accesses from a NUMA node on which the RMP doesn't reside +can take longer than accesses from a NUMA node on which the RMP resides. +Segmented RMP support allows the RMP entries to be located on the same +node as the memory the RMP is covering, potentially reducing latency +associated with accessing an RMP entry associated with the memory. Each +RMP segment covers a specific range of system physical addresses. + +Support for this form of the RMP can be determined using the CPUID +instruction:: + + 0x8000001f[eax]: + Bit[23] indicates support for segmented RMP + +If supported, segmented RMP attributes can be found using the CPUID +instruction:: + + 0x80000025[eax]: + Bits[5:0] minimum supported RMP segment size + Bits[11:6] maximum supported RMP segment size + + 0x80000025[ebx]: + Bits[9:0] number of cacheable RMP segment definitions + Bit[10] indicates if the number of cacheable RMP segments + is a hard limit + +To enable a segmented RMP, a new MSR is available:: + + 0xc0010136 (RMP_CFG): + Bit[0] indicates if segmented RMP is enabled + Bits[13:8] contains the size of memory covered by an RMP + segment (expressed as a power of 2) + +The RMP segment size defined in the RMP_CFG MSR applies to all segments +of the RMP. Therefore each RMP segment covers a specific range of system +physical addresses. For example, if the RMP_CFG MSR value is 0x2401, then +the RMP segment coverage value is 0x24 => 36, meaning the size of memory +covered by an RMP segment is 64GB (1 << 36). 
So the first RMP segment +covers physical addresses from 0 to 0xF_FFFF_FFFF, the second RMP segment +covers physical addresses from 0x10_0000_0000 to 0x1F_FFFF_FFFF, etc. + +When a segmented RMP is enabled, RMP_BASE points to the RMP bookkeeping +area as it does today (16K in size). However, instead of RMP entries +beginning immediately after the bookkeeping area, there is a 4K RMP +segment table (RST). Each entry in the RST is 8-bytes in size and represents +an RMP segment:: + + Bits[19:0] mapped size (in GB) + The mapped size can be less than the defined segment size. + A value of zero, indicates that no RMP exists for the range + of system physical addresses associated with this segment. + Bits[51:20] segment physical address + This address is left shift 20-bits (or just masked when + read) to form the physical address of the segment (1MB + alignment). + +The RST can hold 512 segment entries but can be limited in size to the number +of cacheable RMP segments (CPUID 0x80000025_EBX[9:0]) if the number of cacheable +RMP segments is a hard limit (CPUID 0x80000025_EBX[10]). + +The current Linux support relies on BIOS to allocate/reserve the memory for +the segmented RMP (the bookkeeping area, RST, and all segments), build the RST +and to set RMP_BASE, RMP_END, and RMP_CFG appropriately. Linux uses the MSR +values to locate the RMP and determine the size and location of the RMP +segments. The RMP must cover all of system memory in order for Linux to enable +SEV-SNP. + +More details in the AMD64 APM Vol 2, section "15.36.3 Reverse Map Table", +docID: 24593. + Secure VM Service Module (SVSM) =============================== + SNP provides a feature called Virtual Machine Privilege Levels (VMPL) which defines four privilege levels at which guest software can run. The most privileged level is 0 and numerically higher numbers have lesser privileges. 
diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst index ad2d8ddad27f..76f53d3450e7 100644 --- a/Documentation/arch/x86/boot.rst +++ b/Documentation/arch/x86/boot.rst @@ -77,7 +77,7 @@ Protocol 2.14 BURNT BY INCORRECT COMMIT Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max. ============= ============================================================ - .. note:: +.. note:: The protocol version number should be changed only if the setup header is changed. There is no need to update the version number if boot_params or kernel_info are changed. Additionally, it is recommended to use @@ -95,27 +95,27 @@ Memory Layout The traditional memory map for the kernel loader, used for Image or zImage kernels, typically looks like:: - | | - 0A0000 +------------------------+ - | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. - 09A000 +------------------------+ - | Command line | - | Stack/heap | For use by the kernel real-mode code. - 098000 +------------------------+ - | Kernel setup | The kernel real-mode code. - 090200 +------------------------+ - | Kernel boot sector | The kernel legacy boot sector. - 090000 +------------------------+ - | Protected-mode kernel | The bulk of the kernel image. - 010000 +------------------------+ - | Boot loader | <- Boot sector entry point 0000:7C00 - 001000 +------------------------+ - | Reserved for MBR/BIOS | - 000800 +------------------------+ - | Typically used by MBR | - 000600 +------------------------+ - | BIOS use only | - 000000 +------------------------+ + | | + 0A0000 +------------------------+ + | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. + 09A000 +------------------------+ + | Command line | + | Stack/heap | For use by the kernel real-mode code. + 098000 +------------------------+ + | Kernel setup | The kernel real-mode code. + 090200 +------------------------+ + | Kernel boot sector | The kernel legacy boot sector. 
+ 090000 +------------------------+ + | Protected-mode kernel | The bulk of the kernel image. + 010000 +------------------------+ + | Boot loader | <- Boot sector entry point 0000:7C00 + 001000 +------------------------+ + | Reserved for MBR/BIOS | + 000800 +------------------------+ + | Typically used by MBR | + 000600 +------------------------+ + | BIOS use only | + 000000 +------------------------+ When using bzImage, the protected-mode kernel was relocated to 0x100000 ("high memory"), and the kernel real-mode block (boot sector, @@ -142,28 +142,28 @@ above the 0x9A000 point; too many BIOSes will break above that point. For a modern bzImage kernel with boot protocol version >= 2.02, a memory layout like the following is suggested:: - ~ ~ - | Protected-mode kernel | - 100000 +------------------------+ - | I/O memory hole | - 0A0000 +------------------------+ - | Reserved for BIOS | Leave as much as possible unused - ~ ~ - | Command line | (Can also be below the X+10000 mark) - X+10000 +------------------------+ - | Stack/heap | For use by the kernel real-mode code. - X+08000 +------------------------+ - | Kernel setup | The kernel real-mode code. - | Kernel boot sector | The kernel legacy boot sector. - X +------------------------+ - | Boot loader | <- Boot sector entry point 0000:7C00 - 001000 +------------------------+ - | Reserved for MBR/BIOS | - 000800 +------------------------+ - | Typically used by MBR | - 000600 +------------------------+ - | BIOS use only | - 000000 +------------------------+ + ~ ~ + | Protected-mode kernel | + 100000 +------------------------+ + | I/O memory hole | + 0A0000 +------------------------+ + | Reserved for BIOS | Leave as much as possible unused + ~ ~ + | Command line | (Can also be below the X+10000 mark) + X+10000 +------------------------+ + | Stack/heap | For use by the kernel real-mode code. + X+08000 +------------------------+ + | Kernel setup | The kernel real-mode code. 
+ | Kernel boot sector | The kernel legacy boot sector. + X +------------------------+ + | Boot loader | <- Boot sector entry point 0000:7C00 + 001000 +------------------------+ + | Reserved for MBR/BIOS | + 000800 +------------------------+ + | Typically used by MBR | + 000600 +------------------------+ + | BIOS use only | + 000000 +------------------------+ ... where the address X is as low as the design of the boot loader permits. @@ -229,22 +229,22 @@ Offset/Size Proto Name Meaning =========== ======== ===================== ============================================ .. note:: - (1) For backwards compatibility, if the setup_sects field contains 0, the - real value is 4. + (1) For backwards compatibility, if the setup_sects field contains 0, + the real value is 4. - (2) For boot protocol prior to 2.04, the upper two bytes of the syssize - field are unusable, which means the size of a bzImage kernel - cannot be determined. + (2) For boot protocol prior to 2.04, the upper two bytes of the syssize + field are unusable, which means the size of a bzImage kernel + cannot be determined. - (3) Ignored, but safe to set, for boot protocols 2.02-2.09. + (3) Ignored, but safe to set, for boot protocols 2.02-2.09. If the "HdrS" (0x53726448) magic number is not found at offset 0x202, the boot protocol version is "old". Loading an old kernel, the following parameters should be assumed:: - Image type = zImage - initrd not supported - Real-mode kernel must be located at 0x90000. + Image type = zImage + initrd not supported + Real-mode kernel must be located at 0x90000. Otherwise, the "version" field contains the protocol version, e.g. protocol version 2.01 will contain 0x0201 in this field. When @@ -265,7 +265,7 @@ All general purpose boot loaders should write the fields marked nonstandard address should fill in the fields marked (reloc); other boot loaders can ignore those fields. -The byte order of all fields is littleendian (this is x86, after all.) 
+The byte order of all fields is little endian (this is x86, after all.) ============ =========== Field name: setup_sects @@ -365,7 +365,7 @@ Offset/size: 0x206/2 Protocol: 2.00+ ============ ======= - Contains the boot protocol version, in (major << 8)+minor format, + Contains the boot protocol version, in (major << 8) + minor format, e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version 10.17. @@ -397,17 +397,17 @@ Protocol: 2.00+ If set to a nonzero value, contains a pointer to a NUL-terminated human-readable kernel version number string, less 0x200. This can be used to display the kernel version to the user. This value - should be less than (0x200*setup_sects). + should be less than (0x200 * setup_sects). For example, if this value is set to 0x1c00, the kernel version number string can be found at offset 0x1e00 in the kernel file. This is a valid value if and only if the "setup_sects" field contains the value 15 or higher, as:: - 0x1c00 < 15*0x200 (= 0x1e00) but - 0x1c00 >= 14*0x200 (= 0x1c00) + 0x1c00 < 15 * 0x200 (= 0x1e00) but + 0x1c00 >= 14 * 0x200 (= 0x1c00) - 0x1c00 >> 9 = 14, So the minimum value for setup_secs is 15. + 0x1c00 >> 9 = 14, So the minimum value for setup_secs is 15. ============ ================== Field name: type_of_loader @@ -427,9 +427,9 @@ Protocol: 2.00+ For example, for T = 0x15, V = 0x234, write:: - type_of_loader <- 0xE4 - ext_loader_type <- 0x05 - ext_loader_ver <- 0x23 + type_of_loader <- 0xE4 + ext_loader_type <- 0x05 + ext_loader_ver <- 0x23 Assigned boot loader ids (hexadecimal): @@ -686,7 +686,7 @@ Protocol: 2.10+ If a boot loader makes use of this field, it should update the kernel_alignment field with the alignment unit desired; typically:: - kernel_alignment = 1 << min_alignment + kernel_alignment = 1 << min_alignment; There may be a considerable performance cost with an excessively misaligned kernel. 
Therefore, a loader should typically try each @@ -754,7 +754,7 @@ Protocol: 2.07+ 0x00000000 The default x86/PC environment 0x00000001 lguest 0x00000002 Xen - 0x00000003 Moorestown MID + 0x00000003 Intel MID (Moorestown, CloverTrail, Merrifield, Moorefield) 0x00000004 CE4100 TV Platform ========== ============================== @@ -808,13 +808,13 @@ Protocol: 2.09+ parameters passing mechanism. The definition of struct setup_data is as follow:: - struct setup_data { - u64 next; - u32 type; - u32 len; - u8 data[0]; - }; - + struct setup_data { + __u64 next; + __u32 type; + __u32 len; + __u8 data[]; + } + Where, the next is a 64-bit physical pointer to the next node of linked list, the next field of the last node is 0; the type is used to identify the contents of data; the len is the length of data @@ -834,12 +834,12 @@ Protocol: 2.09+ Thus setup_indirect struct and SETUP_INDIRECT type were introduced in protocol 2.15:: - struct setup_indirect { - __u32 type; - __u32 reserved; /* Reserved, must be set to zero. */ - __u64 len; - __u64 addr; - }; + struct setup_indirect { + __u32 type; + __u32 reserved; /* Reserved, must be set to zero. */ + __u64 len; + __u64 addr; + }; The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be SETUP_INDIRECT itself since making the setup_indirect a tree structure @@ -849,17 +849,17 @@ Protocol: 2.09+ Let's give an example how to point to SETUP_E820_EXT data using setup_indirect. 
In this case setup_data and setup_indirect will look like this:: - struct setup_data { - __u64 next = 0 or <addr_of_next_setup_data_struct>; - __u32 type = SETUP_INDIRECT; - __u32 len = sizeof(setup_indirect); - __u8 data[sizeof(setup_indirect)] = struct setup_indirect { - __u32 type = SETUP_INDIRECT | SETUP_E820_EXT; - __u32 reserved = 0; - __u64 len = <len_of_SETUP_E820_EXT_data>; - __u64 addr = <addr_of_SETUP_E820_EXT_data>; - } - } + struct setup_data { + .next = 0, /* or <addr_of_next_setup_data_struct> */ + .type = SETUP_INDIRECT, + .len = sizeof(setup_indirect), + .data[sizeof(setup_indirect)] = (struct setup_indirect) { + .type = SETUP_INDIRECT | SETUP_E820_EXT, + .reserved = 0, + .len = <len_of_SETUP_E820_EXT_data>, + .addr = <addr_of_SETUP_E820_EXT_data>, + }, + } .. note:: SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished @@ -896,19 +896,19 @@ Offset/size: 0x260/4 The kernel runtime start address is determined by the following algorithm:: - if (relocatable_kernel) { - if (load_address < pref_address) - load_address = pref_address; - runtime_start = align_up(load_address, kernel_alignment); - } else { - runtime_start = pref_address; - } + if (relocatable_kernel) { + if (load_address < pref_address) + load_address = pref_address; + runtime_start = align_up(load_address, kernel_alignment); + } else { + runtime_start = pref_address; + } Hence the necessary memory window location and size can be estimated by a boot loader as:: - memory_window_start = runtime_start; - memory_window_size = init_size; + memory_window_start = runtime_start; + memory_window_size = init_size; ============ =============== Field name: handover_offset @@ -938,12 +938,12 @@ The kernel_info =============== The relationships between the headers are analogous to the various data -sections: +sections:: setup_header = .data boot_params/setup_data = .bss -What is missing from the above list? That's right: +What is missing from the above list? 
That's right:: kernel_info = .rodata @@ -975,22 +975,22 @@ after kernel_info_var_len_data label. Each chunk of variable size data has to be prefixed with header/magic and its size, e.g.:: kernel_info: - .ascii "LToP" /* Header, Linux top (structure). */ - .long kernel_info_var_len_data - kernel_info - .long kernel_info_end - kernel_info - .long 0x01234567 /* Some fixed size data for the bootloaders. */ + .ascii "LToP" /* Header, Linux top (structure). */ + .long kernel_info_var_len_data - kernel_info + .long kernel_info_end - kernel_info + .long 0x01234567 /* Some fixed size data for the bootloaders. */ kernel_info_var_len_data: - example_struct: /* Some variable size data for the bootloaders. */ - .ascii "0123" /* Header/Magic. */ - .long example_struct_end - example_struct - .ascii "Struct" - .long 0x89012345 + example_struct: /* Some variable size data for the bootloaders. */ + .ascii "0123" /* Header/Magic. */ + .long example_struct_end - example_struct + .ascii "Struct" + .long 0x89012345 example_struct_end: - example_strings: /* Some variable size data for the bootloaders. */ - .ascii "ABCD" /* Header/Magic. */ - .long example_strings_end - example_strings - .asciz "String_0" - .asciz "String_1" + example_strings: /* Some variable size data for the bootloaders. */ + .ascii "ABCD" /* Header/Magic. */ + .long example_strings_end - example_strings + .asciz "String_0" + .asciz "String_1" example_strings_end: kernel_info_end: @@ -1139,67 +1139,63 @@ mode segment. 
Such a boot loader should enter the following fields in the header:: - unsigned long base_ptr; /* base address for real-mode segment */ - - if ( setup_sects == 0 ) { - setup_sects = 4; - } + unsigned long base_ptr; /* base address for real-mode segment */ - if ( protocol >= 0x0200 ) { - type_of_loader = <type code>; - if ( loading_initrd ) { - ramdisk_image = <initrd_address>; - ramdisk_size = <initrd_size>; - } + if (setup_sects == 0) + setup_sects = 4; - if ( protocol >= 0x0202 && loadflags & 0x01 ) - heap_end = 0xe000; - else - heap_end = 0x9800; + if (protocol >= 0x0200) { + type_of_loader = <type code>; + if (loading_initrd) { + ramdisk_image = <initrd_address>; + ramdisk_size = <initrd_size>; + } - if ( protocol >= 0x0201 ) { - heap_end_ptr = heap_end - 0x200; - loadflags |= 0x80; /* CAN_USE_HEAP */ - } + if (protocol >= 0x0202 && loadflags & 0x01) + heap_end = 0xe000; + else + heap_end = 0x9800; - if ( protocol >= 0x0202 ) { - cmd_line_ptr = base_ptr + heap_end; - strcpy(cmd_line_ptr, cmdline); - } else { - cmd_line_magic = 0xA33F; - cmd_line_offset = heap_end; - setup_move_size = heap_end + strlen(cmdline)+1; - strcpy(base_ptr+cmd_line_offset, cmdline); - } - } else { - /* Very old kernel */ + if (protocol >= 0x0201) { + heap_end_ptr = heap_end - 0x200; + loadflags |= 0x80; /* CAN_USE_HEAP */ + } - heap_end = 0x9800; + if (protocol >= 0x0202) { + cmd_line_ptr = base_ptr + heap_end; + strcpy(cmd_line_ptr, cmdline); + } else { + cmd_line_magic = 0xA33F; + cmd_line_offset = heap_end; + setup_move_size = heap_end + strlen(cmdline) + 1; + strcpy(base_ptr + cmd_line_offset, cmdline); + } + } else { + /* Very old kernel */ - cmd_line_magic = 0xA33F; - cmd_line_offset = heap_end; + heap_end = 0x9800; - /* A very old kernel MUST have its real-mode code - loaded at 0x90000 */ + cmd_line_magic = 0xA33F; + cmd_line_offset = heap_end; - if ( base_ptr != 0x90000 ) { - /* Copy the real-mode kernel */ - memcpy(0x90000, base_ptr, (setup_sects+1)*512); - base_ptr = 0x90000; 
/* Relocated */ - } + /* A very old kernel MUST have its real-mode code loaded at 0x90000 */ + if (base_ptr != 0x90000) { + /* Copy the real-mode kernel */ + memcpy(0x90000, base_ptr, (setup_sects + 1) * 512); + base_ptr = 0x90000; /* Relocated */ + } - strcpy(0x90000+cmd_line_offset, cmdline); + strcpy(0x90000 + cmd_line_offset, cmdline); - /* It is recommended to clear memory up to the 32K mark */ - memset(0x90000 + (setup_sects+1)*512, 0, - (64-(setup_sects+1))*512); - } + /* It is recommended to clear memory up to the 32K mark */ + memset(0x90000 + (setup_sects + 1) * 512, 0, (64 - (setup_sects + 1)) * 512); + } Loading The Rest of The Kernel ============================== -The 32-bit (non-real-mode) kernel starts at offset (setup_sects+1)*512 +The 32-bit (non-real-mode) kernel starts at offset (setup_sects + 1) * 512 in the kernel file (again, if setup_sects == 0 the real value is 4.) It should be loaded at address 0x10000 for Image/zImage kernels and 0x100000 for bzImage kernels. @@ -1207,13 +1203,14 @@ It should be loaded at address 0x10000 for Image/zImage kernels and The kernel is a bzImage kernel if the protocol >= 2.00 and the 0x01 bit (LOAD_HIGH) in the loadflags field is set:: - is_bzImage = (protocol >= 0x0200) && (loadflags & 0x01); - load_address = is_bzImage ? 0x100000 : 0x10000; + is_bzImage = (protocol >= 0x0200) && (loadflags & 0x01); + load_address = is_bzImage ? 0x100000 : 0x10000; -Note that Image/zImage kernels can be up to 512K in size, and thus use -the entire 0x10000-0x90000 range of memory. This means it is pretty -much a requirement for these kernels to load the real-mode part at -0x90000. bzImage kernels allow much more flexibility. +.. note:: + Image/zImage kernels can be up to 512K in size, and thus use the entire + 0x10000-0x90000 range of memory. This means it is pretty much a + requirement for these kernels to load the real-mode part at 0x90000. + bzImage kernels allow much more flexibility. 
Special Command Line Options ============================ @@ -1282,19 +1279,20 @@ es = ss. In our example from above, we would do:: - /* Note: in the case of the "old" kernel protocol, base_ptr must - be == 0x90000 at this point; see the previous sample code */ - - seg = base_ptr >> 4; + /* + * Note: in the case of the "old" kernel protocol, base_ptr must + * be == 0x90000 at this point; see the previous sample code. + */ + seg = base_ptr >> 4; - cli(); /* Enter with interrupts disabled! */ + cli(); /* Enter with interrupts disabled! */ - /* Set up the real-mode kernel stack */ - _SS = seg; - _SP = heap_end; + /* Set up the real-mode kernel stack */ + _SS = seg; + _SP = heap_end; - _DS = _ES = _FS = _GS = seg; - jmp_far(seg+0x20, 0); /* Run the kernel */ + _DS = _ES = _FS = _GS = seg; + jmp_far(seg + 0x20, 0); /* Run the kernel */ If your boot sector accesses a floppy drive, it is recommended to switch off the floppy motor before running the kernel, since the @@ -1349,7 +1347,7 @@ from offset 0x01f1 of kernel image on should be loaded into struct boot_params and examined. The end of setup header can be calculated as follow:: - 0x0202 + byte value at offset 0x0201 + 0x0202 + byte value at offset 0x0201 In addition to read/modify/write the setup header of the struct boot_params as that of 16-bit boot protocol, the boot loader should @@ -1385,7 +1383,7 @@ Then, the setup header at offset 0x01f1 of kernel image on should be loaded into struct boot_params and examined. The end of setup header can be calculated as follows:: - 0x0202 + byte value at offset 0x0201 + 0x0202 + byte value at offset 0x0201 In addition to read/modify/write the setup header of the struct boot_params as that of 16-bit boot protocol, the boot loader should @@ -1427,7 +1425,7 @@ execution context provided by the EFI firmware. 
 The function prototype for the handover entry point looks like this::

-    efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp)
+    void efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp);

 'handle' is the EFI image handle passed to the boot loader by the EFI
 firmware, 'table' is the EFI system table - these are the first two
@@ -1442,12 +1440,13 @@ The boot loader *must* fill out the following fields in bp::

 All other fields should be zero.

-NOTE: The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
-      entry point, combined with the LINUX_EFI_INITRD_MEDIA_GUID based initrd
-      loading protocol (refer to [0] for an example of the bootloader side of
-      this), which removes the need for any knowledge on the part of the EFI
-      bootloader regarding the internal representation of boot_params or any
-      requirements/limitations regarding the placement of the command line
-      and ramdisk in memory, or the placement of the kernel image itself.
+.. note::
+     The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
+     entry point, combined with the LINUX_EFI_INITRD_MEDIA_GUID based initrd
+     loading protocol (refer to [0] for an example of the bootloader side of
+     this), which removes the need for any knowledge on the part of the EFI
+     bootloader regarding the internal representation of boot_params or any
+     requirements/limitations regarding the placement of the command line
+     and ramdisk in memory, or the placement of the kernel image itself.

 [0] https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a824affd741d..6768fc1fad16 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -384,6 +384,16 @@ When monitoring is enabled all MON groups will also contain:
         Available only with debug option. The identifier used by hardware
         for the monitor group.
On x86 this is the RMID.

+When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
+
+"mba_MBps_event":
+        Reading this file shows which memory bandwidth event is used
+        as input to the software feedback loop that keeps memory bandwidth
+        below the value specified in the schemata file. Writing the
+        name of one of the supported memory bandwidth events found in
+        /sys/fs/resctrl/info/L3_MON/mon_features changes the input
+        event.
+
 Resource allocation rules
 -------------------------

diff --git a/Documentation/arch/x86/sva.rst b/Documentation/arch/x86/sva.rst
index 33cb05005982..6a759984d471 100644
--- a/Documentation/arch/x86/sva.rst
+++ b/Documentation/arch/x86/sva.rst
@@ -25,7 +25,7 @@ to cache translations for virtual addresses. The IOMMU driver uses the
 mmu_notifier() support to keep the device TLB cache and the CPU cache in
 sync. When an ATS lookup fails for a virtual address, the device should
 use the PRI in order to request the virtual address to be paged into the
-CPU page tables. The device must use ATS again in order the fetch the
+CPU page tables. The device must use ATS again in order to fetch the
 translation before use.

 Shared Hardware Workqueues
@@ -216,7 +216,7 @@ submitting work and processing completions.

 Single Root I/O Virtualization (SR-IOV) focuses on providing independent
 hardware interfaces for virtualizing hardware. Hence, it's required to be
-almost fully functional interface to software supporting the traditional
+an almost fully functional interface to software supporting the traditional
 BARs, space for interrupts via MSI-X, its own register layout.
 Virtual Functions (VFs) are assisted by the Physical Function (PF) driver.
diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst index 7352ab89a55a..c12837e61bda 100644 --- a/Documentation/arch/x86/topology.rst +++ b/Documentation/arch/x86/topology.rst @@ -135,6 +135,10 @@ Thread-related topology information in the kernel: The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo "core_id." + - topology_logical_core_id(); + + The logical core ID to which a thread belongs. + System topology examples diff --git a/Documentation/arch/x86/x86_64/boot-options.rst b/Documentation/arch/x86/x86_64/boot-options.rst deleted file mode 100644 index d69e3cfbdba5..000000000000 --- a/Documentation/arch/x86/x86_64/boot-options.rst +++ /dev/null @@ -1,312 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=========================== -AMD64 Specific Boot Options -=========================== - -There are many others (usually documented in driver documentation), but -only the AMD64 specific ones are listed here. - -Machine check -============= -Please see Documentation/arch/x86/x86_64/machinecheck.rst for sysfs runtime tunables. - - mce=off - Disable machine check - mce=no_cmci - Disable CMCI(Corrected Machine Check Interrupt) that - Intel processor supports. Usually this disablement is - not recommended, but it might be handy if your hardware - is misbehaving. - Note that you'll get more problems without CMCI than with - due to the shared banks, i.e. you might get duplicated - error logs. - mce=dont_log_ce - Don't make logs for corrected errors. All events reported - as corrected are silently cleared by OS. - This option will be useful if you have no interest in any - of corrected errors. - mce=ignore_ce - Disable features for corrected errors, e.g. polling timer - and CMCI. All events reported as corrected are not cleared - by OS and remained in its error banks. - Usually this disablement is not recommended, however if - there is an agent checking/clearing corrected errors - (e.g. 
BIOS or hardware monitoring applications), conflicting - with OS's error handling, and you cannot deactivate the agent, - then this option will be a help. - mce=no_lmce - Do not opt-in to Local MCE delivery. Use legacy method - to broadcast MCEs. - mce=bootlog - Enable logging of machine checks left over from booting. - Disabled by default on AMD Fam10h and older because some BIOS - leave bogus ones. - If your BIOS doesn't do that it's a good idea to enable though - to make sure you log even machine check events that result - in a reboot. On Intel systems it is enabled by default. - mce=nobootlog - Disable boot machine check logging. - mce=monarchtimeout (number) - monarchtimeout: - Sets the time in us to wait for other CPUs on machine checks. 0 - to disable. - mce=bios_cmci_threshold - Don't overwrite the bios-set CMCI threshold. This boot option - prevents Linux from overwriting the CMCI threshold set by the - bios. Without this option, Linux always sets the CMCI - threshold to 1. Enabling this may make memory predictive failure - analysis less effective if the bios sets thresholds for memory - errors since we will not see details for all errors. - mce=recovery - Force-enable recoverable machine check code paths - - nomce (for compatibility with i386) - same as mce=off - - Everything else is in sysfs now. - -APICs -===== - - apic - Use IO-APIC. Default - - noapic - Don't use the IO-APIC. - - disableapic - Don't use the local APIC - - nolapic - Don't use the local APIC (alias for i386 compatibility) - - pirq=... - See Documentation/arch/x86/i386/IO-APIC.rst - - noapictimer - Don't set up the APIC timer - - no_timer_check - Don't check the IO-APIC timer. This can work around - problems with incorrect timer initialization on some boards. - - apicpmtimer - Do APIC timer calibration using the pmtimer. Implies - apicmaintimer. Useful when your PIT timer is totally broken. - -Timing -====== - - notsc - Deprecated, use tsc=unstable instead. 
- - nohpet - Don't use the HPET timer. - -Idle loop -========= - - idle=poll - Don't do power saving in the idle loop using HLT, but poll for rescheduling - event. This will make the CPUs eat a lot more power, but may be useful - to get slightly better performance in multiprocessor benchmarks. It also - makes some profiling using performance counters more accurate. - Please note that on systems with MONITOR/MWAIT support (like Intel EM64T - CPUs) this option has no performance advantage over the normal idle loop. - It may also interact badly with hyperthreading. - -Rebooting -========= - - reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] | p[ci] [, [w]arm | [c]old] - bios - Use the CPU reboot vector for warm reset - warm - Don't set the cold reboot flag - cold - Set the cold reboot flag - triple - Force a triple fault (init) - kbd - Use the keyboard controller. cold reset (default) - acpi - Use the ACPI RESET_REG in the FADT. If ACPI is not configured or - the ACPI reset does not work, the reboot path attempts the reset - using the keyboard controller. - efi - Use efi reset_system runtime service. If EFI is not configured or - the EFI reset does not work, the reboot path attempts the reset using - the keyboard controller. - pci - Use a write to the PCI config space register 0xcf9 to trigger reboot. - - Using warm reset will be much faster especially on big memory - systems because the BIOS will not go through the memory check. - Disadvantage is that not all hardware will be completely reinitialized - on reboot so there may be boot problems on some systems. - - reboot=force - Don't stop other CPUs on reboot. This can make reboot more reliable - in some cases. - - reboot=default - There are some built-in platform specific "quirks" - you may see: - "reboot: <name> series board detected. Selecting <type> for reboots." - In the case where you think the quirk is in error (e.g. 
you have - newer BIOS, or newer board) using this option will ignore the built-in - quirk table, and use the generic default reboot actions. - -NUMA -==== - - numa=off - Only set up a single NUMA node spanning all memory. - - numa=noacpi - Don't parse the SRAT table for NUMA setup - - numa=nohmat - Don't parse the HMAT table for NUMA setup, or soft-reserved memory - partitioning. - -ACPI -==== - - acpi=off - Don't enable ACPI - acpi=ht - Use ACPI boot table parsing, but don't enable ACPI interpreter - acpi=force - Force ACPI on (currently not needed) - acpi=strict - Disable out of spec ACPI workarounds. - acpi_sci={edge,level,high,low} - Set up ACPI SCI interrupt. - acpi=noirq - Don't route interrupts - acpi=nocmcff - Disable firmware first mode for corrected errors. This - disables parsing the HEST CMC error source to check if - firmware has set the FF flag. This may result in - duplicate corrected error reports. - -PCI -=== - - pci=off - Don't use PCI - pci=conf1 - Use conf1 access. - pci=conf2 - Use conf2 access. - pci=rom - Assign ROMs. - pci=assign-busses - Assign busses - pci=irqmask=MASK - Set PCI interrupt mask to MASK - pci=lastbus=NUMBER - Scan up to NUMBER busses, no matter what the mptable says. - pci=noacpi - Don't use ACPI to set up PCI interrupt routing. - -IOMMU (input/output memory management unit) -=========================================== -Multiple x86-64 PCI-DMA mapping implementations exist, for example: - - 1. <kernel/dma/direct.c>: use no hardware/software IOMMU at all - (e.g. because you have < 3 GB memory). - Kernel boot message: "PCI-DMA: Disabling IOMMU" - - 2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU. - Kernel boot message: "PCI-DMA: using GART IOMMU" - - 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used - e.g. 
if there is no hardware IOMMU in the system and it is need because - you have >3GB memory or told the kernel to us it (iommu=soft)) - Kernel boot message: "PCI-DMA: Using software bounce buffering - for IO (SWIOTLB)" - -:: - - iommu=[<size>][,noagp][,off][,force][,noforce] - [,memaper[=<order>]][,merge][,fullflush][,nomerge] - [,noaperture] - -General iommu options: - - off - Don't initialize and use any kind of IOMMU. - noforce - Don't force hardware IOMMU usage when it is not needed. (default). - force - Force the use of the hardware IOMMU even when it is - not actually needed (e.g. because < 3 GB memory). - soft - Use software bounce buffering (SWIOTLB) (default for - Intel machines). This can be used to prevent the usage - of an available hardware IOMMU. - -iommu options only relevant to the AMD GART hardware IOMMU: - - <size> - Set the size of the remapping area in bytes. - allowed - Overwrite iommu off workarounds for specific chipsets. - fullflush - Flush IOMMU on each allocation (default). - nofullflush - Don't use IOMMU fullflush. - memaper[=<order>] - Allocate an own aperture over RAM with size 32MB<<order. - (default: order=1, i.e. 64MB) - merge - Do scatter-gather (SG) merging. Implies "force" (experimental). - nomerge - Don't do scatter-gather (SG) merging. - noaperture - Ask the IOMMU not to touch the aperture for AGP. - noagp - Don't initialize the AGP driver and use full aperture. - panic - Always panic when IOMMU overflows. - -iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU -implementation: - - swiotlb=<slots>[,force,noforce] - <slots> - Prereserve that many 2K slots for the software IO bounce buffering. - force - Force all IO through the software TLB. - noforce - Do not initialize the software TLB. - - -Miscellaneous -============= - - nogbpages - Do not use GB pages for kernel direct mappings. - gbpages - Use GB pages for kernel direct mappings. 
- - -AMD SEV (Secure Encrypted Virtualization) -========================================= -Options relating to AMD SEV, specified via the following format: - -:: - - sev=option1[,option2] - -The available options are: - - debug - Enable debug messages. - - nosnp - Do not enable SEV-SNP (applies to host/hypervisor only). Setting - 'nosnp' avoids the RMP check overhead in memory accesses when - users do not want to run SEV-SNP guests. diff --git a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst index ba74617d4999..970ee94eb551 100644 --- a/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst +++ b/Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst @@ -18,7 +18,7 @@ For more information on the features of cpusets, see Documentation/admin-guide/cgroup-v1/cpusets.rst. There are a number of different configurations you can use for your needs. For more information on the numa=fake command line option and its various ways of -configuring fake nodes, see Documentation/arch/x86/x86_64/boot-options.rst. +configuring fake nodes, see Documentation/admin-guide/kernel-parameters.txt For the purposes of this introduction, we'll assume a very primitive NUMA emulation setup of "numa=fake=4*512,". This will split our system memory into diff --git a/Documentation/arch/x86/x86_64/index.rst b/Documentation/arch/x86/x86_64/index.rst index ad15e9bd623f..a0261957a08a 100644 --- a/Documentation/arch/x86/x86_64/index.rst +++ b/Documentation/arch/x86/x86_64/index.rst @@ -7,7 +7,6 @@ x86_64 Support .. toctree:: :maxdepth: 2 - boot-options uefi mm 5level-paging diff --git a/Documentation/arch/x86/x86_64/uefi.rst b/Documentation/arch/x86/x86_64/uefi.rst index fbc30c9a071d..e84592dbd6c1 100644 --- a/Documentation/arch/x86/x86_64/uefi.rst +++ b/Documentation/arch/x86/x86_64/uefi.rst @@ -12,14 +12,20 @@ with EFI firmware and specifications are listed below. 1. UEFI specification: http://www.uefi.org -2. 
Booting Linux kernel on UEFI x86_64 platform requires bootloader
-   support. Elilo with x86_64 support can be used.
+2. Booting Linux kernel on UEFI x86_64 platform can either be
+   done using the <Documentation/admin-guide/efi-stub.rst> or using a
+   separate bootloader.

 3. x86_64 platform with EFI/UEFI firmware.

 Mechanics
 ---------

+Refer to <Documentation/admin-guide/efi-stub.rst> to learn how to use the EFI stub.
+
+Below are general EFI setup guidelines on the x86_64 platform,
+regardless of whether you use the EFI stub or a separate bootloader.
+
 - Build the kernel with the following configuration::

     CONFIG_FB_EFI=y
@@ -31,16 +37,27 @@ Mechanics
     CONFIG_EFI=y
     CONFIG_EFIVAR_FS=y or m # optional

-- Create a VFAT partition on the disk
-- Copy the following to the VFAT partition:
+- Create a VFAT partition on the disk with the EFI System flag
+  You can do this with fdisk with the following commands:
+
+    1. g - initialize a GPT partition table
+    2. n - create a new partition
+    3. t - change the partition type to "EFI System" (number 1)
+    4. w - write and save the changes
+
+  Afterwards, initialize the VFAT filesystem by running mkfs::
+
+    mkfs.fat /dev/<your-partition>
+
+- Copy the boot files to the VFAT partition:
+  If you use the EFI stub method, the kernel acts also as an EFI executable.
+
+  You can just copy the bzImage to the EFI/boot/bootx64.efi path on the partition
+  so that it will automatically get booted, see the <Documentation/admin-guide/efi-stub.rst> page
+  for additional instructions regarding passage of kernel parameters and initramfs.

-  elilo bootloader with x86_64 support, elilo configuration file,
-  kernel image built in first step and corresponding
-  initrd. Instructions on building elilo and its dependencies
-  can be found in the elilo sourceforge project.
+  If you use a custom bootloader, refer to the relevant documentation for help on this part.

-- Boot to EFI shell and invoke elilo choosing the kernel image built
-  in first step.
 - If some or all EFI runtime services don't work, you can try following kernel command line parameters to turn off some or all EFI runtime services.