Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/stable/sysfs-kernel-notes | 5
-rw-r--r-- Documentation/PCI/index.rst | 1
-rw-r--r-- Documentation/PCI/sysfs-pci.rst (renamed from Documentation/filesystems/sysfs-pci.rst) | 0
-rw-r--r-- Documentation/admin-guide/README.rst | 6
-rw-r--r-- Documentation/admin-guide/bcache.rst | 31
-rw-r--r-- Documentation/admin-guide/blockdev/ramdisk.rst | 66
-rw-r--r-- Documentation/admin-guide/cgroup-v1/cpusets.rst | 2
-rw-r--r-- Documentation/admin-guide/kdump/kdump.rst | 7
-rw-r--r-- Documentation/admin-guide/kernel-parameters.txt | 40
-rw-r--r-- Documentation/admin-guide/perf/arm-cmn.rst | 65
-rw-r--r-- Documentation/admin-guide/perf/index.rst | 1
-rw-r--r-- Documentation/admin-guide/pm/cpuidle.rst | 2
-rw-r--r-- Documentation/admin-guide/svga.rst | 7
-rw-r--r-- Documentation/admin-guide/sysctl/abi.rst | 73
-rw-r--r-- Documentation/admin-guide/tainted-kernels.rst | 2
-rw-r--r-- Documentation/arm/sunxi.rst | 2
-rw-r--r-- Documentation/arm/uefi.rst | 2
-rw-r--r-- Documentation/arm64/amu.rst | 2
-rw-r--r-- Documentation/arm64/cpu-feature-registers.rst | 2
-rw-r--r-- Documentation/arm64/elf_hwcaps.rst | 4
-rw-r--r-- Documentation/arm64/index.rst | 3
-rw-r--r-- Documentation/arm64/memory-tagging-extension.rst | 305
-rw-r--r-- Documentation/conf.py | 15
-rw-r--r-- Documentation/core-api/cpu_hotplug.rst | 2
-rw-r--r-- Documentation/crypto/userspace-if.rst | 20
-rw-r--r-- Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.yaml | 4
-rw-r--r-- Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 8
-rw-r--r-- Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml | 67
-rw-r--r-- Documentation/devicetree/bindings/gpio/sgpio-aspeed.txt | 5
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml | 65
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml | 64
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt | 14
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml | 158
-rw-r--r-- Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml | 37
-rw-r--r-- Documentation/devicetree/bindings/mmc/microchip,dw-sparx5-sdhci.yaml | 65
-rw-r--r-- Documentation/devicetree/bindings/mmc/mmc-controller.yaml | 4
-rw-r--r-- Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/mmc/owl-mmc.yaml | 6
-rw-r--r-- Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml | 1
-rw-r--r-- Documentation/devicetree/bindings/mmc/sdhci-am654.txt | 61
-rw-r--r-- Documentation/devicetree/bindings/mmc/sdhci-am654.yaml | 218
-rw-r--r-- Documentation/devicetree/bindings/net/renesas,ravb.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/perf/arm,cmn.yaml | 57
-rw-r--r-- Documentation/devicetree/bindings/rng/ingenic,trng.yaml | 43
-rw-r--r-- Documentation/devicetree/bindings/rng/xiphera,xip8001b-trng.yaml | 33
-rw-r--r-- Documentation/devicetree/bindings/timer/renesas,cmt.yaml | 4
-rw-r--r-- Documentation/devicetree/bindings/trivial-devices.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/vendor-prefixes.yaml | 2
-rw-r--r-- Documentation/doc-guide/kernel-doc.rst | 33
-rw-r--r-- Documentation/doc-guide/sphinx.rst | 17
-rw-r--r-- Documentation/driver-api/dma-buf.rst | 2
-rw-r--r-- Documentation/driver-api/gpio/driver.rst | 12
-rw-r--r-- Documentation/driver-api/nvdimm/index.rst | 1
-rw-r--r-- Documentation/driver-api/soundwire/stream.rst | 8
-rw-r--r-- Documentation/fb/fbcon.rst | 21
-rw-r--r-- Documentation/fb/matroxfb.rst | 2
-rw-r--r-- Documentation/fb/sstfb.rst | 3
-rw-r--r-- Documentation/fb/vesafb.rst | 2
-rw-r--r-- Documentation/filesystems/index.rst | 2
-rw-r--r-- Documentation/filesystems/mount_api.rst | 7
-rw-r--r-- Documentation/filesystems/seq_file.rst | 20
-rw-r--r-- Documentation/filesystems/sysfs.rst | 3
-rw-r--r-- Documentation/filesystems/ubifs-authentication.rst | 6
-rw-r--r-- Documentation/firmware-guide/acpi/index.rst | 1
-rw-r--r-- Documentation/hwmon/index.rst | 1
-rw-r--r-- Documentation/ia64/index.rst | 1
-rw-r--r-- Documentation/ia64/xen.rst | 206
-rw-r--r-- Documentation/iio/iio_configfs.rst | 2
-rw-r--r-- Documentation/kbuild/llvm.rst | 4
-rw-r--r-- Documentation/locking/lockdep-design.rst | 258
-rw-r--r-- Documentation/locking/seqlock.rst | 18
-rw-r--r-- Documentation/maintainer/index.rst | 1
-rw-r--r-- Documentation/maintainer/modifying-patches.rst | 50
-rw-r--r-- Documentation/memory-barriers.txt | 8
-rw-r--r-- Documentation/networking/index.rst | 1
-rw-r--r-- Documentation/networking/sysfs-tagging.rst (renamed from Documentation/filesystems/sysfs-tagging.rst) | 0
-rw-r--r-- Documentation/process/2.Process.rst | 2
-rw-r--r-- Documentation/process/changes.rst | 15
-rw-r--r-- Documentation/process/deprecated.rst | 24
-rw-r--r-- Documentation/process/email-clients.rst | 5
-rw-r--r-- Documentation/process/programming-language.rst | 9
-rw-r--r-- Documentation/process/submit-checklist.rst | 4
-rw-r--r-- Documentation/process/submitting-drivers.rst | 9
-rw-r--r-- Documentation/process/submitting-patches.rst | 280
-rw-r--r-- Documentation/scheduler/sched-capacity.rst | 2
-rw-r--r-- Documentation/scheduler/sched-energy.rst | 2
-rw-r--r-- Documentation/security/credentials.rst | 1
-rw-r--r-- Documentation/security/keys/trusted-encrypted.rst | 5
-rw-r--r-- Documentation/sphinx/automarkup.py | 137
-rw-r--r-- Documentation/trace/kprobetrace.rst | 2
-rw-r--r-- Documentation/trace/ring-buffer-design.rst | 26
-rw-r--r-- Documentation/translations/ko_KR/howto.rst | 9
-rw-r--r-- Documentation/translations/ko_KR/memory-barriers.txt | 32
-rw-r--r-- Documentation/translations/zh_CN/arm64/amu.rst | 100
-rw-r--r-- Documentation/translations/zh_CN/arm64/index.rst | 16
-rw-r--r-- Documentation/translations/zh_CN/filesystems/sysfs.txt | 3
-rw-r--r-- Documentation/translations/zh_CN/index.rst | 1
-rw-r--r-- Documentation/virt/index.rst | 2
-rw-r--r-- Documentation/virt/kvm/amd-memory-encryption.rst | 6
-rw-r--r-- Documentation/virt/kvm/api.rst | 4
-rw-r--r-- Documentation/virt/kvm/arm/hyp-abi.rst | 6
-rw-r--r-- Documentation/virt/kvm/cpuid.rst | 2
-rw-r--r-- Documentation/virt/uml/user_mode_linux.rst | 4403
-rw-r--r-- Documentation/virt/uml/user_mode_linux_howto_v2.rst | 1208
-rw-r--r-- Documentation/vm/hmm.rst | 139
-rw-r--r-- Documentation/vm/index.rst | 1
-rw-r--r-- Documentation/vm/page_migration.rst | 164
-rw-r--r-- Documentation/watch_queue.rst | 14
-rw-r--r-- Documentation/x86/boot.rst | 6
-rw-r--r-- Documentation/x86/cpuinfo.rst | 155
-rw-r--r-- Documentation/x86/index.rst | 2
-rw-r--r-- Documentation/x86/resctrl_ui.rst | 18
-rw-r--r-- Documentation/x86/sva.rst | 257
117 files changed, 4002 insertions, 5353 deletions
diff --git a/Documentation/ABI/stable/sysfs-kernel-notes b/Documentation/ABI/stable/sysfs-kernel-notes
new file mode 100644
index 000000000000..2c76ee9e67f7
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-kernel-notes
@@ -0,0 +1,5 @@
+What: /sys/kernel/notes
+Date: July 2009
+Contact: <linux-kernel@vger.kernel.org>
+Description: The /sys/kernel/notes file contains the binary representation
+ of the running vmlinux's .notes section.
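
Reviewer note (not part of the patch): the ABI entry above says the file holds the binary `.notes` section. A minimal Python sketch of how such a blob could be decoded follows. It assumes the standard ELF note layout (three 32-bit words `namesz`, `descsz`, `type`, followed by the name and descriptor, each padded to 4 bytes) and native byte order; the function names `parse_elf_notes` and `read_kernel_notes` are illustrative and do not come from the kernel tree.

```python
import struct


def parse_elf_notes(data, byteorder="="):
    """Yield (name, type, desc) tuples from a raw ELF notes blob.

    Assumes each note is: namesz, descsz, type (32-bit each), then the
    NUL-terminated name and the descriptor, both padded to 4 bytes.
    """
    header = struct.Struct(byteorder + "III")
    offset = 0
    while offset + header.size <= len(data):
        namesz, descsz, note_type = header.unpack_from(data, offset)
        offset += header.size
        name = data[offset:offset + namesz].rstrip(b"\0").decode("ascii", "replace")
        offset += (namesz + 3) & ~3  # skip name plus padding
        desc = data[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # skip descriptor plus padding
        yield name, note_type, desc


def read_kernel_notes(path="/sys/kernel/notes"):
    """Read and decode the notes of the running kernel (needs a Linux host)."""
    with open(path, "rb") as f:
        return list(parse_elf_notes(f.read()))
```

On a Linux host, `read_kernel_notes()` would typically include a note named `GNU` carrying the kernel build ID, though the exact set of notes depends on the kernel build.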
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 8f66feaafd4f..c17c87af1968 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -12,6 +12,7 @@ Linux PCI Bus Subsystem
pciebus-howto
pci-iov-howto
msi-howto
+ sysfs-pci
acpi-info
pci-error-recovery
pcieaer-howto
diff --git a/Documentation/filesystems/sysfs-pci.rst b/Documentation/PCI/sysfs-pci.rst
index 742fbd21dc1f..742fbd21dc1f 100644
--- a/Documentation/filesystems/sysfs-pci.rst
+++ b/Documentation/PCI/sysfs-pci.rst
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index 5aad534233cd..95a28f47ac30 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -322,9 +322,9 @@ Compiling the kernel
reboot, and enjoy!
If you ever need to change the default root device, video mode,
- ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
- alternatively the LILO boot options when appropriate). No need to
- recompile the kernel to change these parameters.
+ etc. in the kernel image, use your bootloader's boot options
+ where appropriate. No need to recompile the kernel to change
+ these parameters.
- Reboot with the new kernel and enjoy.
diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst
index 1eccf952876d..8d3a2d045c0a 100644
--- a/Documentation/admin-guide/bcache.rst
+++ b/Documentation/admin-guide/bcache.rst
@@ -5,11 +5,14 @@ A block layer cache (bcache)
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
nice if you could use them as cache... Hence bcache.
-Wiki and git repositories are at:
+The bcache wiki can be found at:
+ https://bcache.evilpiepirate.org
- - https://bcache.evilpiepirate.org
- - http://evilpiepirate.org/git/linux-bcache.git
- - https://evilpiepirate.org/git/bcache-tools.git
+This is the git repository of bcache-tools:
+ https://git.kernel.org/pub/scm/linux/kernel/git/colyli/bcache-tools.git/
+
+The latest bcache kernel code can be found in the mainline Linux kernel:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
It's designed around the performance characteristics of SSDs - it only allocates
in erase block sized buckets, and it uses a hybrid btree/log to track cached
@@ -41,17 +44,21 @@ in the cache it first disables writeback caching and waits for all dirty data
to be flushed.
Getting started:
-You'll need make-bcache from the bcache-tools repository. Both the cache device
+You'll need the bcache utility from the bcache-tools repository. Both the cache device
and backing device must be formatted before use::
- make-bcache -B /dev/sdb
- make-bcache -C /dev/sdc
+ bcache make -B /dev/sdb
+ bcache make -C /dev/sdc
-make-bcache has the ability to format multiple devices at the same time - if
+`bcache make` has the ability to format multiple devices at the same time - if
you format your backing devices and cache device at the same time, you won't
have to manually attach::
- make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
+ bcache make -B /dev/sda /dev/sdb -C /dev/sdc
+
+If your bcache-tools is not updated to the latest version and does not have the
+unified `bcache` utility, you may use the legacy `make-bcache` utility to format
+a bcache device with the same -B and -C parameters.
bcache-tools now ships udev rules, and bcache devices are known to the kernel
immediately. Without udev, you can manually register devices like this::
@@ -188,7 +195,7 @@ D) Recovering data without bcache:
If bcache is not available in the kernel, a filesystem on the backing
device is still available at an 8KiB offset. So either via a loopdev
of the backing device created with --offset 8K, or any value defined by
---data-offset when you originally formatted bcache with `make-bcache`.
+--data-offset when you originally formatted bcache with `bcache make`.
For example::
@@ -210,7 +217,7 @@ E) Wiping a cache device
After you boot back with bcache enabled, you recreate the cache and attach it::
- host:~# make-bcache -C /dev/sdh2
+ host:~# bcache make -C /dev/sdh2
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
version: 0
@@ -318,7 +325,7 @@ want for getting the best possible numbers when benchmarking.
The default metadata size in bcache is 8k. If your backing device is
RAID based, then be sure to align this by a multiple of your stride
- width using `make-bcache --data-offset`. If you intend to expand your
+ width using `bcache make --data-offset`. If you intend to expand your
disk array in the future, then multiply a series of primes by your
raid stripe size to get the disk multiples that you would like.
diff --git a/Documentation/admin-guide/blockdev/ramdisk.rst b/Documentation/admin-guide/blockdev/ramdisk.rst
index b7c2268f8dec..9ce6101e8dd9 100644
--- a/Documentation/admin-guide/blockdev/ramdisk.rst
+++ b/Documentation/admin-guide/blockdev/ramdisk.rst
@@ -6,7 +6,7 @@ Using the RAM disk block device with Linux
1) Overview
2) Kernel Command Line Parameters
- 3) Using "rdev -r"
+ 3) Using "rdev"
4) An Example of Creating a Compressed RAM Disk
@@ -59,51 +59,27 @@ default is 4096 (4 MB).
rd_size
See ramdisk_size.
-3) Using "rdev -r"
-------------------
+3) Using "rdev"
+---------------
-The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
-as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
-to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
-14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
-prompt/wait sequence is to be given before trying to read the RAM disk. Since
-the RAM disk dynamically grows as data is being written into it, a size field
-is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below::
+"rdev" is an obsolete, deprecated, antiquated utility that could be used
+to set the boot device in a Linux kernel image.
- ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
- ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
- ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
+Instead of using rdev, just place the boot device information on the
+kernel command line and pass it to the kernel from the bootloader.
-Consider a typical two floppy disk setup, where you will have the
-kernel on disk one, and have already put a RAM disk image onto disk #2.
+You can also pass arguments to the kernel by setting FDARGS in
+arch/x86/boot/Makefile and specify an initrd image by setting FDINITRD in
+arch/x86/boot/Makefile.
-Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
-starts at an offset of 0 kB from the beginning of the floppy.
-The command line equivalent is: "ramdisk_start=0"
+Some of the kernel command line boot options that may apply here are::
-You want bit 14 as one, indicating that a RAM disk is to be loaded.
-The command line equivalent is: "load_ramdisk=1"
-
-You want bit 15 as one, indicating that you want a prompt/keypress
-sequence so that you have a chance to switch floppy disks.
-The command line equivalent is: "prompt_ramdisk=1"
-
-Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do::
-
- /usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
- /usr/src/linux# rdev /dev/fd0 /dev/fd0
- /usr/src/linux# rdev -r /dev/fd0 49152
+ ramdisk_start=N
+ ramdisk_size=M
If you make a boot disk that has LILO, then for the above, you would use::
- append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-
-Since the default start = 0 and the default prompt = 1, you could use::
-
- append = "load_ramdisk=1"
-
+ append = "ramdisk_start=N ramdisk_size=M"
4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
@@ -151,12 +127,9 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
-g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
- For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
- have 2^15 + 2^14 + 400 = 49552::
-
- rdev /dev/fd0 /dev/fd0
- rdev -r /dev/fd0 49552
+g) Make sure that you have already specified the boot information in
+ FDARGS and FDINITRD or that you use a bootloader to pass kernel
+ command line boot options to the kernel.
That is it. You now have your boot/root compressed RAM disk floppy. Some
users may wish to combine steps (d) and (f) by using a pipe.
@@ -167,11 +140,14 @@ users may wish to combine steps (d) and (f) by using a pipe.
Changelog:
----------
+SEPT-2020 :
+
+ Removed usage of "rdev"
+
10-22-04 :
Updated to reflect changes in command line options, remove
obsolete references, general cleanup.
James Nelson (james4765@gmail.com)
-
12-95 :
Original Document
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index 7ade3abd342a..5d844ed4df69 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -1,3 +1,5 @@
+.. _cpusets:
+
=======
CPUSETS
=======
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..75a9dd98e76e 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers kernel option on the
dump kernel.
You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
+format. Crash is available at the following URL:
- http://people.redhat.com/~anderson/
+ https://github.com/crash-utility/crash
+
+Crash documentation can be found at:
+ https://crash-utility.github.io/
Trigger Kdump on WARN()
=======================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..0fa47ddf4c46 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,7 +577,7 @@
loops can be debugged more effectively on production
systems.
- clearcpuid=BITNUM [X86]
+ clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
numbers. Note the Linux specific bits are not necessarily
@@ -591,7 +591,7 @@
some critical bits.
cma=nn[MG]@[start[MG][-end[MG]]]
- [ARM,X86,KNL]
+ [KNL,CMA]
Sets the size of kernel global memory area for
contiguous memory allocations and optionally the
placement constraint by the physical address range of
@@ -940,7 +940,7 @@
Arch Perfmon v4 (Skylake and newer).
disable_ddw [PPC/PSERIES]
- Disable Dynamic DMA Window support. Use this if
+ Disable Dynamic DMA Window support. Use this
to workaround buggy firmware.
disable_ipv6= [IPV6]
@@ -1019,7 +1019,7 @@
what data is available or for reverse-engineering.
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
- module.dyndbg[="val"]
+ <module>.dyndbg[="val"]
Enable debug messages at boot time. See
Documentation/admin-guide/dynamic-debug-howto.rst
for details.
@@ -1027,7 +1027,7 @@
nopku [X86] Disable Memory Protection Keys CPU feature found
in some Intel CPUs.
- module.async_probe [KNL]
+ <module>.async_probe [KNL]
Enable asynchronous probe on this module.
early_ioremap_debug [KNL]
@@ -1956,7 +1956,7 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
- io7= [HW] IO7 for Marvel based alpha systems
+ io7= [HW] IO7 for Marvel-based Alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
@@ -2177,7 +2177,7 @@
kgdbwait [KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
- kmac= [MIPS] korina ethernet MAC address.
+ kmac= [MIPS] Korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
@@ -2258,6 +2258,14 @@
[KVM,ARM] Allow use of GICv4 for direct injection of
LPIs.
+ kvm_cma_resv_ratio=n [PPC]
+ Reserves given percentage from system memory area for
+ contiguous memory allocation for KVM hash pagetable
+ allocation.
+ By default it reserves 5% of total system memory.
+ Format: <integer>
+ Default: 5
+
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
@@ -2367,9 +2375,10 @@
lapic [X86-32,APIC] Enable the local APIC even if BIOS
disabled it.
- lapic= [X86,APIC] "notscdeadline" Do not use TSC deadline
+ lapic= [X86,APIC] Do not use TSC deadline
value for LAPIC timer one-shot implementation. Default
back to the programmable timer unit in the LAPIC.
+ Format: notscdeadline
lapic_timer_c2_ok [X86,APIC] trust the local apic timer
in C2 power state.
@@ -2441,8 +2450,7 @@
memblock=debug [KNL] Enable memblock debug messages.
- load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/admin-guide/blockdev/ramdisk.rst.
+ load_ramdisk= [RAM] [Deprecated]
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -2579,8 +2587,8 @@
(machvec) in a generic kernel.
Example: machvec=hpzx1
- machtype= [Loongson] Share the same kernel image file between different
- yeeloong laptop.
+ machtype= [Loongson] Share the same kernel image file between
+ different yeeloong laptops.
Example: machtype=lemote-yeeloong-2f-7inch
max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
@@ -3185,7 +3193,7 @@
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
- nohugeiomap [KNL,X86,PPC] Disable kernel huge I/O mappings.
+ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
@@ -3921,9 +3929,7 @@
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
- prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
- before loading.
- See Documentation/admin-guide/blockdev/ramdisk.rst.
+ prompt_ramdisk= [RAM] [Deprecated]
prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
@@ -3981,6 +3987,8 @@
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/admin-guide/blockdev/ramdisk.rst.
+ ramdisk_start= [RAM] RAM disk image start address
+
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
CPU's random number generator (if available) to
diff --git a/Documentation/admin-guide/perf/arm-cmn.rst b/Documentation/admin-guide/perf/arm-cmn.rst
new file mode 100644
index 000000000000..0e4809346014
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-cmn.rst
@@ -0,0 +1,65 @@
+=============================
+Arm Coherent Mesh Network PMU
+=============================
+
+CMN-600 is a configurable mesh interconnect consisting of a rectangular
+grid of crosspoints (XPs), with each crosspoint supporting up to two
+device ports to which various AMBA CHI agents are attached.
+
+CMN implements a distributed PMU design as part of its debug and trace
+functionality. This consists of a local monitor (DTM) at every XP, which
+counts up to 4 event signals from the connected device nodes and/or the
+XP itself. Overflow from these local counters is accumulated in up to 8
+global counters implemented by the main controller (DTC), which provides
+overall PMU control and interrupts for global counter overflow.
+
+PMU events
+----------
+
+The PMU driver registers a single PMU device for the whole interconnect,
+see /sys/bus/event_source/devices/arm_cmn. Multi-chip systems may link
+more than one CMN together via external CCIX links - in this situation,
+each mesh counts its own events entirely independently, and additional
+PMU devices will be named arm_cmn_{1..n}.
+
+Most events are specified in a format based directly on the TRM
+definitions - "type" selects the respective node type, and "eventid" the
+event number. Some events require an additional occupancy ID, which is
+specified by "occupid".
+
+* Since RN-D nodes do not have any distinct events from RN-I nodes, they
+ are treated as the same type (0xa), and the common event templates are
+ named "rnid_*".
+
+* The cycle counter is treated as a synthetic event belonging to the DTC
+ node ("type" == 0x3, "eventid" is ignored).
+
+* XP events also encode the port and channel in the "eventid" field, to
+ match the underlying pmu_event0_id encoding for the pmu_event_sel
+ register. The event templates are named with prefixes to cover all
+ permutations.
+
+By default each event provides an aggregate count over all nodes of the
+given type. To target a specific node, "bynodeid" must be set to 1 and
+"nodeid" to the appropriate value derived from the CMN configuration
+(as defined in the "Node ID Mapping" section of the TRM).
+
+Watchpoints
+-----------
+
+The PMU can also count watchpoint events to monitor specific flit
+traffic. Watchpoints are treated as a synthetic event type, and like PMU
+events can be global or targeted with a particular XP's "nodeid" value.
+Since the watchpoint direction is otherwise implicit in the underlying
+register selection, separate events are provided for flit uploads and
+downloads.
+
+The flit match value and mask are passed in config1 and config2 ("val"
+and "mask" respectively). "wp_dev_sel", "wp_chn_sel", "wp_grp" and
+"wp_exclusive" are specified per the TRM definitions for dtm_wp_config0.
+Where a watchpoint needs to match fields from both match groups on the
+REQ or SNP channel, it can be specified as two events - one for each
+group - with the same nonzero "combine" value. The count for such a
+pair of combined events will be attributed to the primary match.
+Watchpoint events with a "combine" value of 0 are considered independent
+and will count individually.
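
Reviewer note (not part of the patch): the new document names the event fields "type", "eventid", "occupid", "bynodeid" and "nodeid". A small Python sketch of how those fields might be assembled into the generic `pmu/field=value,.../` event syntax accepted by `perf` is shown here. The helper name `cmn_event` is hypothetical, and the `eventid` and `nodeid` values in the usage lines are placeholders, not values taken from the TRM; only `type` 0x3 (cycle counter) and 0xa (RN-I/RN-D) come from the text above.

```python
def cmn_event(pmu="arm_cmn", **fields):
    """Build a perf event string such as 'arm_cmn/type=0x3/'.

    Integer field values are rendered in hex; anything else is passed
    through unchanged. Field names are not validated against the driver.
    """
    parts = [
        f"{key}={value:#x}" if isinstance(value, int) else f"{key}={value}"
        for key, value in fields.items()
    ]
    return f"{pmu}/{','.join(parts)}/"


# Cycle counter: a synthetic event on the DTC node ("type" == 0x3).
cycles = cmn_event(type=0x3)

# An RN-I/RN-D event (type 0xa) targeted at one node; the eventid and
# nodeid values here are placeholders chosen only for illustration.
targeted = cmn_event(type=0xA, eventid=0x1, bynodeid=1, nodeid=0x8)
```

A string built this way could then be passed to something like `perf stat -a -e <event>`; which events are actually available is best checked under /sys/bus/event_source/devices/arm_cmn/events on the target system.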
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 47c99f40cc16..5a8f2529a033 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -12,6 +12,7 @@ Performance monitor support
qcom_l2_pmu
qcom_l3_pmu
arm-ccn
+ arm-cmn
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index a96a423e3779..6ebe163f9dfe 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -690,7 +690,7 @@ which of the two parameters is added to the kernel command line. In the
instruction of the CPUs (which, as a rule, suspends the execution of the program
and causes the hardware to attempt to enter the shallowest available idle state)
for this purpose, and if ``idle=poll`` is used, idle CPUs will execute a
-more or less ``lightweight'' sequence of instructions in a tight loop. [Note
+more or less "lightweight" sequence of instructions in a tight loop. [Note
that using ``idle=poll`` is somewhat drastic in many cases, as preventing idle
CPUs from saving almost any energy at all may not be the only effect of it.
For example, on Intel hardware it effectively prevents CPUs from using
diff --git a/Documentation/admin-guide/svga.rst b/Documentation/admin-guide/svga.rst
index b6c2f9acca92..9eb1e0738e84 100644
--- a/Documentation/admin-guide/svga.rst
+++ b/Documentation/admin-guide/svga.rst
@@ -12,7 +12,8 @@ Intro
This small document describes the "Video Mode Selection" feature which
allows the use of various special video modes supported by the video BIOS. Due
to usage of the BIOS, the selection is limited to boot time (before the
-kernel decompression starts) and works only on 80X86 machines.
+kernel decompression starts) and works only on 80X86 machines that are
+booted through BIOS firmware (as opposed to through UEFI, kexec, etc.).
.. note::
@@ -23,7 +24,7 @@ kernel decompression starts) and works only on 80X86 machines.
The video mode to be used is selected by a kernel parameter which can be
specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
-option of LILO (or some other boot loader you use) or by the "vidmode" utility
+option of LILO (or some other boot loader you use) or by the "xrandr" utility
(present in standard Linux utility packages). You can use the following values
of this parameter::
@@ -41,7 +42,7 @@ of this parameter::
better to use absolute mode numbers instead.
0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
- for exact meaning of the ID). Warning: rdev and LILO don't support
+ for exact meaning of the ID). Warning: LILO doesn't support
hexadecimal numbers -- you have to convert it to decimal manually.
Menu
diff --git a/Documentation/admin-guide/sysctl/abi.rst b/Documentation/admin-guide/sysctl/abi.rst
index 599bcde7f0b7..ac87eafdb54f 100644
--- a/Documentation/admin-guide/sysctl/abi.rst
+++ b/Documentation/admin-guide/sysctl/abi.rst
@@ -1,67 +1,34 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
================================
Documentation for /proc/sys/abi/
================================
-kernel version 2.6.0.test2
+.. See scripts/check-sysctl-docs to keep this up to date:
+.. scripts/check-sysctl-docs -vtable="abi" \
+.. Documentation/admin-guide/sysctl/abi.rst \
+.. $(git grep -l register_sysctl_)
-Copyright (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
+Copyright (c) 2020, Stephen Kitt
-For general info: index.rst.
+For general info, see :doc:`index`.
------------------------------------------------------------------------------
-This path is binary emulation relevant aka personality types aka abi.
-When a process is executed, it's linked to an exec_domain whose
-personality is defined using values available from /proc/sys/abi.
-You can find further details about abi in include/linux/personality.h.
-
-Here are the files featuring in 2.6 kernel:
-
-- defhandler_coff
-- defhandler_elf
-- defhandler_lcall7
-- defhandler_libcso
-- fake_utsname
-- trace
-
-defhandler_coff
----------------
-
-defined value:
- PER_SCOSVR3::
-
- 0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
-
-defhandler_elf
---------------
-
-defined value:
- PER_LINUX::
-
- 0
-
-defhandler_lcall7
------------------
-
-defined value :
- PER_SVR4::
-
- 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-defhandler_libsco
------------------
-
-defined value:
- PER_SVR4::
+The files in ``/proc/sys/abi`` can be used to see and modify
+ABI-related settings.
- 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+Currently, these files might (depending on your configuration)
+show up in ``/proc/sys/kernel``:
-fake_utsname
-------------
+.. contents:: :local:
-Unused
+vsyscall32 (x86)
+================
-trace
------
+Determines whether the kernel maps a vDSO page into 32-bit processes;
+can be set to 1 to enable, or 0 to disable. Defaults to enabled if
+``CONFIG_COMPAT_VDSO`` is set, disabled otherwise.
-Unused
+This controls the same setting as the ``vdso32`` kernel boot
+parameter.
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index abf804719890..f718a2eaf1f6 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -130,7 +130,7 @@ More detailed explanation for tainting
5) ``B`` If a page-release function has found a bad page reference or some
unexpected page flags. This indicates a hardware problem or a kernel bug;
there should be other information in the log indicating why this tainting
- occured.
+ occurred.
6) ``U`` if a user or user application specifically requested that the
Tainted flag be set, ``' '`` otherwise.
diff --git a/Documentation/arm/sunxi.rst b/Documentation/arm/sunxi.rst
index b037428aee98..62b533d0ba94 100644
--- a/Documentation/arm/sunxi.rst
+++ b/Documentation/arm/sunxi.rst
@@ -108,7 +108,7 @@ SunXi family
* Datasheet
- http://dl.linux-sunxi.org/H3/Allwinner_H3_Datasheet_V1.0.pdf
+ https://linux-sunxi.org/images/4/4b/Allwinner_H3_Datasheet_V1.2.pdf
- Allwinner R40 (sun8i)
diff --git a/Documentation/arm/uefi.rst b/Documentation/arm/uefi.rst
index f868330df6be..f732f957421f 100644
--- a/Documentation/arm/uefi.rst
+++ b/Documentation/arm/uefi.rst
@@ -23,7 +23,7 @@ makes it possible for the kernel to support additional features:
For actually enabling [U]EFI support, enable:
- CONFIG_EFI=y
-- CONFIG_EFI_VARS=y or m
+- CONFIG_EFIVAR_FS=y or m
The implementation depends on receiving information about the UEFI environment
in a Flattened Device Tree (FDT) - so is only available with CONFIG_OF.
diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
index 452ec8b115c2..01f2de2b0450 100644
--- a/Documentation/arm64/amu.rst
+++ b/Documentation/arm64/amu.rst
@@ -1,3 +1,5 @@
+.. _amu_index:
+
=======================================================
Activity Monitors Unit (AMU) extension in AArch64 Linux
=======================================================
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index f28853f80089..328e0c454fbd 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -175,6 +175,8 @@ infrastructure:
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
+ | MTE | [11-8] | y |
+ +------------------------------+---------+---------+
| SSBS | [7-4] | y |
+------------------------------+---------+---------+
| BT | [3-0] | y |
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 84a9fd2d41b4..bbd9cf54db6c 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -240,6 +240,10 @@ HWCAP2_BTI
Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
+HWCAP2_MTE
+
+ Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
+ by Documentation/arm64/memory-tagging-extension.rst.
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index d9665d83c53a..937634c49979 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -1,3 +1,5 @@
+.. _arm64_index:
+
==================
ARM64 Architecture
==================
@@ -14,6 +16,7 @@ ARM64 Architecture
hugetlbpage
legacy_instructions
memory
+ memory-tagging-extension
perf
pointer-authentication
silicon-errata
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
new file mode 100644
index 000000000000..034d37c605e8
--- /dev/null
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -0,0 +1,305 @@
+===============================================
+Memory Tagging Extension (MTE) in AArch64 Linux
+===============================================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+ Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 2020-02-25
+
+This document describes the provision of the Memory Tagging Extension
+functionality in AArch64 Linux.
+
+Introduction
+============
+
+ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
+feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
+(Top Byte Ignore) feature and allows software to access a 4-bit
+allocation tag for each 16-byte granule in the physical address space.
+Such memory range must be mapped with the Normal-Tagged memory
+attribute. A logical tag is derived from bits 59-56 of the virtual
+address used for the memory access. A CPU with MTE enabled will compare
+the logical tag against the allocation tag and potentially raise an
+exception on mismatch, subject to system registers configuration.
+
+Userspace Support
+=================
+
+When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
+supported by the hardware, the kernel advertises the feature to
+userspace via ``HWCAP2_MTE``.
+
+PROT_MTE
+--------
+
+To access the allocation tags, a user process must enable the Tagged
+memory attribute on an address range using a new ``prot`` flag for
+``mmap()`` and ``mprotect()``:
+
+``PROT_MTE`` - Pages allow access to the MTE allocation tags.
+
+The allocation tag is set to 0 when such pages are first mapped in the
+user address space and preserved on copy-on-write. ``MAP_SHARED`` is
+supported and the allocation tags can be shared between processes.
+
+**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
+RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
+types of mapping will result in ``-EINVAL`` returned by these system
+calls.
+
+**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
+be cleared by ``mprotect()``.
+
+**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
+``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
+point after the system call.
+
+Tag Check Faults
+----------------
+
+When ``PROT_MTE`` is enabled on an address range and a mismatch between
+the logical and allocation tags occurs on access, there are three
+configurable behaviours:
+
+- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
+ tag check fault.
+
+- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
+ ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
+ memory access is not performed. If ``SIGSEGV`` is ignored or blocked
+ by the offending thread, the containing process is terminated with a
+ ``coredump``.
+
+- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
+ thread, asynchronously following one or multiple tag check faults,
+ with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
+ address is unknown).
+
+The user can select the above modes, per thread, using the
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
+``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+bit-field:
+
+- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
+- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
+- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+
+The current tag check fault mode can be read using the
+``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
+
+Tag checking can also be disabled for a user thread by setting the
+``PSTATE.TCO`` bit with ``MSR TCO, #1``.
+
+**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
+irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
+``sigreturn()``.
+
+**Note**: There are no *match-all* logical tags available for user
+applications.
+
+**Note**: Kernel accesses to the user address space (e.g. ``read()``
+system call) are not checked if the user thread tag checking mode is
+``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
+``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
+address accesses, however it cannot always guarantee it.
+
+Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
+-----------------------------------------------------------------
+
+The architecture allows excluding certain tags from being randomly
+generated via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
+excludes all tags other than 0. A user thread can enable specific tags
+in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
+in the ``PR_MTE_TAG_MASK`` bit-field.
+
+**Note**: The hardware uses an exclude mask but the ``prctl()``
+interface provides an include mask. An include mask of ``0`` (exclusion
+mask ``0xffff``) results in the CPU always generating tag ``0``.
+
+Initial process state
+---------------------
+
+On ``execve()``, the new process has the following configuration:
+
+- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
+- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
+- ``PSTATE.TCO`` set to 0
+- ``PROT_MTE`` not set on any of the initial memory maps
+
+On ``fork()``, the new process inherits the parent's configuration and
+memory map attributes with the exception of the ``madvise()`` ranges
+with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
+to 0).
+
+The ``ptrace()`` interface
+--------------------------
+
+``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
+the tags from or set the tags to a tracee's address space. The
+``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
+data)`` where:
+
+- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
+- ``pid`` - the tracee's PID.
+- ``addr`` - address in the tracee's address space.
+- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
+ a buffer of ``iov_len`` length in the tracer's address space.
+
+The tags in the tracer's ``iov_base`` buffer are represented as one
+4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
+tracee's address space.
+
+**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
+will use the corresponding aligned address.
+
+``ptrace()`` return value:
+
+- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
+ number of tags transferred. This may be smaller than the requested
+ ``iov_len`` if the requested address range in the tracee's or the
+ tracer's space cannot be accessed or does not have valid tags.
+- ``-EPERM`` - the specified process cannot be traced.
+- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
+ address) and no tags copied. ``iov_len`` not updated.
+- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
+ or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
+- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
+ mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.
+
+**Note**: There are no transient errors for the requests above, so user
+programs should not retry in case of a non-zero system call return.
+
+``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
+NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
+address ABI control and MTE configuration of a process as per the
+``prctl()`` options described in
+Documentation/arm64/tagged-address-abi.rst and above. The corresponding
+``regset`` is 1 element of 8 bytes (``sizeof(long)``).
+
+Example of correct usage
+========================
+
+*MTE Example code*
+
+.. code-block:: c
+
+ /*
+ * To be compiled with -march=armv8.5-a+memtag
+ */
+ #include <errno.h>
+ #include <stdint.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <unistd.h>
+ #include <sys/auxv.h>
+ #include <sys/mman.h>
+ #include <sys/prctl.h>
+
+ /*
+ * From arch/arm64/include/uapi/asm/hwcap.h
+ */
+ #define HWCAP2_MTE (1 << 18)
+
+ /*
+ * From arch/arm64/include/uapi/asm/mman.h
+ */
+ #define PROT_MTE 0x20
+
+ /*
+ * From include/uapi/linux/prctl.h
+ */
+ #define PR_SET_TAGGED_ADDR_CTRL 55
+ #define PR_GET_TAGGED_ADDR_CTRL 56
+ # define PR_TAGGED_ADDR_ENABLE (1UL << 0)
+ # define PR_MTE_TCF_SHIFT 1
+ # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT)
+ # define PR_MTE_TAG_SHIFT 3
+ # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT)
+
+ /*
+ * Insert a random logical tag into the given pointer.
+ */
+ #define insert_random_tag(ptr) ({ \
+ uint64_t __val; \
+ asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \
+ __val; \
+ })
+
+ /*
+ * Set the allocation tag on the destination address.
+ */
+ #define set_tag(tagged_addr) do { \
+ asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
+ } while (0)
+
+ int main()
+ {
+ unsigned char *a;
+ unsigned long page_sz = sysconf(_SC_PAGESIZE);
+ unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+ /* check if MTE is present */
+ if (!(hwcap2 & HWCAP2_MTE))
+ return EXIT_FAILURE;
+
+ /*
+ * Enable the tagged address ABI, synchronous MTE tag check faults and
+ * allow all non-zero tags in the randomly generated set.
+ */
+ if (prctl(PR_SET_TAGGED_ADDR_CTRL,
+ PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+ 0, 0, 0)) {
+ perror("prctl() failed");
+ return EXIT_FAILURE;
+ }
+
+ a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (a == MAP_FAILED) {
+ perror("mmap() failed");
+ return EXIT_FAILURE;
+ }
+
+ /*
+ * Enable MTE on the above anonymous mmap. The flag could be passed
+ * directly to mmap() and skip this step.
+ */
+ if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
+ perror("mprotect() failed");
+ return EXIT_FAILURE;
+ }
+
+ /* access with the default tag (0) */
+ a[0] = 1;
+ a[1] = 2;
+
+ printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+ /* set the logical and allocation tags */
+ a = (unsigned char *)insert_random_tag(a);
+ set_tag(a);
+
+ printf("%p\n", a);
+
+ /* non-zero tag access */
+ a[0] = 3;
+ printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+ /*
+ * If MTE is enabled correctly the next instruction will generate an
+ * exception.
+ */
+ printf("Expecting SIGSEGV...\n");
+ a[16] = 0xdd;
+
+ /* this should not be printed in the PR_MTE_TCF_SYNC mode */
+ printf("...haven't got one\n");
+
+ return EXIT_FAILURE;
+ }
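Reviewer note on memory-tagging-extension.rst: the arithmetic the new document describes (logical tag in address bits 59-56, the ``prctl()`` include mask, one tag byte per 16-byte granule for ``PTRACE_PEEKMTETAGS``) can be sketched in portable C. This is only an illustration under stated assumptions: the helper names and ``MTE_GRANULE_SIZE`` are hypothetical and not part of the kernel UAPI, the ``PR_*`` values are copied from the example above, and the buffer-length helper assumes the caller wants to cover a whole byte range after the kernel aligns ``addr`` down to a granule.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the example in the document (include/uapi/linux/prctl.h). */
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#define PR_MTE_TCF_SHIFT      1
#define PR_MTE_TCF_SYNC       (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TAG_SHIFT      3
#define PR_MTE_TAG_MASK       (0xffffUL << PR_MTE_TAG_SHIFT)

/* Hypothetical name: bytes covered by one allocation tag. */
#define MTE_GRANULE_SIZE      ((uint64_t)16)

/* Hypothetical helper: the logical tag lives in bits 59-56 of the address. */
unsigned int mte_logical_tag(uint64_t addr)
{
	return (unsigned int)((addr >> 56) & 0xf);
}

/*
 * Hypothetical helper: build the flags argument for
 * prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0) in synchronous mode.
 * Bit N of include_mask set means tag N may be randomly generated.
 */
unsigned long mte_sync_ctrl_flags(unsigned int include_mask)
{
	return PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
	       (((unsigned long)include_mask << PR_MTE_TAG_SHIFT) &
		PR_MTE_TAG_MASK);
}

/*
 * Hypothetical helper: number of tag bytes (iov_len) needed to hold the
 * tags for [addr, addr + len), one byte per 16-byte granule, with the
 * start address aligned down to a granule boundary.
 */
size_t mte_tag_buf_len(uint64_t addr, size_t len)
{
	uint64_t start = addr & ~(MTE_GRANULE_SIZE - 1);
	uint64_t end = (addr + len + MTE_GRANULE_SIZE - 1) &
		       ~(MTE_GRANULE_SIZE - 1);

	return (size_t)((end - start) / MTE_GRANULE_SIZE);
}
```

With an include mask of 0xfffe the second helper yields the same flags value the document's example passes to ``prctl()``; an include mask of 0 leaves only tag 0 available, matching the note on the exclude mask.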
diff --git a/Documentation/conf.py b/Documentation/conf.py
index c503188880d9..0a102d57437d 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -36,10 +36,23 @@ needs_sphinx = '1.3'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
-extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
+extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include',
'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
'maintainers_include', 'sphinx.ext.autosectionlabel' ]
+#
+# cdomain is badly broken in Sphinx 3+. Leaving it out generates *most*
+# of the docs correctly, but not all. Scream bloody murder but allow
+# the process to proceed; hopefully somebody will fix this properly soon.
+#
+if major >= 3:
+ sys.stderr.write('''WARNING: The kernel documentation build process
+ does not work correctly with Sphinx v3.0 and above. Expect errors
+ in the generated output.
+ ''')
+else:
+ extensions.append('cdomain')
+
# Ensure that autosectionlabel will produce unique names
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index 298c9c8bea9a..a2c96bec5ee8 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -30,7 +30,7 @@ which didn't support these methods.
Command Line Switches
=====================
``maxcpus=n``
- Restrict boot time CPUs to *n*. Say if you have fourV CPUs, using
+ Restrict boot time CPUs to *n*. Say if you have four CPUs, using
``maxcpus=2`` will only boot two. You can choose to bring the
other CPUs later online.
diff --git a/Documentation/crypto/userspace-if.rst b/Documentation/crypto/userspace-if.rst
index 52019e905900..b45dabbf69d6 100644
--- a/Documentation/crypto/userspace-if.rst
+++ b/Documentation/crypto/userspace-if.rst
@@ -296,15 +296,16 @@ follows:
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
- .salg_type = "rng", /* this selects the symmetric cipher */
- .salg_name = "drbg_nopr_sha256" /* this is the cipher name */
+ .salg_type = "rng", /* this selects the random number generator */
+ .salg_name = "drbg_nopr_sha256" /* this is the RNG name */
};
Depending on the RNG type, the RNG must be seeded. The seed is provided
using the setsockopt interface to set the key. For example, the
ansi_cprng requires a seed. The DRBGs do not require a seed, but may be
-seeded.
+seeded. The seed is also known as a *Personalization String* in the NIST
+SP 800-90A standard.
Using the read()/recvmsg() system calls, random numbers can be obtained.
The kernel generates at most 128 bytes in one call. If user space
@@ -314,6 +315,16 @@ WARNING: The user space caller may invoke the initially mentioned accept
system call multiple times. In this case, the returned file descriptors
have the same state.
+The following CAVP testing interfaces are enabled when the kernel is built
+with the CRYPTO_USER_API_RNG_CAVP option:
+
+- the concatenation of *Entropy* and *Nonce* can be provided to the RNG via
+ ALG_SET_DRBG_ENTROPY setsockopt interface. Setting the entropy requires
+ CAP_SYS_ADMIN permission.
+
+- *Additional Data* can be provided using the send()/sendmsg() system calls,
+ but only after the entropy has been set.
+
Zero-Copy Interface
-------------------
@@ -377,6 +388,9 @@ mentioned optname:
provided ciphertext is assumed to contain an authentication tag of
the given size (see section about AEAD memory layout below).
+- ALG_SET_DRBG_ENTROPY -- Setting the entropy of the random number generator.
+ This option is applicable to RNG cipher type only.
+
User space API example
----------------------
diff --git a/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.yaml b/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.yaml
index 17e4f20c8d39..6834f5e8df5f 100644
--- a/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.yaml
+++ b/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.yaml
@@ -23,7 +23,7 @@ properties:
compatible:
items:
- const: raspberrypi,bcm2835-firmware
- - const: simple-bus
+ - const: simple-mfd
mboxes:
$ref: '/schemas/types.yaml#/definitions/phandle'
@@ -57,7 +57,7 @@ required:
examples:
- |
firmware {
- compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
+ compatible = "raspberrypi,bcm2835-firmware", "simple-mfd";
mboxes = <&mailbox>;
firmware_clocks: clocks {
diff --git a/Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml b/Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml
index 1d5e9bcce4c8..33f3010f48c3 100644
--- a/Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml
+++ b/Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml
@@ -62,7 +62,7 @@ examples:
};
mmc@5b010000 {
- compatible = "fsl,imx8qxp-usdhc";
+ compatible = "fsl,imx8qxp-usdhc", "fsl,imx7d-usdhc";
interrupts = <GIC_SPI 232 IRQ_TYPE_LEVEL_HIGH>;
reg = <0x5b010000 0x10000>;
clocks = <&conn_lpcg IMX_CONN_LPCG_SDHC0_IPG_CLK>,
diff --git a/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml b/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml
index 85ef69ffebed..1465c9ebaf93 100644
--- a/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml
+++ b/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml
@@ -67,7 +67,7 @@ examples:
main_crypto: crypto@4e00000 {
compatible = "ti,j721-sa2ul";
- reg = <0x0 0x4e00000 0x0 0x1200>;
+ reg = <0x4e00000 0x1200>;
power-domains = <&k3_pds 264 TI_SCI_PD_EXCLUSIVE>;
dmas = <&main_udmap 0xc000>, <&main_udmap 0x4000>,
<&main_udmap 0x4001>;
diff --git a/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml b/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml
index 52a939cade3b..7b9d468c3e52 100644
--- a/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml
+++ b/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml
@@ -145,10 +145,10 @@ examples:
display@fd4a0000 {
compatible = "xlnx,zynqmp-dpsub-1.7";
- reg = <0x0 0xfd4a0000 0x0 0x1000>,
- <0x0 0xfd4aa000 0x0 0x1000>,
- <0x0 0xfd4ab000 0x0 0x1000>,
- <0x0 0xfd4ac000 0x0 0x1000>;
+ reg = <0xfd4a0000 0x1000>,
+ <0xfd4aa000 0x1000>,
+ <0xfd4ab000 0x1000>,
+ <0xfd4ac000 0x1000>;
reg-names = "dp", "blend", "av_buf", "aud";
interrupts = <0 119 4>;
interrupt-parent = <&gic>;
diff --git a/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml b/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
index 5de510f8c88c..2a595b18ff6c 100644
--- a/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
+++ b/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
@@ -57,7 +57,7 @@ examples:
dma: dma-controller@fd4c0000 {
compatible = "xlnx,zynqmp-dpdma";
- reg = <0x0 0xfd4c0000 0x0 0x1000>;
+ reg = <0xfd4c0000 0x1000>;
interrupts = <GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>;
interrupt-parent = <&gic>;
clocks = <&dpdma_clk>;
diff --git a/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml b/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml
new file mode 100644
index 000000000000..a25387df0865
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/amazon,al-mc-edac.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/edac/amazon,al-mc-edac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Amazon's Annapurna Labs Memory Controller EDAC
+
+maintainers:
+ - Talel Shenhar <talel@amazon.com>
+ - Talel Shenhar <talelshenhar@gmail.com>
+
+description: |
+ EDAC node is defined to describe on-chip error detection and correction for
+ Amazon's Annapurna Labs Memory Controller.
+
+properties:
+
+ compatible:
+ const: amazon,al-mc-edac
+
+ reg:
+ maxItems: 1
+
+ "#address-cells":
+ const: 2
+
+ "#size-cells":
+ const: 2
+
+ interrupts:
+ minItems: 1
+ maxItems: 2
+ items:
+ - description: uncorrectable error interrupt
+ - description: correctable error interrupt
+
+ interrupt-names:
+ minItems: 1
+ maxItems: 2
+ items:
+ - const: ue
+ - const: ce
+
+required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ soc {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ edac@f0080000 {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ compatible = "amazon,al-mc-edac";
+ reg = <0x0 0xf0080000 0x0 0x00010000>;
+ interrupt-parent = <&amazon_al_system_fabric>;
+ interrupt-names = "ue";
+ interrupts = <20 IRQ_TYPE_LEVEL_HIGH>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/gpio/sgpio-aspeed.txt b/Documentation/devicetree/bindings/gpio/sgpio-aspeed.txt
index d4d83916c09d..be329ea4794f 100644
--- a/Documentation/devicetree/bindings/gpio/sgpio-aspeed.txt
+++ b/Documentation/devicetree/bindings/gpio/sgpio-aspeed.txt
@@ -20,8 +20,9 @@ Required properties:
- gpio-controller : Marks the device node as a GPIO controller
- interrupts : Interrupt specifier, see interrupt-controller/interrupts.txt
- interrupt-controller : Mark the GPIO controller as an interrupt-controller
-- ngpios : number of GPIO lines, see gpio.txt
- (should be multiple of 8, up to 80 pins)
+- ngpios : number of *hardware* GPIO lines, see gpio.txt. This will expose
+ 2 software GPIOs per hardware GPIO: one for hardware input, one for hardware
+ output. Up to 80 pins, must be a multiple of 8.
- clocks : A phandle to the APB clock for SGPM clock division
- bus-frequency : SGPM CLK frequency
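Reviewer note on sgpio-aspeed.txt: the reworded ``ngpios`` constraint (a multiple of 8, up to 80 hardware lines, with two software GPIOs exposed per hardware line) can be written down as a small check. This is a sketch under assumptions, not driver code: both function names are hypothetical, and rejecting 0 is an assumption since the binding text does not mention it.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: check an ngpios value against the binding text
 * (multiple of 8, at most 80). Treating 0 as invalid is an assumption.
 */
bool sgpio_ngpios_valid(unsigned int ngpios)
{
	return ngpios > 0 && ngpios <= 80 && ngpios % 8 == 0;
}

/*
 * Hypothetical helper: number of software GPIOs exposed, one for the
 * hardware input and one for the hardware output of each line.
 */
unsigned int sgpio_software_gpio_count(unsigned int ngpios)
{
	return 2 * ngpios;
}
```

Under this reading, the maximum of 80 hardware lines corresponds to 160 software GPIOs.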
diff --git a/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml b/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml
new file mode 100644
index 000000000000..5da333c644c9
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/actions,owl-sirq.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/actions,owl-sirq.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Actions Semi Owl SoCs SIRQ interrupt controller
+
+maintainers:
+ - Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+ - Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
+
+description: |
+ This interrupt controller is found in the Actions Semi Owl SoCs (S500, S700
+ and S900) and provides support for handling up to 3 external interrupt lines.
+
+properties:
+ compatible:
+ enum:
+ - actions,s500-sirq
+ - actions,s700-sirq
+ - actions,s900-sirq
+
+ reg:
+ maxItems: 1
+
+ interrupt-controller: true
+
+ '#interrupt-cells':
+ const: 2
+ description:
+ The first cell is the input IRQ number, between 0 and 2, while the second
+ cell is the trigger type as defined in interrupt.txt in this directory.
+
+ 'interrupts':
+ description: |
+ Contains the GIC SPI IRQs mapped to the external interrupt lines.
+ They shall be specified sequentially from output 0 to 2.
+ minItems: 3
+ maxItems: 3
+
+required:
+ - compatible
+ - reg
+ - interrupt-controller
+ - '#interrupt-cells'
+ - 'interrupts'
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ sirq: interrupt-controller@b01b0200 {
+ compatible = "actions,s500-sirq";
+ reg = <0xb01b0200 0x4>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>, /* SIRQ0 */
+ <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>, /* SIRQ1 */
+ <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>; /* SIRQ2 */
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml
new file mode 100644
index 000000000000..bbf0f26cd008
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/mstar,mst-intc.yaml
@@ -0,0 +1,64 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/mstar,mst-intc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MStar Interrupt Controller
+
+maintainers:
+ - Mark-PK Tsai <mark-pk.tsai@mediatek.com>
+
+description: |+
+ MStar, SigmaStar and Mediatek TV SoCs contain multiple legacy
+ interrupt controllers that route interrupts to the GIC.
+
+ The HW block exposes a number of interrupt controllers, each
+ can support up to 64 interrupts.
+
+properties:
+ compatible:
+ const: mstar,mst-intc
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 3
+ description: |
+ Use the same format as specified by GIC in arm,gic.yaml.
+
+ reg:
+ maxItems: 1
+
+ mstar,irqs-map-range:
+ description: |
+ The range <start, end> of parent interrupt controller's interrupt
+ lines that are hardwired to mstar interrupt controller.
+ $ref: /schemas/types.yaml#/definitions/uint32-matrix
+ items:
+ minItems: 2
+ maxItems: 2
+
+ mstar,intc-no-eoi:
+ description:
+ Marks this controller as having no End Of Interrupt (EOI) implementation.
+ type: boolean
+
+required:
+ - compatible
+ - reg
+ - mstar,irqs-map-range
+
+additionalProperties: false
+
+examples:
+ - |
+ mst_intc0: interrupt-controller@1f2032d0 {
+ compatible = "mstar,mst-intc";
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ interrupt-parent = <&gic>;
+ reg = <0x1f2032d0 0x30>;
+ mstar,irqs-map-range = <0 63>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt b/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
index 086ff08322db..2db59df9408f 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/snps,dw-apb-ictl.txt
@@ -2,7 +2,8 @@ Synopsys DesignWare APB interrupt controller (dw_apb_ictl)
Synopsys DesignWare provides interrupt controller IP for APB known as
dw_apb_ictl. The IP is used as secondary interrupt controller in some SoCs with
-APB bus, e.g. Marvell Armada 1500.
+APB bus, e.g. Marvell Armada 1500. It can also be used as primary interrupt
+controller in some SoCs, e.g. Hisilicon SD5203.
Required properties:
- compatible: shall be "snps,dw-apb-ictl"
@@ -10,6 +11,8 @@ Required properties:
region starting with ENABLE_LOW register
- interrupt-controller: identifies the node as an interrupt controller
- #interrupt-cells: number of cells to encode an interrupt-specifier, shall be 1
+
+Additional required property when it's used as secondary interrupt controller:
- interrupts: interrupt reference to primary interrupt controller
The interrupt sources map to the corresponding bits in the interrupt
@@ -21,6 +24,7 @@ registers, i.e.
- (optional) fast interrupts start at 64.
Example:
+ /* dw_apb_ictl is used as secondary interrupt controller */
aic: interrupt-controller@3000 {
compatible = "snps,dw-apb-ictl";
reg = <0x3000 0xc00>;
@@ -29,3 +33,11 @@ Example:
interrupt-parent = <&gic>;
interrupts = <GIC_SPI 3 IRQ_TYPE_LEVEL_HIGH>;
};
+
+ /* dw_apb_ictl is used as primary interrupt controller */
+ vic: interrupt-controller@10130000 {
+ compatible = "snps,dw-apb-ictl";
+ reg = <0x10130000 0x1000>;
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ };
diff --git a/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml
new file mode 100644
index 000000000000..bbf79d125675
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/ti,pruss-intc.yaml
@@ -0,0 +1,158 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/ti,pruss-intc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: TI PRU-ICSS Local Interrupt Controller
+
+maintainers:
+ - Suman Anna <s-anna@ti.com>
+
+description: |
+ Each PRU-ICSS has a single interrupt controller instance that is common
+ to all the PRU cores. Most interrupt controllers can route 64 input events
+ which are then mapped to 10 possible output interrupts through two levels
+ of mapping. The input events can be triggered by the PRUs and/or
+ various other PRUSS internal and external peripherals. The first 2 output
+ interrupts (0, 1) are fed exclusively to the internal PRU cores, with the
+ remaining 8 (2 through 9) connected to external interrupt controllers
+ including the MPU and/or other PRUSS instances, DSPs or devices.
+
+ The property "ti,irqs-reserved" is used for denoting the connection
+ differences on the output interrupts 2 through 9. If this property is not
+ defined, it implies that all the PRUSS INTC output interrupts 2 through 9
+ (host_intr0 through host_intr7) are connected exclusively to the Arm interrupt
+ controller.
+
+ The K3 family of SoCs can handle 160 input events that can be mapped to 20
+ different possible output interrupts. The additional output interrupts (10
+ through 19) are connected to new sub-modules within the ICSSG instances.
+
+ This interrupt-controller node should be defined as a child node of the
+ corresponding PRUSS node. The node should be named "interrupt-controller".
+
+properties:
+ compatible:
+ enum:
+ - ti,pruss-intc
+ - ti,icssg-intc
+ description: |
+ Use "ti,pruss-intc" for OMAP-L13x/AM18x/DA850 SoCs,
+ AM335x family of SoCs,
+ AM437x family of SoCs,
+ AM57xx family of SoCs,
+ 66AK2G family of SoCs
+ Use "ti,icssg-intc" for K3 AM65x & J721E family of SoCs
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ minItems: 1
+ maxItems: 8
+ description: |
+ All the interrupts generated towards the main host processor in the SoC.
+ A shared interrupt can be skipped if the desired destination and usage is
+ by a different processor/device.
+
+ interrupt-names:
+ minItems: 1
+ maxItems: 8
+ items:
+ pattern: host_intr[0-7]
+ description: |
+ Should use one of the above names for each valid host event interrupt
+ connected to Arm interrupt controller, the name should match the
+ corresponding host event interrupt number.
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 3
+ description: |
+ Client users shall use the PRU System event number (the interrupt source
+ that the client is interested in) [cell 1], PRU channel [cell 2] and PRU
+ host_event (target) [cell 3] as the value of the interrupts property in
+ their node. The system events can be mapped to some output host
+ interrupts through 2 levels of many-to-one mapping, i.e. events to channel
+ mapping and channels to host interrupts, so this property provides the
+ entire mapping.
+
+ ti,irqs-reserved:
+ $ref: /schemas/types.yaml#/definitions/uint8
+ description: |
+ Bitmask of host interrupts between 0 and 7 (corresponding to PRUSS INTC
+ output interrupts 2 through 9) that are not connected to the Arm interrupt
+ controller or are shared and used by other devices or processors in the
+ SoC. Define this property when any of 8 interrupts should not be handled
+ by Arm interrupt controller.
+ Eg: - AM437x and 66AK2G SoCs do not have "host_intr5" interrupt
+ connected to MPU
+ - AM65x and J721E SoCs have "host_intr5", "host_intr6" and
+ "host_intr7" interrupts connected to MPU, and other ICSSG
+ instances.
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - interrupt-names
+ - interrupt-controller
+ - "#interrupt-cells"
+
+additionalProperties: false
+
+examples:
+ - |
+ /* AM33xx PRU-ICSS */
+ pruss: pruss@0 {
+ compatible = "ti,am3356-pruss";
+ reg = <0x0 0x80000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ pruss_intc: interrupt-controller@20000 {
+ compatible = "ti,pruss-intc";
+ reg = <0x20000 0x2000>;
+ interrupts = <20 21 22 23 24 25 26 27>;
+ interrupt-names = "host_intr0", "host_intr1",
+ "host_intr2", "host_intr3",
+ "host_intr4", "host_intr5",
+ "host_intr6", "host_intr7";
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ };
+ };
+
+ - |
+
+ /* AM4376 PRU-ICSS */
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ pruss@0 {
+ compatible = "ti,am4376-pruss";
+ reg = <0x0 0x40000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ interrupt-controller@20000 {
+ compatible = "ti,pruss-intc";
+ reg = <0x20000 0x2000>;
+ interrupt-controller;
+ #interrupt-cells = <3>;
+ interrupts = <GIC_SPI 20 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 21 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 26 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "host_intr0", "host_intr1",
+ "host_intr2", "host_intr3",
+ "host_intr4",
+ "host_intr6", "host_intr7";
+ ti,irqs-reserved = /bits/ 8 <0x20>; /* BIT(5) */
+ };
+ };
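The `ti,irqs-reserved` value in the AM4376 example above follows from the bitmask definition: bit N set means host_intrN is not handled by the Arm interrupt controller, so its entry is left out of `interrupts` and `interrupt-names`. A minimal Python sketch of that arithmetic follows. The helper names `irqs_reserved_mask` and `expected_interrupt_names` are hypothetical and are not part of the kernel or of any dt-schema tooling.

```python
def irqs_reserved_mask(reserved):
    """Return the 8-bit ti,irqs-reserved value for host interrupt indices 0-7."""
    mask = 0
    for n in reserved:
        if not 0 <= n <= 7:
            raise ValueError(f"host interrupt index out of range: {n}")
        mask |= 1 << n  # same as BIT(n)
    return mask


def expected_interrupt_names(mask):
    """List the host_intrN names that remain once reserved bits are skipped."""
    return [f"host_intr{n}" for n in range(8) if not mask & (1 << n)]
```

Reserving only host interrupt 5 gives BIT(5), and the remaining names are the seven listed in the example node.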
diff --git a/Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml b/Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml
index 24ad1446445e..fe7fa25877fd 100644
--- a/Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml
+++ b/Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml
@@ -30,7 +30,7 @@ properties:
const: 0
patternProperties:
- "^multi-led[0-9a-f]$":
+ "^multi-led@[0-9a-b]$":
type: object
allOf:
- $ref: leds-class-multicolor.yaml#
diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml
index 10b45966f1b8..e71d13c2d109 100644
--- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml
+++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml
@@ -21,23 +21,26 @@ description: |
properties:
compatible:
- enum:
- - fsl,imx25-esdhc
- - fsl,imx35-esdhc
- - fsl,imx51-esdhc
- - fsl,imx53-esdhc
- - fsl,imx6q-usdhc
- - fsl,imx6sl-usdhc
- - fsl,imx6sx-usdhc
- - fsl,imx6ull-usdhc
- - fsl,imx7d-usdhc
- - fsl,imx7ulp-usdhc
- - fsl,imx8mq-usdhc
- - fsl,imx8mm-usdhc
- - fsl,imx8mn-usdhc
- - fsl,imx8mp-usdhc
- - fsl,imx8qm-usdhc
- - fsl,imx8qxp-usdhc
+ oneOf:
+ - enum:
+ - fsl,imx25-esdhc
+ - fsl,imx35-esdhc
+ - fsl,imx51-esdhc
+ - fsl,imx53-esdhc
+ - fsl,imx6q-usdhc
+ - fsl,imx6sl-usdhc
+ - fsl,imx6sx-usdhc
+ - fsl,imx6ull-usdhc
+ - fsl,imx7d-usdhc
+ - fsl,imx7ulp-usdhc
+ - items:
+ - enum:
+ - fsl,imx8mm-usdhc
+ - fsl,imx8mn-usdhc
+ - fsl,imx8mp-usdhc
+ - fsl,imx8mq-usdhc
+ - fsl,imx8qxp-usdhc
+ - const: fsl,imx7d-usdhc
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/mmc/microchip,dw-sparx5-sdhci.yaml b/Documentation/devicetree/bindings/mmc/microchip,dw-sparx5-sdhci.yaml
new file mode 100644
index 000000000000..55883290543b
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/microchip,dw-sparx5-sdhci.yaml
@@ -0,0 +1,65 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mmc/microchip,dw-sparx5-sdhci.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Microchip Sparx5 Mobile Storage Host Controller Binding
+
+allOf:
+ - $ref: "mmc-controller.yaml"
+
+maintainers:
+ - Lars Povlsen <lars.povlsen@microchip.com>
+
+# Everything else is described in the common file
+properties:
+ compatible:
+ const: microchip,dw-sparx5-sdhci
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+ description:
+ Handle to "core" clock for the sdhci controller.
+
+ clock-names:
+ items:
+ - const: core
+
+ microchip,clock-delay:
+ description: Delay clock to card to meet setup time requirements.
+ Each step increases the delay by 1.25 ns.
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 1
+ maximum: 15
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/clock/microchip,sparx5.h>
+ sdhci0: mmc@600800000 {
+ compatible = "microchip,dw-sparx5-sdhci";
+ reg = <0x00800000 0x1000>;
+ pinctrl-0 = <&emmc_pins>;
+ pinctrl-names = "default";
+ clocks = <&clks CLK_ID_AUX1>;
+ clock-names = "core";
+ assigned-clocks = <&clks CLK_ID_AUX1>;
+ assigned-clock-rates = <800000000>;
+ interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;
+ bus-width = <8>;
+ microchip,clock-delay = <10>;
+ };
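For a feel of what a `microchip,clock-delay` value means in time units, the sketch below converts a step count to picoseconds. It assumes the delay grows linearly from zero at 1.25 ns per step, which the binding text suggests but does not state outright. `clock_delay_ps` is a hypothetical helper, not a driver function.

```python
STEP_PS = 1250  # 1.25 ns per step, taken from the property description


def clock_delay_ps(steps):
    """Convert a microchip,clock-delay step count (1-15) to picoseconds.

    Assumes a linear delay starting from zero; the binding only states
    the per-step increment.
    """
    if not 1 <= steps <= 15:
        raise ValueError(f"microchip,clock-delay out of range: {steps}")
    return steps * STEP_PS
```

Integer picoseconds are used so the result stays exact. Under the stated assumption, the example value of 10 corresponds to 12.5 ns.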
diff --git a/Documentation/devicetree/bindings/mmc/mmc-controller.yaml b/Documentation/devicetree/bindings/mmc/mmc-controller.yaml
index b96da0c7f819..f928f66fc59a 100644
--- a/Documentation/devicetree/bindings/mmc/mmc-controller.yaml
+++ b/Documentation/devicetree/bindings/mmc/mmc-controller.yaml
@@ -14,6 +14,10 @@ description: |
that requires the respective functionality should implement them using
these definitions.
+ It is possible to assign a fixed index mmcN to an MMC host controller
+ (and the corresponding mmcblkN devices) by defining an alias in the
+ /aliases device tree node.
+
properties:
$nodename:
pattern: "^mmc(@.*)?$"
diff --git a/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml b/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml
index 449215444723..8d625f903856 100644
--- a/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml
+++ b/Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml
@@ -20,6 +20,8 @@ properties:
reset-gpios:
minItems: 1
+ # Put some limit to avoid false warnings
+ maxItems: 32
description:
contains a list of GPIO specifiers. The reset GPIOs are asserted
at initialization and prior we start the power up procedure of the card.
diff --git a/Documentation/devicetree/bindings/mmc/owl-mmc.yaml b/Documentation/devicetree/bindings/mmc/owl-mmc.yaml
index 1380501fb8f0..5eab25ccf7ae 100644
--- a/Documentation/devicetree/bindings/mmc/owl-mmc.yaml
+++ b/Documentation/devicetree/bindings/mmc/owl-mmc.yaml
@@ -14,7 +14,11 @@ maintainers:
properties:
compatible:
- const: actions,owl-mmc
+ oneOf:
+ - const: actions,owl-mmc
+ - items:
+ - const: actions,s700-mmc
+ - const: actions,owl-mmc
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml b/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml
index b4c3fd40caeb..6bbf29b5c239 100644
--- a/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml
+++ b/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml
@@ -50,6 +50,7 @@ properties:
- renesas,sdhi-r8a774a1 # RZ/G2M
- renesas,sdhi-r8a774b1 # RZ/G2N
- renesas,sdhi-r8a774c0 # RZ/G2E
+ - renesas,sdhi-r8a774e1 # RZ/G2H
- renesas,sdhi-r8a7795 # R-Car H3
- renesas,sdhi-r8a7796 # R-Car M3-W
- renesas,sdhi-r8a77961 # R-Car M3-W+
diff --git a/Documentation/devicetree/bindings/mmc/sdhci-am654.txt b/Documentation/devicetree/bindings/mmc/sdhci-am654.txt
deleted file mode 100644
index 6d202f4d9249..000000000000
--- a/Documentation/devicetree/bindings/mmc/sdhci-am654.txt
+++ /dev/null
@@ -1,61 +0,0 @@
-Device Tree Bindings for the SDHCI Controllers present on TI's AM654 SOCs
-
-The bindings follow the mmc[1], clock[2] and interrupt[3] bindings.
-Only deviations are documented here.
-
- [1] Documentation/devicetree/bindings/mmc/mmc.txt
- [2] Documentation/devicetree/bindings/clock/clock-bindings.txt
- [3] Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
-
-Required Properties:
- - compatible: should be one of:
- "ti,am654-sdhci-5.1": SDHCI on AM654 device.
- "ti,j721e-sdhci-8bit": 8 bit SDHCI on J721E device.
- "ti,j721e-sdhci-4bit": 4 bit SDHCI on J721E device.
- - reg: Must be two entries.
- - The first should be the sdhci register space
- - The second should the subsystem/phy register space
- - clocks: Handles to the clock inputs.
- - clock-names: Tuple including "clk_xin" and "clk_ahb"
- - interrupts: Interrupt specifiers
- Output tap delay for each speed mode:
- - ti,otap-del-sel-legacy
- - ti,otap-del-sel-mmc-hs
- - ti,otap-del-sel-sd-hs
- - ti,otap-del-sel-sdr12
- - ti,otap-del-sel-sdr25
- - ti,otap-del-sel-sdr50
- - ti,otap-del-sel-sdr104
- - ti,otap-del-sel-ddr50
- - ti,otap-del-sel-ddr52
- - ti,otap-del-sel-hs200
- - ti,otap-del-sel-hs400
- These bindings must be provided otherwise the driver will disable the
- corresponding speed mode (i.e. all nodes must provide at least -legacy)
-
-Optional Properties (Required for ti,am654-sdhci-5.1 and ti,j721e-sdhci-8bit):
- - ti,trm-icp: DLL trim select
- - ti,driver-strength-ohm: driver strength in ohms.
- Valid values are 33, 40, 50, 66 and 100 ohms.
-Optional Properties:
- - ti,strobe-sel: strobe select delay for HS400 speed mode. Default value: 0x0.
- - ti,clkbuf-sel: Clock Delay Buffer Select
-
-Example:
-
- sdhci0: sdhci@4f80000 {
- compatible = "ti,am654-sdhci-5.1";
- reg = <0x0 0x4f80000 0x0 0x260>, <0x0 0x4f90000 0x0 0x134>;
- power-domains = <&k3_pds 47>;
- clocks = <&k3_clks 47 0>, <&k3_clks 47 1>;
- clock-names = "clk_ahb", "clk_xin";
- interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>;
- sdhci-caps-mask = <0x80000007 0x0>;
- mmc-ddr-1_8v;
- ti,otap-del-sel-legacy = <0x0>;
- ti,otap-del-sel-mmc-hs = <0x0>;
- ti,otap-del-sel-ddr52 = <0x5>;
- ti,otap-del-sel-hs200 = <0x5>;
- ti,otap-del-sel-hs400 = <0x0>;
- ti,trm-icp = <0x8>;
- };
diff --git a/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml b/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml
new file mode 100644
index 000000000000..ac79f3adf20b
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/sdhci-am654.yaml
@@ -0,0 +1,218 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/mmc/sdhci-am654.yaml#"
+$schema : "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: TI AM654 MMC Controller
+
+maintainers:
+ - Ulf Hansson <ulf.hansson@linaro.org>
+
+allOf:
+ - $ref: mmc-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - ti,am654-sdhci-5.1
+ - ti,j721e-sdhci-8bit
+ - ti,j721e-sdhci-4bit
+ - ti,j7200-sdhci-8bit
+ - ti,j7200-sdhci-4bit
+
+ reg:
+ maxItems: 2
+
+ interrupts:
+ maxItems: 1
+
+ power-domains:
+ maxItems: 1
+
+ clocks:
+ minItems: 1
+ maxItems: 2
+ description: Handles to input clocks
+
+ clock-names:
+ minItems: 1
+ maxItems: 2
+ items:
+ - const: clk_ahb
+ - const: clk_xin
+
+ # PHY output tap delays:
+ # Used to delay the data valid window and align it to the sampling clock.
+ # Binding needs to be provided for each supported speed mode otherwise the
+ # corresponding mode will be disabled.
+
+ ti,otap-del-sel-legacy:
+ description: Output tap delay for SD/MMC legacy timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-mmc-hs:
+ description: Output tap delay for MMC high speed timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-sd-hs:
+ description: Output tap delay for SD high speed timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-sdr12:
+ description: Output tap delay for SD UHS SDR12 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-sdr25:
+ description: Output tap delay for SD UHS SDR25 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-sdr50:
+ description: Output tap delay for SD UHS SDR50 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-sdr104:
+ description: Output tap delay for SD UHS SDR104 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-ddr50:
+ description: Output tap delay for SD UHS DDR50 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-ddr52:
+ description: Output tap delay for eMMC DDR52 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-hs200:
+ description: Output tap delay for eMMC HS200 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,otap-del-sel-hs400:
+ description: Output tap delay for eMMC HS400 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ # PHY input tap delays:
+ # Used to delay the data valid window and align it to the sampling clock for
+ # modes that don't support tuning
+
+ ti,itap-del-sel-legacy:
+ description: Input tap delay for SD/MMC legacy timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,itap-del-sel-mmc-hs:
+ description: Input tap delay for MMC high speed timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,itap-del-sel-sd-hs:
+ description: Input tap delay for SD high speed timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,itap-del-sel-sdr12:
+ description: Input tap delay for SD UHS SDR12 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,itap-del-sel-sdr25:
+ description: Input tap delay for SD UHS SDR25 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,itap-del-sel-ddr52:
+ description: Input tap delay for MMC DDR52 timing
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0x1f
+
+ ti,trm-icp:
+ description: DLL trim select
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ minimum: 0
+ maximum: 0xf
+
+ ti,driver-strength-ohm:
+ description: DLL drive strength in ohms
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+ oneOf:
+ - enum:
+ - 33
+ - 40
+ - 50
+ - 66
+ - 100
+
+ ti,strobe-sel:
+ description: strobe select delay for HS400 speed mode.
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+
+ ti,clkbuf-sel:
+ description: Clock Delay Buffer Select
+ $ref: "/schemas/types.yaml#/definitions/uint32"
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - ti,otap-del-sel-legacy
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ mmc0: mmc@4f80000 {
+ compatible = "ti,am654-sdhci-5.1";
+ reg = <0x0 0x4f80000 0x0 0x260>, <0x0 0x4f90000 0x0 0x134>;
+ power-domains = <&k3_pds 47>;
+ clocks = <&k3_clks 47 0>, <&k3_clks 47 1>;
+ clock-names = "clk_ahb", "clk_xin";
+ interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>;
+ sdhci-caps-mask = <0x80000007 0x0>;
+ mmc-ddr-1_8v;
+ ti,otap-del-sel-legacy = <0x0>;
+ ti,otap-del-sel-mmc-hs = <0x0>;
+ ti,otap-del-sel-ddr52 = <0x5>;
+ ti,otap-del-sel-hs200 = <0x5>;
+ ti,otap-del-sel-hs400 = <0x0>;
+ ti,itap-del-sel-legacy = <0x10>;
+ ti,itap-del-sel-mmc-hs = <0xa>;
+ ti,itap-del-sel-ddr52 = <0x3>;
+ ti,trm-icp = <0x8>;
+ };
+ };
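The schema above limits every `ti,otap-del-sel-*` value to 0..0xf and every `ti,itap-del-sel-*` value to 0..0x1f, requires `ti,otap-del-sel-legacy`, and notes that a speed mode without an output tap delay is disabled. The Python sketch below applies those rules to a plain dictionary of properties. It is illustrative only: `check_tap_delays` and `enabled_speed_modes` are hypothetical helpers and do not reproduce the driver or dt-schema logic.

```python
OTAP_MAX = 0xF   # maximum for every ti,otap-del-sel-* property
ITAP_MAX = 0x1F  # maximum for every ti,itap-del-sel-* property


def check_tap_delays(props):
    """Return error strings for missing or out-of-range tap delay properties."""
    errors = []
    if "ti,otap-del-sel-legacy" not in props:
        errors.append("ti,otap-del-sel-legacy is required")
    for name, value in props.items():
        if name.startswith("ti,otap-del-sel-"):
            limit = OTAP_MAX
        elif name.startswith("ti,itap-del-sel-"):
            limit = ITAP_MAX
        else:
            continue  # not a tap delay property
        if not 0 <= value <= limit:
            errors.append(f"{name} = {value:#x} is outside 0..{limit:#x}")
    return errors


def enabled_speed_modes(props):
    """Speed modes that have an output tap delay and are therefore not disabled."""
    prefix = "ti,otap-del-sel-"
    return sorted(name[len(prefix):] for name in props if name.startswith(prefix))
```

Fed the tap delay values from the example node, the check reports no errors and the enabled modes are legacy, mmc-hs, ddr52, hs200 and hs400.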
diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index 032b76f14f4f..9119f1caf391 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -21,6 +21,7 @@ Required properties:
- "renesas,etheravb-r8a774a1" for the R8A774A1 SoC.
- "renesas,etheravb-r8a774b1" for the R8A774B1 SoC.
- "renesas,etheravb-r8a774c0" for the R8A774C0 SoC.
+ - "renesas,etheravb-r8a774e1" for the R8A774E1 SoC.
- "renesas,etheravb-r8a7795" for the R8A7795 SoC.
- "renesas,etheravb-r8a7796" for the R8A77960 SoC.
- "renesas,etheravb-r8a77961" for the R8A77961 SoC.
diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
new file mode 100644
index 000000000000..e4fcc0de25e2
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright 2020 Arm Ltd.
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/arm,cmn.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm CMN (Coherent Mesh Network) Performance Monitors
+
+maintainers:
+ - Robin Murphy <robin.murphy@arm.com>
+
+properties:
+ compatible:
+ const: arm,cmn-600
+
+ reg:
+ items:
+ - description: Physical address of the base (PERIPHBASE) and
+ size (up to 64MB) of the configuration address space.
+
+ interrupts:
+ minItems: 1
+ maxItems: 4
+ items:
+ - description: Overflow interrupt for DTC0
+ - description: Overflow interrupt for DTC1
+ - description: Overflow interrupt for DTC2
+ - description: Overflow interrupt for DTC3
+ description: One interrupt for each DTC domain implemented must
+ be specified, in order. DTC0 is always present.
+
+ arm,root-node:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: Offset from PERIPHBASE of the configuration
+ discovery node (see TRM definition of ROOTNODEBASE).
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - arm,root-node
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ pmu@50000000 {
+ compatible = "arm,cmn-600";
+ reg = <0x50000000 0x4000000>;
+ /* 4x2 mesh with one DTC, and CFG node at 0,1,1,0 */
+ interrupts = <GIC_SPI 46 IRQ_TYPE_LEVEL_HIGH>;
+ arm,root-node = <0x104000>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/rng/ingenic,trng.yaml b/Documentation/devicetree/bindings/rng/ingenic,trng.yaml
new file mode 100644
index 000000000000..808f247c8421
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/ingenic,trng.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/rng/ingenic,trng.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Bindings for DTRNG in Ingenic SoCs
+
+maintainers:
+ - 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
+
+description:
+ The True Random Number Generator in Ingenic SoCs.
+
+properties:
+ compatible:
+ enum:
+ - ingenic,x1830-dtrng
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - clocks
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/x1830-cgu.h>
+
+ dtrng: trng@10072000 {
+ compatible = "ingenic,x1830-dtrng";
+ reg = <0x10072000 0xc>;
+
+ clocks = <&cgu X1830_CLK_DTRNG>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/rng/xiphera,xip8001b-trng.yaml b/Documentation/devicetree/bindings/rng/xiphera,xip8001b-trng.yaml
new file mode 100644
index 000000000000..1e17e55762f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/xiphera,xip8001b-trng.yaml
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/rng/xiphera,xip8001b-trng.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xiphera XIP8001B-trng bindings
+
+maintainers:
+ - Atte Tommiska <atte.tommiska@xiphera.com>
+
+description: |
+ Xiphera FPGA-based true random number generator intellectual property core.
+
+properties:
+ compatible:
+ const: xiphera,xip8001b-trng
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+additionalProperties: false
+
+examples:
+ - |
+ rng@43c00000 {
+ compatible = "xiphera,xip8001b-trng";
+ reg = <0x43c00000 0x10000>;
+ };
diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.yaml b/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
index 7e4dc5623da8..428db3a21bb9 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.yaml
@@ -39,6 +39,7 @@ properties:
- items:
- enum:
- renesas,r8a73a4-cmt0 # 32-bit CMT0 on R-Mobile APE6
+ - renesas,r8a7742-cmt0 # 32-bit CMT0 on RZ/G1H
- renesas,r8a7743-cmt0 # 32-bit CMT0 on RZ/G1M
- renesas,r8a7744-cmt0 # 32-bit CMT0 on RZ/G1N
- renesas,r8a7745-cmt0 # 32-bit CMT0 on RZ/G1E
@@ -53,6 +54,7 @@ properties:
- items:
- enum:
- renesas,r8a73a4-cmt1 # 48-bit CMT1 on R-Mobile APE6
+ - renesas,r8a7742-cmt1 # 48-bit CMT1 on RZ/G1H
- renesas,r8a7743-cmt1 # 48-bit CMT1 on RZ/G1M
- renesas,r8a7744-cmt1 # 48-bit CMT1 on RZ/G1N
- renesas,r8a7745-cmt1 # 48-bit CMT1 on RZ/G1E
@@ -69,6 +71,7 @@ properties:
- renesas,r8a774a1-cmt0 # 32-bit CMT0 on RZ/G2M
- renesas,r8a774b1-cmt0 # 32-bit CMT0 on RZ/G2N
- renesas,r8a774c0-cmt0 # 32-bit CMT0 on RZ/G2E
+ - renesas,r8a774e1-cmt0 # 32-bit CMT0 on RZ/G2H
- renesas,r8a7795-cmt0 # 32-bit CMT0 on R-Car H3
- renesas,r8a7796-cmt0 # 32-bit CMT0 on R-Car M3-W
- renesas,r8a77965-cmt0 # 32-bit CMT0 on R-Car M3-N
@@ -83,6 +86,7 @@ properties:
- renesas,r8a774a1-cmt1 # 48-bit CMT on RZ/G2M
- renesas,r8a774b1-cmt1 # 48-bit CMT on RZ/G2N
- renesas,r8a774c0-cmt1 # 48-bit CMT on RZ/G2E
+ - renesas,r8a774e1-cmt1 # 48-bit CMT on RZ/G2H
- renesas,r8a7795-cmt1 # 48-bit CMT on R-Car H3
- renesas,r8a7796-cmt1 # 48-bit CMT on R-Car M3-W
- renesas,r8a77965-cmt1 # 48-bit CMT on R-Car M3-N
diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index 4ace8039840a..25c4239ebbfb 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -326,6 +326,8 @@ properties:
- silabs,si7020
# Skyworks SKY81452: Six-Channel White LED Driver with Touch Panel Bias Supply
- skyworks,sky81452
+ # Socionext SynQuacer TPM MMIO module
+ - socionext,synquacer-tpm-mmio
# i2c serial eeprom (24cxx)
- st,24c256
# Ambient Light Sensor with SMBUS/Two Wire Serial Interface
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 63996ab03521..7d58834c5aab 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -1174,6 +1174,8 @@ patternProperties:
description: Shenzhen Xingbangda Display Technology Co., Ltd
"^xinpeng,.*":
description: Shenzhen Xinpeng Technology Co., Ltd
+ "^xiphera,.*":
+ description: Xiphera Ltd.
"^xlnx,.*":
description: Xilinx
"^xnano,.*":
diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst
index fff6604631ea..4fd86c21397b 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -387,22 +387,23 @@ Domain`_ references.
Cross-referencing from reStructuredText
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-To cross-reference the functions and types defined in the kernel-doc comments
-from reStructuredText documents, please use the `Sphinx C Domain`_
-references. For example::
-
- See function :c:func:`foo` and struct/union/enum/typedef :c:type:`bar`.
-
-While the type reference works with just the type name, without the
-struct/union/enum/typedef part in front, you may want to use::
-
- See :c:type:`struct foo <foo>`.
- See :c:type:`union bar <bar>`.
- See :c:type:`enum baz <baz>`.
- See :c:type:`typedef meh <meh>`.
-
-This will produce prettier links, and is in line with how kernel-doc does the
-cross-references.
+No additional syntax is needed to cross-reference the functions and types
+defined in the kernel-doc comments from reStructuredText documents.
+Just end function names with ``()`` and write ``struct``, ``union``, ``enum``
+or ``typedef`` before types.
+For example::
+
+ See foo().
+ See struct foo.
+ See union bar.
+ See enum baz.
+ See typedef meh.
+
+However, if you want custom text in the cross-reference link, that can be done
+through the following syntax::
+
+ See :c:func:`my custom link text for function foo <foo>`.
+ See :c:type:`my custom link text for struct bar <bar>`.
For further details, please refer to the `Sphinx C Domain`_ documentation.
diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index f71ddd592aaa..896478baf570 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -337,6 +337,23 @@ Rendered as:
- column 3
+Cross-referencing
+-----------------
+
+Cross-referencing from one documentation page to another can be done by passing
+the path to the file starting from the Documentation folder.
+For example, to cross-reference to this page (the .rst extension is optional)::
+
+ See Documentation/doc-guide/sphinx.rst.
+
+If you want to use a relative path, you need to use Sphinx's ``doc`` role.
+For example, referencing this page from the same directory would be done as::
+
+ See :doc:`sphinx`.
+
+For information on cross-referencing to kernel-doc functions or types, see
+Documentation/doc-guide/kernel-doc.rst.
+
.. _sphinx_kfigure:
Figures & Images
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 13ea0cc0a3fa..4144b669e80c 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -85,7 +85,7 @@ consider though:
- Memory mapping the contents of the DMA buffer is also supported. See the
discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.
-- The DMA buffer FD is also pollable, see `Fence Poll Support`_ below for
+- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below for
details.
Basic Operation and Device DMA Access
diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst
index 9809f593c0ab..072a7455044e 100644
--- a/Documentation/driver-api/gpio/driver.rst
+++ b/Documentation/driver-api/gpio/driver.rst
@@ -342,12 +342,12 @@ Cascaded GPIO irqchips usually fall in one of three categories:
forced to a thread. The "fake?" raw lock can be used to work around this
problem::
- raw_spinlock_t wa_lock;
- static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
- unsigned long wa_lock_flags;
- raw_spin_lock_irqsave(&bank->wa_lock, wa_lock_flags);
- generic_handle_irq(irq_find_mapping(bank->chip.irq.domain, bit));
- raw_spin_unlock_irqrestore(&bank->wa_lock, wa_lock_flags);
+ raw_spinlock_t wa_lock;
+ static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
+ unsigned long wa_lock_flags;
+ raw_spin_lock_irqsave(&bank->wa_lock, wa_lock_flags);
+ generic_handle_irq(irq_find_mapping(bank->chip.irq.domain, bit));
+ raw_spin_unlock_irqrestore(&bank->wa_lock, wa_lock_flags);
- GENERIC CHAINED GPIO IRQCHIPS: these are the same as "CHAINED GPIO irqchips",
but chained IRQ handlers are not used. Instead GPIO IRQs dispatching is
diff --git a/Documentation/driver-api/nvdimm/index.rst b/Documentation/driver-api/nvdimm/index.rst
index a4f8f98aeb94..5863bd04f056 100644
--- a/Documentation/driver-api/nvdimm/index.rst
+++ b/Documentation/driver-api/nvdimm/index.rst
@@ -10,3 +10,4 @@ Non-Volatile Memory Device (NVDIMM)
nvdimm
btt
security
+ firmware-activate
diff --git a/Documentation/driver-api/soundwire/stream.rst b/Documentation/driver-api/soundwire/stream.rst
index 8858cea7bfe0..b432a2de45d3 100644
--- a/Documentation/driver-api/soundwire/stream.rst
+++ b/Documentation/driver-api/soundwire/stream.rst
@@ -518,10 +518,10 @@ typically called during a dailink .shutdown() callback, which clears
the stream pointer for all DAIS connected to a stream and releases the
memory allocated for the stream.
- Not Supported
+Not Supported
=============
1. A single port with multiple channels supported cannot be used between two
-streams or across stream. For example a port with 4 channels cannot be used
-to handle 2 independent stereo streams even though it's possible in theory
-in SoundWire.
+ streams or across stream. For example a port with 4 channels cannot be used
+ to handle 2 independent stereo streams even though it's possible in theory
+ in SoundWire.
diff --git a/Documentation/fb/fbcon.rst b/Documentation/fb/fbcon.rst
index e57a3d1d085a..328f6980698c 100644
--- a/Documentation/fb/fbcon.rst
+++ b/Documentation/fb/fbcon.rst
@@ -87,15 +87,8 @@ C. Boot options
Note, not all drivers can handle font with widths not divisible by 8,
such as vga16fb.
-2. fbcon=scrollback:<value>[k]
- The scrollback buffer is memory that is used to preserve display
- contents that has already scrolled past your view. This is accessed
- by using the Shift-PageUp key combination. The value 'value' is any
- integer. It defaults to 32KB. The 'k' suffix is optional, and will
- multiply the 'value' by 1024.
-
-3. fbcon=map:<0123>
+2. fbcon=map:<0123>
This is an interesting option. It tells which driver gets mapped to
which console. The value '0123' is a sequence that gets repeated until
@@ -116,7 +109,7 @@ C. Boot options
Later on, when you want to map the console to the framebuffer
device, you can use the con2fbmap utility.
-4. fbcon=vc:<n1>-<n2>
+3. fbcon=vc:<n1>-<n2>
This option tells fbcon to take over only a range of consoles as
specified by the values 'n1' and 'n2'. The rest of the consoles
@@ -127,7 +120,7 @@ C. Boot options
is typically located on the same video card. Thus, the consoles that
are controlled by the VGA console will be garbled.
-5. fbcon=rotate:<n>
+4. fbcon=rotate:<n>
This option changes the orientation angle of the console display. The
value 'n' accepts the following:
@@ -152,21 +145,21 @@ C. Boot options
Actually, the underlying fb driver is totally ignorant of console
rotation.
-6. fbcon=margin:<color>
+5. fbcon=margin:<color>
This option specifies the color of the margins. The margins are the
leftover area at the right and the bottom of the screen that are not
used by text. By default, this area will be black. The 'color' value
is an integer number that depends on the framebuffer driver being used.
-7. fbcon=nodefer
+6. fbcon=nodefer
If the kernel is compiled with deferred fbcon takeover support, normally
the framebuffer contents, left in place by the firmware/bootloader, will
be preserved until some text is actually output to the console.
This option causes fbcon to bind immediately to the fbdev device.
-8. fbcon=logo-pos:<location>
+7. fbcon=logo-pos:<location>
The only possible 'location' is 'center' (without quotes), and when
given, the bootup logo is moved from the default top-left corner
@@ -174,7 +167,7 @@ C. Boot options
displayed due to multiple CPUs, the collected line of logos is moved
as a whole.
-9. fbcon=logo-count:<n>
+8. fbcon=logo-count:<n>
The value 'n' overrides the number of bootup logos. 0 disables the
logo, and -1 gives the default which is the number of online CPUs.
diff --git a/Documentation/fb/matroxfb.rst b/Documentation/fb/matroxfb.rst
index f1859d98606e..6158c49c8571 100644
--- a/Documentation/fb/matroxfb.rst
+++ b/Documentation/fb/matroxfb.rst
@@ -317,8 +317,6 @@ Currently there are following known bugs:
- interlaced text mode is not supported; it looks like hardware limitation,
but I'm not sure.
- Gxx0 SGRAM/SDRAM is not autodetected.
- - If you are using more than one framebuffer device, you must boot kernel
- with 'video=scrollback:0'.
- maybe more...
And following misfeatures:
diff --git a/Documentation/fb/sstfb.rst b/Documentation/fb/sstfb.rst
index 8e8c1b940359..42466ff49c58 100644
--- a/Documentation/fb/sstfb.rst
+++ b/Documentation/fb/sstfb.rst
@@ -185,9 +185,6 @@ Bugs
contact me.
- The 24/32 is not likely to work anytime soon, knowing that the
hardware does ... unusual things in 24/32 bpp.
-- When used with another video board, current limitations of the linux
- console subsystem can cause some troubles, specifically, you should
- disable software scrollback, as it can oops badly ...
Todo
====
diff --git a/Documentation/fb/vesafb.rst b/Documentation/fb/vesafb.rst
index 6821c87b7893..f890a4f5623b 100644
--- a/Documentation/fb/vesafb.rst
+++ b/Documentation/fb/vesafb.rst
@@ -135,8 +135,6 @@ ypan enable display panning using the VESA protected mode
* scrolling (fullscreen) is fast, because there is
no need to copy around data.
- * You'll get scrollback (the Shift-PgUp thing),
- the video memory can be used as scrollback buffer
kontra:
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 4c536e66dc4c..98f59a864242 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -34,8 +34,6 @@ algorithms work.
quota
seq_file
sharedsubtree
- sysfs-pci
- sysfs-tagging
automount-support
diff --git a/Documentation/filesystems/mount_api.rst b/Documentation/filesystems/mount_api.rst
index 29c169c68961..d7f53d62b5bb 100644
--- a/Documentation/filesystems/mount_api.rst
+++ b/Documentation/filesystems/mount_api.rst
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
====================
-fILESYSTEM Mount API
+Filesystem Mount API
====================
.. CONTENTS
@@ -479,7 +479,7 @@ returned.
int vfs_parse_fs_param(struct fs_context *fc,
struct fs_parameter *param);
- Supply a single mount parameter to the filesystem context. This include
+ Supply a single mount parameter to the filesystem context. This includes
the specification of the source/device which is specified as the "source"
parameter (which may be specified multiple times if the filesystem
supports that).
@@ -592,8 +592,7 @@ The following helpers all wrap sget_fc():
one.
-=====================
-PARAMETER DESCRIPTION
+Parameter Description
=====================
Parameters are described using structures defined in linux/fs_parser.h.
diff --git a/Documentation/filesystems/seq_file.rst b/Documentation/filesystems/seq_file.rst
index 7f7ee06b2693..56856481dc8d 100644
--- a/Documentation/filesystems/seq_file.rst
+++ b/Documentation/filesystems/seq_file.rst
@@ -129,7 +129,9 @@ also a special value which can be returned by the start() function
called SEQ_START_TOKEN; it can be used if you wish to instruct your
show() function (described below) to print a header at the top of the
output. SEQ_START_TOKEN should only be used if the offset is zero,
-however.
+however. SEQ_START_TOKEN has no special meaning to the core seq_file
+code. It is provided as a convenience for a start() function to
+communicate with the next() and show() functions.
The next function to implement is called, amazingly, next(); its job is to
move the iterator forward to the next position in the sequence. The
@@ -145,6 +147,22 @@ complete. Here's the example version::
return spos;
}
+The next() function should set ``*pos`` to a value that start() can use
+to find the new location in the sequence. When the iterator is being
+stored in the private data area, rather than being reinitialized on each
+start(), it might seem sufficient to simply set ``*pos`` to any non-zero
+value (zero always tells start() to restart the sequence). This is not
+sufficient due to historical problems.
+
+Historically, many next() functions have *not* updated ``*pos`` at
+end-of-file. If the value is then used by start() to initialise the
+iterator, this can result in corner cases where the last entry in the
+sequence is reported twice in the file. In order to discourage this bug
+from being resurrected, the core seq_file code now produces a warning if
+a next() function does not change the value of ``*pos``. Consequently a
+next() function *must* change the value of ``*pos``, and of course must
+set it to a non-zero value.
+
The stop() function closes a session; its job, of course, is to clean
up. If dynamic memory is allocated for the iterator, stop() is the
place to free it; if a lock was taken by start(), stop() must release
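The rule added in this hunk (next() must always change ``*pos``, including at end-of-file) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `demo_loff_t`, `demo_items`, `demo_start()` and `demo_next()` are hypothetical stand-ins, not the kernel's seq_file types or callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's loff_t. */
typedef long long demo_loff_t;

#define DEMO_NR_ITEMS 3	/* hypothetical length of the sequence */

static int demo_items[DEMO_NR_ITEMS] = { 10, 20, 30 };

/* start(): map *pos to an element, or return NULL once past the end. */
static void *demo_start(demo_loff_t *pos)
{
	assert(pos != NULL);
	if (*pos < 0 || *pos >= DEMO_NR_ITEMS)
		return NULL;
	return &demo_items[*pos];
}

/*
 * next(): advance *pos unconditionally, even when the end is reached,
 * so that a later start() cannot hand out the last element again.
 */
static void *demo_next(void *v, demo_loff_t *pos)
{
	(void)v;	/* the position alone identifies the element here */
	++*pos;
	return demo_start(pos);
}
```

Because `demo_next()` increments the position before looking anything up, the value seen by a later `demo_start()` at end-of-file is already past the last element.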
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
index ab0f7795792b..5a3209a4cebf 100644
--- a/Documentation/filesystems/sysfs.rst
+++ b/Documentation/filesystems/sysfs.rst
@@ -172,14 +172,13 @@ calls the associated methods.
To illustrate::
- #define to_dev(obj) container_of(obj, struct device, kobj)
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
char *buf)
{
struct device_attribute *dev_attr = to_dev_attr(attr);
- struct device *dev = to_dev(kobj);
+ struct device *dev = kobj_to_dev(kobj);
ssize_t ret = -EIO;
if (dev_attr->show)
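The change above swaps a local to_dev() macro for kobj_to_dev(); both recover the enclosing structure from a pointer to an embedded member. A minimal userspace sketch of that idea follows. The `demo_` names are hypothetical, and the macro is a simplified version that omits the type checking the kernel's container_of() performs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); assumption: no type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct kobject and struct device. */
struct demo_kobject {
	int refcount;
};

struct demo_device {
	int id;
	struct demo_kobject kobj;	/* embedded, not a pointer */
};

/* Step back from the embedded kobj to the device that contains it. */
static struct demo_device *demo_kobj_to_dev(struct demo_kobject *kobj)
{
	assert(kobj != NULL);
	return demo_container_of(kobj, struct demo_device, kobj);
}
```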
diff --git a/Documentation/filesystems/ubifs-authentication.rst b/Documentation/filesystems/ubifs-authentication.rst
index 1f39c8cea702..5210aed2afbc 100644
--- a/Documentation/filesystems/ubifs-authentication.rst
+++ b/Documentation/filesystems/ubifs-authentication.rst
@@ -1,11 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
-:orphan:
-
.. UBIFS Authentication
.. sigma star gmbh
.. 2018
+============================
+UBIFS Authentication Support
+============================
+
Introduction
============
diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
index ad3b5afdae77..f72b5f1769fb 100644
--- a/Documentation/firmware-guide/acpi/index.rst
+++ b/Documentation/firmware-guide/acpi/index.rst
@@ -26,3 +26,4 @@ ACPI Support
lpit
video_extension
extcon-intel-int3496
+ intel-pmc-mux
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 750d3a975d82..77a1ae975037 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -158,6 +158,7 @@ Hardware Monitoring Kernel Drivers
smsc47b397
smsc47m192
smsc47m1
+ sparx5-temp
tc654
tc74
thmc50
diff --git a/Documentation/ia64/index.rst b/Documentation/ia64/index.rst
index 0436e1034115..4bdfe28067ee 100644
--- a/Documentation/ia64/index.rst
+++ b/Documentation/ia64/index.rst
@@ -15,4 +15,3 @@ IA-64 Architecture
irq-redir
mca
serial
- xen
diff --git a/Documentation/ia64/xen.rst b/Documentation/ia64/xen.rst
deleted file mode 100644
index 831339c74441..000000000000
--- a/Documentation/ia64/xen.rst
+++ /dev/null
@@ -1,206 +0,0 @@
-********************************************************
-Recipe for getting/building/running Xen/ia64 with pv_ops
-********************************************************
-This recipe describes how to get xen-ia64 source and build it,
-and run domU with pv_ops.
-
-Requirements
-============
-
- - python
- - mercurial
- it (aka "hg") is an open-source source code
- management software. See the below.
- http://www.selenic.com/mercurial/wiki/
- - git
- - bridge-utils
-
-Getting and Building Xen and Dom0
-=================================
-
- My environment is:
-
- - Machine : Tiger4
- - Domain0 OS : RHEL5
- - DomainU OS : RHEL5
-
- 1. Download source::
-
- # hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
- # cd xen-unstable.hg
- # hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg
-
- 2. # make world
-
- 3. # make install-tools
-
- 4. copy kernels and xen::
-
- # cp xen/xen.gz /boot/efi/efi/redhat/
- # cp build-linux-2.6.18-xen_ia64/vmlinux.gz \
- /boot/efi/efi/redhat/vmlinuz-2.6.18.8-xen
-
- 5. make initrd for Dom0/DomU::
-
- # make -C linux-2.6.18-xen.hg ARCH=ia64 modules_install \
- O=$(pwd)/build-linux-2.6.18-xen_ia64
- # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6.18.8-xen.img \
- 2.6.18.8-xen --builtin mptspi --builtin mptbase \
- --builtin mptscsih --builtin uhci-hcd --builtin ohci-hcd \
- --builtin ehci-hcd
-
-Making a disk image for guest OS
-================================
-
- 1. make file::
-
- # dd if=/dev/zero of=/root/rhel5.img bs=1M seek=4096 count=0
- # mke2fs -F -j /root/rhel5.img
- # mount -o loop /root/rhel5.img /mnt
- # cp -ax /{dev,var,etc,usr,bin,sbin,lib} /mnt
- # mkdir /mnt/{root,proc,sys,home,tmp}
-
- Note: You may miss some device files. If so, please create them
- with mknod. Or you can use tar instead of cp.
-
- 2. modify DomU's fstab::
-
- # vi /mnt/etc/fstab
- /dev/xvda1 / ext3 defaults 1 1
- none /dev/pts devpts gid=5,mode=620 0 0
- none /dev/shm tmpfs defaults 0 0
- none /proc proc defaults 0 0
- none /sys sysfs defaults 0 0
-
- 3. modify inittab
-
- set runlevel to 3 to avoid X trying to start::
-
- # vi /mnt/etc/inittab
- id:3:initdefault:
-
- Start a getty on the hvc0 console::
-
- X0:2345:respawn:/sbin/mingetty hvc0
-
- tty1-6 mingetty can be commented out
-
- 4. add hvc0 into /etc/securetty::
-
- # vi /mnt/etc/securetty (add hvc0)
-
- 5. umount::
-
- # umount /mnt
-
-FYI, virt-manager can also make a disk image for guest OS.
-It's GUI tools and easy to make it.
-
-Boot Xen & Domain0
-==================
-
- 1. replace elilo
- elilo of RHEL5 can boot Xen and Dom0.
- If you use old elilo (e.g RHEL4), please download from the below
- http://elilo.sourceforge.net/cgi-bin/blosxom
- and copy into /boot/efi/efi/redhat/::
-
- # cp elilo-3.6-ia64.efi /boot/efi/efi/redhat/elilo.efi
-
- 2. modify elilo.conf (like the below)::
-
- # vi /boot/efi/efi/redhat/elilo.conf
- prompt
- timeout=20
- default=xen
- relocatable
-
- image=vmlinuz-2.6.18.8-xen
- label=xen
- vmm=xen.gz
- initrd=initrd-2.6.18.8-xen.img
- read-only
- append=" -- rhgb root=/dev/sda2"
-
-The append options before "--" are for xen hypervisor,
-the options after "--" are for dom0.
-
-FYI, your machine may need console options like
-"com1=19200,8n1 console=vga,com1". For example,
-append="com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 \
-console=ttyS0 root=/dev/sda2"
-
-Getting and Building domU with pv_ops
-=====================================
-
- 1. get pv_ops tree::
-
- # git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/
-
- 2. git branch (if necessary)::
-
- # cd linux-2.6-xen-ia64/
- # git checkout -b your_branch origin/xen-ia64-domu-minimal-2008may19
-
- Note:
- The current branch is xen-ia64-domu-minimal-2008may19.
- But you would find the new branch. You can see with
- "git branch -r" to get the branch lists.
-
- http://people.valinux.co.jp/~yamahata/xen-ia64/for_eagl/linux-2.6-ia64-pv-ops.git/
-
- is also available.
-
- The tree is based on
-
- git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 test)
-
- 3. copy .config for pv_ops of domU::
-
- # cp arch/ia64/configs/xen_domu_wip_defconfig .config
-
- 4. make kernel with pv_ops::
-
- # make oldconfig
- # make
-
- 5. install the kernel and initrd::
-
- # cp vmlinux.gz /boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU
- # make modules_install
- # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img \
- 2.6.26-rc3xen-ia64-08941-g1b12161 --builtin mptspi \
- --builtin mptbase --builtin mptscsih --builtin uhci-hcd \
- --builtin ohci-hcd --builtin ehci-hcd
-
-Boot DomainU with pv_ops
-========================
-
- 1. make config of DomU::
-
- # vi /etc/xen/rhel5
- kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
- ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
- vcpus = 1
- memory = 512
- name = "rhel5"
- disk = [ 'file:/root/rhel5.img,xvda1,w' ]
- root = "/dev/xvda1 ro"
- extra= "rhgb console=hvc0"
-
- 2. After boot xen and dom0, start xend::
-
- # /etc/init.d/xend start
-
- ( In the debugging case, `# XEND_DEBUG=1 xend trace_start` )
-
- 3. start domU::
-
- # xm create -c rhel5
-
-Reference
-=========
-- Wiki of Xen/IA64 upstream merge
- http://wiki.xensource.com/xenwiki/XenIA64/UpstreamMerge
-
-Written by Akio Takebe <takebe_akio@jp.fujitsu.com> on 28 May 2008
diff --git a/Documentation/iio/iio_configfs.rst b/Documentation/iio/iio_configfs.rst
index 6e38cbbd2981..3a5d76f9e2b9 100644
--- a/Documentation/iio/iio_configfs.rst
+++ b/Documentation/iio/iio_configfs.rst
@@ -53,7 +53,7 @@ kernel module following the interface in include/linux/iio/sw_trigger.h::
*/
}
- static int iio_trig_hrtimer_remove(struct iio_sw_trigger *swt)
+ static int iio_trig_sample_remove(struct iio_sw_trigger *swt)
{
/*
* This undoes the actions in iio_trig_sample_probe
diff --git a/Documentation/kbuild/llvm.rst b/Documentation/kbuild/llvm.rst
index dae90c21aed3..cf3ca236d2cc 100644
--- a/Documentation/kbuild/llvm.rst
+++ b/Documentation/kbuild/llvm.rst
@@ -1,3 +1,5 @@
+.. _kbuild_llvm:
+
==============================
Building Linux with Clang/LLVM
==============================
@@ -73,6 +75,8 @@ Getting Help
- `Wiki <https://github.com/ClangBuiltLinux/linux/wiki>`_
- `Beginner Bugs <https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_
+.. _getting_llvm:
+
Getting LLVM
-------------
diff --git a/Documentation/locking/lockdep-design.rst b/Documentation/locking/lockdep-design.rst
index 23fcbc4d3fc0..cec03bd1294a 100644
--- a/Documentation/locking/lockdep-design.rst
+++ b/Documentation/locking/lockdep-design.rst
@@ -392,3 +392,261 @@ Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
+
+Recursive read locks:
+---------------------
+The rest of this document tries to prove that a certain type of cycle is equivalent
+to deadlock possibility.
+
+There are three types of lockers: writers (i.e. exclusive lockers, like
+spin_lock() or write_lock()), non-recursive readers (i.e. shared lockers, like
+down_read()) and recursive readers (recursive shared lockers, like rcu_read_lock()).
+And we use the following notations of those lockers in the rest of the document:
+
+ W or E: stands for writers (exclusive lockers).
+ r: stands for non-recursive readers.
+ R: stands for recursive readers.
+ S: stands for all readers (non-recursive + recursive), as both are shared lockers.
+ N: stands for writers and non-recursive readers, as both are not recursive.
+
+Obviously, N is "r or W" and S is "r or R".
+
+Recursive readers, as their name indicates, are the lockers allowed to acquire
+even inside the critical section of another reader of the same lock instance,
+in other words, allowing nested read-side critical sections of one lock instance.
+
+Non-recursive readers, in contrast, will cause a self deadlock if they try to acquire inside
+the critical section of another reader of the same lock instance.
+
+The difference between recursive readers and non-recursive readers arises because
+recursive readers get blocked only by a write lock *holder*, while non-recursive
+readers could get blocked by a write lock *waiter*. Consider the following example:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ write_lock(X);
+ read_lock_2(X);
+
+Task A gets the reader (no matter whether recursive or non-recursive) on X via
+read_lock() first. And when task B tries to acquire writer on X, it will block
+and become a waiter for writer on X. Now if read_lock_2() is a recursive reader,
+task A will make progress, because writer waiters don't block recursive readers,
+and there is no deadlock. However, if read_lock_2() is a non-recursive reader,
+it will get blocked by writer waiter B, and cause a self deadlock.
+
+Block conditions on readers/writers of the same lock instance:
+--------------------------------------------------------------
+There are simply four block conditions:
+
+1. Writers block other writers.
+2. Readers block writers.
+3. Writers block both recursive readers and non-recursive readers.
+4. And readers (recursive or not) don't block other recursive readers but
+ may block non-recursive readers (because of the potential co-existing
+ writer waiters)
+
+Block condition matrix, Y means the row blocks the column, and N means otherwise.
+
+ | E | r | R |
+ +---+---+---+---+
+ E | Y | Y | Y |
+ +---+---+---+---+
+ r | Y | Y | N |
+ +---+---+---+---+
+ R | Y | Y | N |
+
+ (E: writers, r: non-recursive readers, R: recursive readers)
+
+
+Unlike non-recursive read locks, recursive read locks only get blocked by
+current write lock *holders* rather than write lock *waiters*, for example:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+
+ write_lock(X);
+
+ read_lock(X);
+
+is not a deadlock for recursive read locks: while task B is waiting for
+the lock X, the second read_lock() doesn't need to wait because it's a recursive
+read lock. However, if the read_lock() is a non-recursive read lock, then the above
+case is a deadlock, because even though the write_lock() in TASK B cannot get the
+lock, it can block the second read_lock() in TASK A.
+
+Note that a lock can be a write lock (exclusive lock), a non-recursive read
+lock (non-recursive shared lock) or a recursive read lock (recursive shared
+lock), depending on the lock operations used to acquire it (more specifically,
+the value of the 'read' parameter for lock_acquire()). In other words, a single
+lock instance has three types of acquisition depending on the acquisition
+functions: exclusive, non-recursive read, and recursive read.
+
+To be concise, we call write locks and non-recursive read locks
+"non-recursive" locks, and recursive read locks "recursive" locks.
+
+Recursive locks don't block each other, while non-recursive locks do (this is
+even true for two non-recursive read locks). A non-recursive lock can block the
+corresponding recursive lock, and vice versa.
+
+A deadlock case with recursive locks involved is as follows:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ read_lock(Y);
+ write_lock(Y);
+ write_lock(X);
+
+Task A is waiting for task B to read_unlock() Y and task B is waiting for task
+A to read_unlock() X.
+
+Dependency types and strong dependency paths:
+---------------------------------------------
+Lock dependencies record the orders of the acquisitions of a pair of locks, and
+because there are 3 types for lockers, there are, in theory, 9 types of lock
+dependencies, but we can show that 4 types of lock dependencies are enough for
+deadlock detection.
+
+For each lock dependency:
+
+ L1 -> L2
+
+, which means lockdep has seen L1 held before L2 held in the same context at runtime.
+And in deadlock detection, we care whether we could get blocked on L2 with L1 held,
+IOW, whether there is a locker L3 such that L1 blocks L3 and L2 gets blocked by L3. So
+we only care about 1) what L1 blocks and 2) what blocks L2. As a result, we can combine
+recursive readers and non-recursive readers for L1 (as they block the same types) and
+we can combine writers and non-recursive readers for L2 (as they get blocked by the
+same types).
+
+With the above combination for simplification, there are 4 types of dependency edges
+in the lockdep graph:
+
+1) -(ER)->: exclusive writer to recursive reader dependency, "X -(ER)-> Y" means
+ X -> Y and X is a writer and Y is a recursive reader.
+
+2) -(EN)->: exclusive writer to non-recursive locker dependency, "X -(EN)-> Y" means
+ X -> Y and X is a writer and Y is either a writer or non-recursive reader.
+
+3) -(SR)->: shared reader to recursive reader dependency, "X -(SR)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is a recursive reader.
+
+4) -(SN)->: shared reader to non-recursive locker dependency, "X -(SN)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is either a writer or
+ non-recursive reader.
+
+Note that given two locks, they may have multiple dependencies between them, for example:
+
+ TASK A:
+
+ read_lock(X);
+ write_lock(Y);
+ ...
+
+ TASK B:
+
+ write_lock(X);
+ write_lock(Y);
+
+, we have both X -(SN)-> Y and X -(EN)-> Y in the dependency graph.
+
+We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->, and
+similarly for -(Ex)->, -(xR)-> and -(Sx)->.
+
+A "path" is a series of conjunct dependency edges in the graph. And we define a
+"strong" path, which indicates the strong dependency throughout each dependency
+in the path, as the path that doesn't have two conjunct edges (dependencies) as
+-(xR)-> and -(Sx)->. In other words, a "strong" path is a path from a lock
+walking to another through the lock dependencies, and if X -> Y -> Z is in the
+path (where X, Y, Z are locks), and the walk from X to Y is through a -(SR)-> or
+-(ER)-> dependency, the walk from Y to Z must not be through a -(SN)-> or
+-(SR)-> dependency.
+
+We will see why the path is called "strong" in the next section.
+
+Recursive Read Deadlock Detection:
+----------------------------------
+
+We now prove two things:
+
+Lemma 1:
+
+If there is a closed strong path (i.e. a strong circle), then there is a
+combination of locking sequences that causes deadlock. I.e. a strong circle is
+sufficient for deadlock detection.
+
+Lemma 2:
+
+If there is no closed strong path (i.e. strong circle), then there is no
+combination of locking sequences that could cause deadlock. I.e. strong
+circles are necessary for deadlock detection.
+
+With these two Lemmas, we can easily say a closed strong path is both sufficient
+and necessary for deadlocks, therefore a closed strong path is equivalent to
+deadlock possibility. Because a closed strong path stands for a dependency chain that
+could cause deadlocks, we call it "strong", considering there are dependency
+circles that won't cause deadlocks.
+
+Proof for sufficiency (Lemma 1):
+
+Let's say we have a strong circle:
+
+ L1 -> L2 ... -> Ln -> L1
+
+, which means we have dependencies:
+
+ L1 -> L2
+ L2 -> L3
+ ...
+ Ln-1 -> Ln
+ Ln -> L1
+
+We now can construct a combination of locking sequences that cause deadlock:
+
+Firstly let's make one CPU/task get the L1 in L1 -> L2, and then another get
+the L2 in L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are
+held by different CPU/tasks.
+
+And then because we have L1 -> L2, the holder of L1 is going to acquire L2
+in L1 -> L2. However, since L2 is already held by another CPU/task, plus L1 ->
+L2 and L2 -> L3 are not -(xR)-> and -(Sx)-> (the definition of strong), which
+means either L2 in L1 -> L2 is a non-recursive locker (blocked by anyone) or
+the L2 in L2 -> L3 is a writer (blocking anyone), the holder of L1
+cannot get L2; it has to wait for L2's holder to release.
+
+Moreover, we can have a similar conclusion for L2's holder: it has to wait for L3's
+holder to release, and so on. We now can prove that Lx's holder has to wait for
+Lx+1's holder to release, and note that Ln+1 is L1, so we have a circular
+waiting scenario and nobody can make progress, therefore a deadlock.
+
+Proof for necessity (Lemma 2):
+
+Lemma 2 is equivalent to: If there is a deadlock scenario, then there must be a
+strong circle in the dependency graph.
+
+According to Wikipedia[1], if there is a deadlock, then there must be a circular
+waiting scenario, which means there are n CPUs/tasks, where CPU/task P1 is waiting for
+a lock held by P2, and P2 is waiting for a lock held by P3, ... and Pn is waiting
+for a lock held by P1. Let's name the lock Px is waiting for as Lx. Since P1 is waiting
+for L1 and holding Ln, we will have Ln -> L1 in the dependency graph. Similarly,
+we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph, which means we
+have a circle:
+
+ Ln -> L1 -> L2 -> ... -> Ln
+
+, and now let's prove the circle is strong:
+
+For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
+the dependency Lx -> Lx+1, and since Px is waiting for Px+1 to release Lx,
+it's impossible that Lx on Px+1 is a reader and Lx on Px is a recursive
+reader, because readers (no matter recursive or not) don't block recursive
+readers, therefore Lx-1 -> Lx and Lx -> Lx+1 cannot be a -(xR)-> -(Sx)-> pair,
+and this is true for any lock in the circle, therefore, the circle is strong.
+
+References:
+-----------
+[1]: https://en.wikipedia.org/wiki/Deadlock
+[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill
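The "strong" condition defined in the text added above (no -(xR)-> edge immediately followed by a -(Sx)-> edge) can be checked mechanically on a closed path. The sketch below is a userspace illustration only, not lockdep's implementation; `struct demo_edge` and both functions are hypothetical names, and each edge is reduced to the two properties the definition depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One dependency edge X -> Y, reduced to the two bits that matter. */
struct demo_edge {
	bool from_shared;	/* true: -(Sx)->, false: -(Ex)-> */
	bool to_recursive;	/* true: -(xR)->, false: -(xN)-> */
};

/* Two consecutive edges are allowed unless they form -(xR)-> -(Sx)->. */
static bool demo_pair_is_strong(struct demo_edge in, struct demo_edge out)
{
	return !(in.to_recursive && out.from_shared);
}

/* A closed path is strong if every consecutive pair, wrapping around, is. */
static bool demo_circle_is_strong(const struct demo_edge *edges, size_t n)
{
	assert(edges != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		if (!demo_pair_is_strong(edges[i], edges[(i + 1) % n]))
			return false;
	}
	return true;
}
```

For the deadlock example in the text (task A: read_lock(X) then write_lock(Y); task B: read_lock(Y) then write_lock(X)), both edges are -(SN)->, so the circle counts as strong. A circle made of X -(ER)-> Y and Y -(SN)-> X does not, because the recursive reader of Y is not blocked by the other reader of Y.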
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 62c5ad98c11c..a334b584f2b3 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -139,6 +139,24 @@ with the associated LOCKTYPE lock acquired.
Read path: same as in :ref:`seqcount_t`.
+
+.. _seqcount_latch_t:
+
+Latch sequence counters (``seqcount_latch_t``)
+----------------------------------------------
+
+Latch sequence counters are a multiversion concurrency control mechanism
+where the embedded seqcount_t counter even/odd value is used to switch
+between two copies of protected data. This allows the sequence counter
+read path to safely interrupt its own write side critical section.
+
+Use seqcount_latch_t when the write side sections cannot be protected
+from interruption by readers. This is typically the case when the read
+side can be invoked from NMI handlers.
+
+Check `raw_write_seqcount_latch()` for more information.
+
+
.. _seqlock_t:
Sequential locks (``seqlock_t``)
diff --git a/Documentation/maintainer/index.rst b/Documentation/maintainer/index.rst
index d904e74e1159..f0a60435b124 100644
--- a/Documentation/maintainer/index.rst
+++ b/Documentation/maintainer/index.rst
@@ -13,4 +13,5 @@ additions to this manual.
rebasing-and-merging
pull-requests
maintainer-entry-profile
+ modifying-patches
diff --git a/Documentation/maintainer/modifying-patches.rst b/Documentation/maintainer/modifying-patches.rst
new file mode 100644
index 000000000000..58385d2e8065
--- /dev/null
+++ b/Documentation/maintainer/modifying-patches.rst
@@ -0,0 +1,50 @@
+.. _modifyingpatches:
+
+Modifying Patches
+=================
+
+If you are a subsystem or branch maintainer, sometimes you need to slightly
+modify patches you receive in order to merge them, because the code is not
+exactly the same in your tree and the submitter's. If you stick strictly to
+rule (c) of the Developer's Certificate of Origin, you should ask the submitter
+to rediff, but this is a totally counter-productive waste of time and energy.
+Rule (b) allows you to adjust the code, but then it is very impolite to change
+one submitter's code and make them endorse your bugs. To solve this problem, it
+is recommended that you add a line between the last Signed-off-by header and
+yours, indicating the nature of your changes. While there is nothing mandatory
+about this, it seems like prepending the description with your mail and/or
+name, all enclosed in square brackets, is noticeable enough to make it obvious
+that you are responsible for last-minute changes. Example::
+
+ Signed-off-by: Random J Developer <random@developer.example.org>
+ [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
+ Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
+
+This practice is particularly helpful if you maintain a stable branch and
+want at the same time to credit the author, track changes, merge the fix,
+and protect the submitter from complaints. Note that under no circumstances
+can you change the author's identity (the From header), as it is the one
+which appears in the changelog.
+
+Special note to back-porters: It seems to be a common and useful practice
+to insert an indication of the origin of a patch at the top of the commit
+message (just after the subject line) to facilitate tracking. For instance,
+here's what we see in a 3.x-stable release::
+
+ Date: Tue Oct 7 07:26:38 2014 -0400
+
+ libata: Un-break ATA blacklist
+
+ commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
+
+And here's what might appear in an older kernel once a patch is backported::
+
+ Date: Tue May 13 22:12:27 2008 +0200
+
+ wireless, airo: waitbusy() won't delay
+
+ [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
+
+Whatever the format, this information provides valuable help to people
+tracking your trees, and to people trying to troubleshoot bugs in your
+tree.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 96186332e5f4..17c8e0c2deb4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -546,8 +546,8 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
[*] For information on bus mastering DMA and coherency please read:
Documentation/driver-api/pci/pci.rst
- Documentation/DMA-API-HOWTO.txt
- Documentation/DMA-API.txt
+ Documentation/core-api/dma-api-howto.rst
+ Documentation/core-api/dma-api.rst
DATA DEPENDENCY BARRIERS (HISTORICAL)
@@ -1932,8 +1932,8 @@ There are some more advanced barrier functions:
here.
See the subsection "Kernel I/O barrier effects" for more information on
- relaxed I/O accessors and the Documentation/DMA-API.txt file for more
- information on consistent memory.
+ relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
+ more information on consistent memory.
(*) pmem_wmb();
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index c29496fff81c..611e4b130c1e 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -95,6 +95,7 @@ Contents:
seg6-sysctl
strparser
switchdev
+ sysfs-tagging
tc-actions-env-rules
tcp-thin
team
diff --git a/Documentation/filesystems/sysfs-tagging.rst b/Documentation/networking/sysfs-tagging.rst
index 83647e10c207..83647e10c207 100644
--- a/Documentation/filesystems/sysfs-tagging.rst
+++ b/Documentation/networking/sysfs-tagging.rst
diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst
index 4ae1e0f600c1..e05fb1b8f8b6 100644
--- a/Documentation/process/2.Process.rst
+++ b/Documentation/process/2.Process.rst
@@ -405,7 +405,7 @@ be found at:
http://vger.kernel.org/vger-lists.html
There are lists hosted elsewhere, though; a number of them are at
-lists.redhat.com.
+redhat.com/mailman/listinfo.
The core mailing list for kernel development is, of course, linux-kernel.
This list is an intimidating place to be; volume can reach 500 messages per
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index ee741763a3fc..dac17711dc11 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -30,6 +30,7 @@ you probably needn't concern yourself with pcmciautils.
Program Minimal version Command to check the version
====================== =============== ========================================
GNU C 4.9 gcc --version
+Clang/LLVM (optional) 10.0.1 clang --version
GNU make 3.81 make --version
binutils 2.23 ld -v
flex 2.5.35 flex --version
@@ -68,6 +69,15 @@ GCC
The gcc version requirements may vary depending on the type of CPU in your
computer.
+Clang/LLVM (optional)
+---------------------
+
+The latest formal release of clang and LLVM utils (according to
+`releases.llvm.org <https://releases.llvm.org>`_) is supported for building
+kernels. Older releases aren't guaranteed to work, and we may drop workarounds
+from the kernel that were used to support older versions. Please see additional
+docs on :ref:`Building Linux with Clang/LLVM <kbuild_llvm>`.
+
Make
----
@@ -331,6 +341,11 @@ gcc
- <ftp://ftp.gnu.org/gnu/gcc/>
+Clang/LLVM
+----------
+
+- :ref:`Getting LLVM <getting_llvm>`.
+
Make
----
diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index 918e32d76fc4..ff71d802b53d 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -51,24 +51,6 @@ to make sure their systems do not continue running in the face of
"unreachable" conditions. (For example, see commits like `this one
<https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_.)
-uninitialized_var()
--------------------
-For any compiler warnings about uninitialized variables, just add
-an initializer. Using the uninitialized_var() macro (or similar
-warning-silencing tricks) is dangerous as it papers over `real bugs
-<https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/>`_
-(or can in the future), and suppresses unrelated compiler warnings
-(e.g. "unused variable"). If the compiler thinks it is uninitialized,
-either simply initialize the variable or make compiler changes. Keep in
-mind that in most cases, if an initialization is obviously redundant,
-the compiler's dead-store elimination pass will make sure there are no
-needless variable writes.
-
-As Linus has said, this macro
-`must <https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/>`_
-`be <https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/>`_
-`removed <https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/>`_.
-
open-coded arithmetic in allocator arguments
--------------------------------------------
Dynamic size calculations (especially multiplication) should not be
@@ -322,7 +304,8 @@ to allocate for a structure containing an array of this kind as a member::
In the example above, we had to remember to calculate ``count - 1`` when using
the struct_size() helper, otherwise we would have --unintentionally-- allocated
memory for one too many ``items`` objects. The cleanest and least error-prone way
-to implement this is through the use of a `flexible array member`::
+to implement this is through the use of a `flexible array member`, together with
+struct_size() and flex_array_size() helpers::
struct something {
size_t count;
@@ -334,5 +317,4 @@ to implement this is through the use of a `flexible array member`::
instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
instance->count = count;
- size = sizeof(instance->items[0]) * instance->count;
- memcpy(instance->items, source, size);
+ memcpy(instance->items, source, flex_array_size(instance, items, instance->count));
diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst
index c9e4ce2613c0..16586f6cc888 100644
--- a/Documentation/process/email-clients.rst
+++ b/Documentation/process/email-clients.rst
@@ -25,6 +25,11 @@ attachments, but then the attachments should have content-type
it makes quoting portions of the patch more difficult in the patch
review process.
+It's also strongly recommended that you use plain text in your email body,
+for patches and other emails alike. https://useplaintext.email explains how
+to configure your preferred email client for plain text, and lists recommended
+email clients should you not already have a preference.
+
Email clients that are used for Linux kernel patches should send the
patch text untouched. For example, they should not modify or delete tabs
or spaces, even at the beginning or end of lines.
diff --git a/Documentation/process/programming-language.rst b/Documentation/process/programming-language.rst
index e5f5f065dc24..ec474a70a02f 100644
--- a/Documentation/process/programming-language.rst
+++ b/Documentation/process/programming-language.rst
@@ -6,14 +6,15 @@ Programming Language
The kernel is written in the C programming language [c-language]_.
More precisely, the kernel is typically compiled with ``gcc`` [gcc]_
under ``-std=gnu89`` [gcc-c-dialect-options]_: the GNU dialect of ISO C90
-(including some C99 features).
+(including some C99 features). ``clang`` [clang]_ is also supported; see
+docs on :ref:`Building Linux with Clang/LLVM <kbuild_llvm>`.
This dialect contains many extensions to the language [gnu-extensions]_,
and many of them are used within the kernel as a matter of course.
-There is some support for compiling the kernel with ``clang`` [clang]_
-and ``icc`` [icc]_ for several of the architectures, although at the time
-of writing it is not completed, requiring third-party patches.
+There is some support for compiling the kernel with ``icc`` [icc]_ for several
+of the architectures, although at the time of writing it is not complete and
+requires third-party patches.
Attributes
----------
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index 3f8e9d5d95c2..b681e862a335 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -24,6 +24,10 @@ and elsewhere regarding submitting Linux kernel patches.
c) Builds successfully when using ``O=builddir``
+ d) Any Documentation/ changes build successfully without new warnings/errors.
+ Use ``make htmldocs`` or ``make pdfdocs`` to check the build and
+ fix any issues.
+
3) Builds on multiple CPU architectures by using local cross-compile tools
or some other build farm.
diff --git a/Documentation/process/submitting-drivers.rst b/Documentation/process/submitting-drivers.rst
index 74b35bfc6623..3861887e0ca5 100644
--- a/Documentation/process/submitting-drivers.rst
+++ b/Documentation/process/submitting-drivers.rst
@@ -60,10 +60,11 @@ What Criteria Determine Acceptance
Licensing:
The code must be released to us under the
- GNU General Public License. We don't insist on any kind
- of exclusive GPL licensing, and if you wish the driver
- to be useful to other communities such as BSD you may well
- wish to release under multiple licenses.
+ GNU General Public License. If you wish the driver to be
+ useful to other communities such as BSD you may release
+ under multiple licenses. If you choose to release under
+ licenses other than the GPL, you should include your
+ rationale for your license choices in your cover letter.
See accepted licenses at include/linux/module.h
Copyright:
diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 5219bf3cddfc..58586ffe2808 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -10,22 +10,18 @@ can greatly increase the chances of your change being accepted.
This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process
-works, see :ref:`Documentation/process <development_process_main>`.
-Also, read :ref:`Documentation/process/submit-checklist.rst <submitchecklist>`
-for a list of items to check before
-submitting code. If you are submitting a driver, also read
-:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`;
-for device tree binding patches, read
-Documentation/devicetree/bindings/submitting-patches.rst.
-
-Many of these steps describe the default behavior of the ``git`` version
-control system; if you use ``git`` to prepare your patches, you'll find much
-of the mechanical work done for you, though you'll still need to prepare
-and document a sensible set of patches. In general, use of ``git`` will make
-your life as a kernel developer easier.
-
-0) Obtain a current source tree
--------------------------------
+works, see :doc:`development-process`. Also, read :doc:`submit-checklist`
+for a list of items to check before submitting code. If you are submitting
+a driver, also read :doc:`submitting-drivers`; for device tree binding patches,
+read Documentation/devicetree/bindings/submitting-patches.rst.
+
+This documentation assumes that you're using ``git`` to prepare your patches.
+If you're unfamiliar with ``git``, you would be well-advised to learn how to
+use it; it will make your life much easier, both as a kernel developer and in
+general.
+
+Obtain a current source tree
+----------------------------
If you do not have a repository with the current kernel source handy, use
``git`` to obtain one. You'll want to start with the mainline repository,
@@ -39,68 +35,10 @@ patches prepared against those trees. See the **T:** entry for the subsystem
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
the tree is not listed there.
-It is still possible to download kernel releases via tarballs (as described
-in the next section), but that is the hard way to do kernel development.
-
-1) ``diff -up``
----------------
-
-If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN``
-to create patches. Git generates patches in this form by default; if
-you're using ``git``, you can skip this section entirely.
-
-All changes to the Linux kernel occur in the form of patches, as
-generated by :manpage:`diff(1)`. When creating your patch, make sure to
-create it in "unified diff" format, as supplied by the ``-u`` argument
-to :manpage:`diff(1)`.
-Also, please use the ``-p`` argument which shows which C function each
-change is in - that makes the resultant ``diff`` a lot easier to read.
-Patches should be based in the root kernel source directory,
-not in any lower subdirectory.
-
-To create a patch for a single file, it is often sufficient to do::
-
- SRCTREE=linux
- MYFILE=drivers/net/mydriver.c
-
- cd $SRCTREE
- cp $MYFILE $MYFILE.orig
- vi $MYFILE # make your change
- cd ..
- diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
-
-To create a patch for multiple files, you should unpack a "vanilla",
-or unmodified kernel source tree, and generate a ``diff`` against your
-own source tree. For example::
-
- MYSRC=/devel/linux
-
- tar xvfz linux-3.19.tar.gz
- mv linux-3.19 linux-3.19-vanilla
- diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
- linux-3.19-vanilla $MYSRC > /tmp/patch
-
-``dontdiff`` is a list of files which are generated by the kernel during
-the build process, and should be ignored in any :manpage:`diff(1)`-generated
-patch.
-
-Make sure your patch does not include any extra files which do not
-belong in a patch submission. Make sure to review your patch -after-
-generating it with :manpage:`diff(1)`, to ensure accuracy.
-
-If your changes produce a lot of deltas, you need to split them into
-individual patches which modify things in logical stages; see
-:ref:`split_changes`. This will facilitate review by other kernel developers,
-very important if you want your patch accepted.
-
-If you're using ``git``, ``git rebase -i`` can help you with this process. If
-you're not using ``git``, ``quilt`` <https://savannah.nongnu.org/projects/quilt>
-is another popular alternative.
-
.. _describe_changes:
-2) Describe your changes
-------------------------
+Describe your changes
+---------------------
Describe your problem. Whether your patch is a one-line bug fix or
5000 lines of a new feature, there must be an underlying problem that
@@ -203,8 +141,8 @@ An example call::
.. _split_changes:
-3) Separate your changes
-------------------------
+Separate your changes
+---------------------
Separate each **logical change** into a separate patch.
@@ -236,8 +174,8 @@ then only post say 15 or so at a time and wait for review and integration.
-4) Style-check your changes
----------------------------
+Style-check your changes
+------------------------
Check your patch for basic style violations, details of which can be
found in
@@ -267,8 +205,8 @@ You should be able to justify all violations that remain in your
patch.
-5) Select the recipients for your patch
----------------------------------------
+Select the recipients for your patch
+------------------------------------
You should always copy the appropriate subsystem maintainer(s) on any patch
to code that they maintain; look through the MAINTAINERS file and the
@@ -299,7 +237,8 @@ sending him e-mail.
If you have a patch that fixes an exploitable security bug, send that patch
to security@kernel.org. For severe bugs, a short embargo may be considered
to allow distributors to get the patch out to users; in such cases,
-obviously, the patch should not be sent to any public lists.
+obviously, the patch should not be sent to any public lists. See also
+:doc:`/admin-guide/security-bugs`.
Patches that fix a severe bug in a released kernel should be directed
toward the stable maintainers by putting a line like this::
@@ -342,15 +281,20 @@ Trivial patches must qualify for one of the following rules:
-6) No MIME, no links, no compression, no attachments. Just plain text
-----------------------------------------------------------------------
+No MIME, no links, no compression, no attachments. Just plain text
+-------------------------------------------------------------------
Linus and other kernel developers need to be able to read and comment
on the changes you are submitting. It is important for a kernel
developer to be able to "quote" your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.
-For this reason, all patches should be submitted by e-mail "inline".
+For this reason, all patches should be submitted by e-mail "inline". The
+easiest way to do this is with ``git send-email``, which is strongly
+recommended. An interactive tutorial for ``git send-email`` is available at
+https://git-send-email.io.
+
+If you choose not to use ``git send-email``:
.. warning::
@@ -366,27 +310,17 @@ decreasing the likelihood of your MIME-attached change being accepted.
Exception: If your mailer is mangling patches then someone may ask
you to re-send them using MIME.
-See :ref:`Documentation/process/email-clients.rst <email_clients>`
-for hints about configuring your e-mail client so that it sends your patches
-untouched.
-
-7) E-mail size
---------------
+See :doc:`/process/email-clients` for hints about configuring your e-mail
+client so that it sends your patches untouched.
-Large changes are not appropriate for mailing lists, and some
-maintainers. If your patch, uncompressed, exceeds 300 kB in size,
-it is preferred that you store your patch on an Internet-accessible
-server, and provide instead a URL (link) pointing to your patch. But note
-that if your patch exceeds 300 kB, it almost certainly needs to be broken up
-anyway.
-
-8) Respond to review comments
------------------------------
+Respond to review comments
+--------------------------
Your patch will almost certainly get comments from reviewers on ways in
-which the patch can be improved. You must respond to those comments;
-ignoring reviewers is a good way to get ignored in return. Review comments
-or questions that do not lead to a code change should almost certainly
+which the patch can be improved, in the form of a reply to your email. You must
+respond to those comments; ignoring reviewers is a good way to get ignored in
+return. You can simply reply to their emails to answer their comments. Review
+comments or questions that do not lead to a code change should almost certainly
bring about a comment or changelog entry so that the next reviewer better
understands what is going on.
@@ -395,9 +329,12 @@ for their time. Code review is a tiring and time-consuming process, and
reviewers sometimes get grumpy. Even in that case, though, respond
politely and address the problems they have pointed out.
+See :doc:`email-clients` for recommendations on email
+clients and mailing list etiquette.
-9) Don't get discouraged - or impatient
----------------------------------------
+
+Don't get discouraged - or impatient
+------------------------------------
After you have submitted your change, be patient and wait. Reviewers are
busy people and may not get to your patch right away.
@@ -410,18 +347,19 @@ one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.
-10) Include PATCH in the subject
---------------------------------
+Include PATCH in the subject
+-----------------------------
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH]. This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.
+``git send-email`` will do this for you automatically.
-11) Sign your work - the Developer's Certificate of Origin
-----------------------------------------------------------
+Sign your work - the Developer's Certificate of Origin
+------------------------------------------------------
To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
@@ -465,60 +403,15 @@ then you just add a line saying::
Signed-off-by: Random J Developer <random@developer.example.org>
using your real name (sorry, no pseudonyms or anonymous contributions.)
+This will be done for you automatically if you use ``git commit -s``.
Some people also put extra tags at the end. They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
-If you are a subsystem or branch maintainer, sometimes you need to slightly
-modify patches you receive in order to merge them, because the code is not
-exactly the same in your tree and the submitters'. If you stick strictly to
-rule (c), you should ask the submitter to rediff, but this is a totally
-counter-productive waste of time and energy. Rule (b) allows you to adjust
-the code, but then it is very impolite to change one submitter's code and
-make him endorse your bugs. To solve this problem, it is recommended that
-you add a line between the last Signed-off-by header and yours, indicating
-the nature of your changes. While there is nothing mandatory about this, it
-seems like prepending the description with your mail and/or name, all
-enclosed in square brackets, is noticeable enough to make it obvious that
-you are responsible for last-minute changes. Example::
- Signed-off-by: Random J Developer <random@developer.example.org>
- [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
- Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
-
-This practice is particularly helpful if you maintain a stable branch and
-want at the same time to credit the author, track changes, merge the fix,
-and protect the submitter from complaints. Note that under no circumstances
-can you change the author's identity (the From header), as it is the one
-which appears in the changelog.
-
-Special note to back-porters: It seems to be a common and useful practice
-to insert an indication of the origin of a patch at the top of the commit
-message (just after the subject line) to facilitate tracking. For instance,
-here's what we see in a 3.x-stable release::
-
- Date: Tue Oct 7 07:26:38 2014 -0400
-
- libata: Un-break ATA blacklist
-
- commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
-
-And here's what might appear in an older kernel once a patch is backported::
-
- Date: Tue May 13 22:12:27 2008 +0200
-
- wireless, airo: waitbusy() won't delay
-
- [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
-
-Whatever the format, this information provides a valuable help to people
-tracking your trees, and to people trying to troubleshoot bugs in your
-tree.
-
-
-12) When to use Acked-by:, Cc:, and Co-developed-by:
--------------------------------------------------------
+When to use Acked-by:, Cc:, and Co-developed-by:
+------------------------------------------------
The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.
@@ -586,8 +479,8 @@ Example of a patch submitted by a Co-developed-by: author::
Signed-off-by: Submitting Co-Author <sub@coauthor.example.org>
-13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
---------------------------------------------------------------------------
+Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
+----------------------------------------------------------------------
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. Please note that if
@@ -650,8 +543,8 @@ for more details.
.. _the_canonical_patch_format:
-14) The canonical patch format
-------------------------------
+The canonical patch format
+--------------------------
This section describes how the patch itself should be formatted. Note
that, if you have your patches stored in a ``git`` repository, proper patch
@@ -773,8 +666,8 @@ references.
.. _explicit_in_reply_to:
-15) Explicit In-Reply-To headers
---------------------------------
+Explicit In-Reply-To headers
+----------------------------
It can be helpful to manually add In-Reply-To: headers to a patch
(e.g., when using ``git send-email``) to associate the patch with
@@ -787,8 +680,8 @@ helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
the cover email text) to link to an earlier version of the patch series.
-16) Providing base tree information
------------------------------------
+Providing base tree information
+-------------------------------
When other developers receive your patches and start the review process,
it is often useful for them to know where in the tree history they
@@ -838,61 +731,6 @@ either below the ``---`` line or at the very bottom of all other
content, right before your email signature.
-17) Sending ``git pull`` requests
----------------------------------
-
-If you have a series of patches, it may be most convenient to have the
-maintainer pull them directly into the subsystem repository with a
-``git pull`` operation. Note, however, that pulling patches from a developer
-requires a higher degree of trust than taking patches from a mailing list.
-As a result, many subsystem maintainers are reluctant to take pull
-requests, especially from new, unknown developers. If in doubt you can use
-the pull request as the cover letter for a normal posting of the patch
-series, giving the maintainer the option of using either.
-
-A pull request should have [GIT PULL] in the subject line. The
-request itself should include the repository name and the branch of
-interest on a single line; it should look something like::
-
- Please pull from
-
- git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
-
- to get these changes:
-
-A pull request should also include an overall message saying what will be
-included in the request, a ``git shortlog`` listing of the patches
-themselves, and a ``diffstat`` showing the overall effect of the patch series.
-The easiest way to get all this information together is, of course, to let
-``git`` do it for you with the ``git request-pull`` command.
-
-Some maintainers (including Linus) want to see pull requests from signed
-commits; that increases their confidence that the request actually came
-from you. Linus, in particular, will not pull from public hosting sites
-like GitHub in the absence of a signed tag.
-
-The first step toward creating such tags is to make a GNUPG key and get it
-signed by one or more core kernel developers. This step can be hard for
-new developers, but there is no way around it. Attending conferences can
-be a good way to find developers who can sign your key.
-
-Once you have prepared a patch series in ``git`` that you wish to have somebody
-pull, create a signed tag with ``git tag -s``. This will create a new tag
-identifying the last commit in the series and containing a signature
-created with your private key. You will also have the opportunity to add a
-changelog-style message to the tag; this is an ideal place to describe the
-effects of the pull request as a whole.
-
-If the tree the maintainer will be pulling from is not the repository you
-are working from, don't forget to push the signed tag explicitly to the
-public tree.
-
-When generating your pull request, use the signed tag as the target. A
-command like this will do the trick::
-
- git request-pull master git://my.public.tree/linux.git my-signed-tag
-
-
References
----------
diff --git a/Documentation/scheduler/sched-capacity.rst b/Documentation/scheduler/sched-capacity.rst
index 00bf0d011e2a..9b7cbe43b2d1 100644
--- a/Documentation/scheduler/sched-capacity.rst
+++ b/Documentation/scheduler/sched-capacity.rst
@@ -365,7 +365,7 @@ giving it a high uclamp.min value.
.. note::
Wakeup CPU selection in CFS can be eclipsed by Energy Aware Scheduling
- (EAS), which is described in Documentation/scheduling/sched-energy.rst.
+ (EAS), which is described in Documentation/scheduler/sched-energy.rst.
5.1.3 Load balancing
~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index 78f850778982..001e09c95e1d 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -331,7 +331,7 @@ asymmetric CPU topologies for now. This requirement is checked at run-time by
looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
domains are built.
-See Documentation/sched/sched-capacity.rst for requirements to be met for this
+See Documentation/scheduler/sched-capacity.rst for requirements to be met for this
flag to be set in the sched_domain hierarchy.
Please note that EAS is not fundamentally incompatible with SMP, but no
diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
index d9387209d143..357328d566c8 100644
--- a/Documentation/security/credentials.rst
+++ b/Documentation/security/credentials.rst
@@ -323,7 +323,6 @@ credentials (the value is simply returned in each case)::
uid_t current_fsuid(void) Current's file access UID
gid_t current_fsgid(void) Current's file access GID
kernel_cap_t current_cap(void) Current's effective capabilities
- void *current_security(void) Current's LSM security pointer
struct user_struct *current_user(void) Current's user account
There are also convenience wrappers for retrieving specific associated pairs of
diff --git a/Documentation/security/keys/trusted-encrypted.rst b/Documentation/security/keys/trusted-encrypted.rst
index 9483a7425ad5..1da879a68640 100644
--- a/Documentation/security/keys/trusted-encrypted.rst
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -39,10 +39,9 @@ With the IBM TSS 2 stack::
Or with the Intel TSS 2 stack::
- #> tpm2_createprimary --hierarchy o -G rsa2048 -o key.ctxt
+ #> tpm2_createprimary --hierarchy o -G rsa2048 -c key.ctxt
[...]
- handle: 0x800000FF
- #> tpm2_evictcontrol -c key.ctxt -p 0x81000001
+ #> tpm2_evictcontrol -c key.ctxt 0x81000001
persistentHandle: 0x81000001
Usage::
diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index b18236370742..a1b0f554cd82 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -13,6 +13,7 @@ if sphinx.version_info[0] < 2 or \
else:
from sphinx.errors import NoUri
import re
+from itertools import chain
#
# Regex nastiness. Of course.
@@ -21,7 +22,13 @@ import re
# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
# bit tries to restrict matches to things that won't create trouble.
#
-RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
+RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
+RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
+#
+# Detects a reference to a documentation page of the form Documentation/... with
+# an optional extension
+#
+RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
#
# Many places in the docs refer to common system calls. It is
@@ -34,56 +41,110 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]
-#
-# Find all occurrences of function() and try to replace them with
-# appropriate cross references.
-#
-def markup_funcs(docname, app, node):
- cdom = app.env.domains['c']
+def markup_refs(docname, app, node):
t = node.astext()
done = 0
repl = [ ]
- for m in RE_function.finditer(t):
+ #
+ # Associate each regex with the function that will markup its matches
+ #
+ markup_func = {RE_type: markup_c_ref,
+ RE_function: markup_c_ref,
+ RE_doc: markup_doc_ref}
+ match_iterators = [regex.finditer(t) for regex in markup_func]
+ #
+ # Sort all references by the starting position in text
+ #
+ sorted_matches = sorted(chain(*match_iterators), key=lambda m: m.start())
+ for m in sorted_matches:
#
- # Include any text prior to function() as a normal text node.
+ # Include any text prior to match as a normal text node.
#
if m.start() > done:
repl.append(nodes.Text(t[done:m.start()]))
+
#
- # Go through the dance of getting an xref out of the C domain
- #
- target = m.group(1)[:-2]
- target_text = nodes.Text(target + '()')
- xref = None
- if target not in Skipfuncs:
- lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
- lit_text += target_text
- pxref = addnodes.pending_xref('', refdomain = 'c',
- reftype = 'function',
- reftarget = target, modname = None,
- classname = None)
- #
- # XXX The Latex builder will throw NoUri exceptions here,
- # work around that by ignoring them.
- #
- try:
- xref = cdom.resolve_xref(app.env, docname, app.builder,
- 'function', target, pxref, lit_text)
- except NoUri:
- xref = None
- #
- # Toss the xref into the list if we got it; otherwise just put
- # the function text.
+ # Call the function associated with the regex that matched this text and
+ # append its return value to the list of replacement nodes
#
- if xref:
- repl.append(xref)
- else:
- repl.append(target_text)
+ repl.append(markup_func[m.re](docname, app, m))
+
done = m.end()
if done < len(t):
repl.append(nodes.Text(t[done:]))
return repl
+#
+# Try to replace a C reference (function() or struct/union/enum/typedef
+# type_name) with an appropriate cross reference.
+#
+def markup_c_ref(docname, app, match):
+ class_str = {RE_function: 'c-func', RE_type: 'c-type'}
+ reftype_str = {RE_function: 'function', RE_type: 'type'}
+
+ cdom = app.env.domains['c']
+ #
+ # Go through the dance of getting an xref out of the C domain
+ #
+ target = match.group(2)
+ target_text = nodes.Text(match.group(0))
+ xref = None
+ if not (match.re == RE_function and target in Skipfuncs):
+ lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
+ lit_text += target_text
+ pxref = addnodes.pending_xref('', refdomain = 'c',
+ reftype = reftype_str[match.re],
+ reftarget = target, modname = None,
+ classname = None)
+ #
+ # XXX The Latex builder will throw NoUri exceptions here,
+ # work around that by ignoring them.
+ #
+ try:
+ xref = cdom.resolve_xref(app.env, docname, app.builder,
+ reftype_str[match.re], target, pxref,
+ lit_text)
+ except NoUri:
+ xref = None
+ #
+ # Return the xref if we got it; otherwise just return the plain text.
+ #
+ if xref:
+ return xref
+ else:
+ return target_text
+
+#
+# Try to replace a documentation reference of the form Documentation/... with a
+# cross reference to that page
+#
+def markup_doc_ref(docname, app, match):
+ stddom = app.env.domains['std']
+ #
+ # Go through the dance of getting an xref out of the std domain
+ #
+ target = match.group(1)
+ xref = None
+ pxref = addnodes.pending_xref('', refdomain = 'std', reftype = 'doc',
+ reftarget = target, modname = None,
+ classname = None, refexplicit = False)
+ #
+ # XXX The Latex builder will throw NoUri exceptions here,
+ # work around that by ignoring them.
+ #
+ try:
+ xref = stddom.resolve_xref(app.env, docname, app.builder, 'doc',
+ target, pxref, None)
+ except NoUri:
+ xref = None
+ #
+ # Return the xref if we got it; otherwise just return the plain text.
+ #
+ if xref:
+ return xref
+ else:
+ return nodes.Text(match.group(0))
+
def auto_markup(app, doctree, name):
#
# This loop could eventually be improved on. Someday maybe we
@@ -97,7 +158,7 @@ def auto_markup(app, doctree, name):
for para in doctree.traverse(nodes.paragraph):
for node in para.traverse(nodes.Text):
if not isinstance(node.parent, nodes.literal):
- node.parent.replace(node, markup_funcs(name, app, node))
+ node.parent.replace(node, markup_refs(name, app, node))
def setup(app):
app.connect('doctree-resolved', auto_markup)
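The dispatch scheme that ``markup_refs()`` introduces above can be tried outside Sphinx. The following is a minimal sketch, not the patch's code: the three regexes are copied from the diff, while ``label_c_ref()``, ``label_doc_ref()``, ``split_refs()`` and the sample sentence are hypothetical stand-ins that return plain tuples instead of docutils nodes. It shows how matches from several patterns are merged in text order and how ``m.re`` selects the handler.

```python
import re
from itertools import chain

# Patterns copied from the patch to Documentation/sphinx/automarkup.py.
RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')


# Hypothetical stand-ins for markup_c_ref() and markup_doc_ref(): they only
# label the match instead of resolving a Sphinx cross reference.
def label_c_ref(match):
    return ('c', match.group(2))


def label_doc_ref(match):
    return ('doc', match.group(1))


def split_refs(text):
    """Split text into plain pieces and labelled references, in order."""
    # Compiled patterns are hashable, so they can key the handler table.
    handlers = {RE_type: label_c_ref,
                RE_function: label_c_ref,
                RE_doc: label_doc_ref}
    # Merge the matches of every pattern and sort them by start offset.
    matches = sorted(chain(*(regex.finditer(text) for regex in handlers)),
                     key=lambda m: m.start())
    pieces = []
    done = 0
    for m in matches:
        # Keep any text before the match as a plain piece.
        if m.start() > done:
            pieces.append(('text', text[done:m.start()]))
        # m.re is the pattern that produced the match; use it to dispatch.
        pieces.append(handlers[m.re](m))
        done = m.end()
    if done < len(text):
        pieces.append(('text', text[done:]))
    return pieces


sample = "See Documentation/trace/kprobes.rst, struct page and kfree()."
result = split_refs(sample)
for piece in result:
    print(piece)
```

Note that, like the loop in the patch, this sketch assumes matches from different patterns do not overlap; overlapping matches would need extra handling.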
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index c1709165c553..10850a9e9af3 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -40,7 +40,7 @@ Synopsis of kprobe_events
MEMADDR : Address where the probe is inserted.
MAXACTIVE : Maximum number of instances of the specified function that
can be probed simultaneously, or 0 for the default value
- as defined in Documentation/staging/kprobes.rst section 1.3.1.
+ as defined in Documentation/trace/kprobes.rst section 1.3.1.
FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
diff --git a/Documentation/trace/ring-buffer-design.rst b/Documentation/trace/ring-buffer-design.rst
index 9c8d22a53d6c..c5d77fcbb5bc 100644
--- a/Documentation/trace/ring-buffer-design.rst
+++ b/Documentation/trace/ring-buffer-design.rst
@@ -1,28 +1,4 @@
-.. This file is dual-licensed: you can use it either under the terms
-.. of the GPL 2.0 or the GFDL 1.2 license, at your option. Note that this
-.. dual licensing only applies to this file, and not this project as a
-.. whole.
-..
-.. a) This file is free software; you can redistribute it and/or
-.. modify it under the terms of the GNU General Public License as
-.. published by the Free Software Foundation version 2 of
-.. the License.
-..
-.. This file is distributed in the hope that it will be useful,
-.. but WITHOUT ANY WARRANTY; without even the implied warranty of
-.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-.. GNU General Public License for more details.
-..
-.. Or, alternatively,
-..
-.. b) Permission is granted to copy, distribute and/or modify this
-.. document under the terms of the GNU Free Documentation License,
-.. Version 1.2 version published by the Free Software
-.. Foundation, with no Invariant Sections, no Front-Cover Texts
-.. and no Back-Cover Texts. A copy of the license is included at
-.. Documentation/userspace-api/media/fdl-appendix.rst.
-..
-.. TODO: replace it to GPL-2.0 OR GFDL-1.2 WITH no-invariant-sections
+.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
===========================
Lockless Ring Buffer Design
diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index 71d4823e41e1..240d29be38f2 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -284,9 +284,10 @@ Andrew Morton의 글이 있다.
여러 메이저 넘버를 갖는 다양한 안정된 커널 트리들
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-3 자리 숫자로 이루어진 버젼의 커널들은 -stable 커널들이다. 그것들은 해당 메이저
-메인라인 릴리즈에서 발견된 큰 회귀들이나 보안 문제들 중 비교적 작고 중요한
-수정들을 포함하며, 앞의 두 버전 넘버는 같은 기반 버전을 의미한다.
+세개의 버젼 넘버로 이루어진 버젼의 커널들은 -stable 커널들이다. 그것들은 해당
+메이저 메인라인 릴리즈에서 발견된 큰 회귀들이나 보안 문제들 중 비교적 작고
+중요한 수정들을 포함한다. 주요 stable 시리즈 릴리즈는 세번째 버젼 넘버를
+증가시키며 앞의 두 버젼 넘버는 그대로 유지한다.
이것은 가장 최근의 안정적인 커널을 원하는 사용자에게 추천되는 브랜치이며,
개발/실험적 버젼을 테스트하는 것을 돕고자 하는 사용자들과는 별로 관련이 없다.
@@ -316,7 +317,7 @@ Andrew Morton의 글이 있다.
제안된 패치는 서브시스템 트리에 커밋되기 전에 메일링 리스트를 통해
리뷰된다(아래의 관련 섹션을 참고하기 바란다). 일부 커널 서브시스템의 경우, 이
리뷰 프로세스는 patchwork라는 도구를 통해 추적된다. patchwork은 등록된 패치와
-패치에 대한 코멘트, 패치의 버전을 볼 수 있는 웹 인터페이스를 제공하고,
+패치에 대한 코멘트, 패치의 버젼을 볼 수 있는 웹 인터페이스를 제공하고,
메인테이너는 패치를 리뷰 중, 리뷰 통과, 또는 반려됨으로 표시할 수 있다.
대부분의 이러한 patchwork 사이트는 https://patchwork.kernel.org/ 에 나열되어
있다.
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index 9dcc7c9d52e6..64d932f5dc77 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -91,7 +91,6 @@ Documentation/memory-barriers.txt
 - 컴파일러 배리어.
 - CPU 메모리 배리어.
- - MMIO 쓰기 배리어.
(*) 암묵적 커널 메모리 배리어.
@@ -103,7 +102,6 @@ Documentation/memory-barriers.txt
(*) CPU 간 ACQUIRING 배리어의 효과.
 - Acquire vs 메모리 액세스.
- - Acquire vs I/O 액세스.
(*) 메모리 배리어가 필요한 곳
@@ -515,14 +513,13 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니
완료되기 전에 행해진 것처럼 보일 수 있습니다.
ACQUIRE 와 RELEASE 오퍼레이션의 사용은 일반적으로 다른 메모리 배리어의
- 필요성을 없앱니다 (하지만 "MMIO 쓰기 배리어" 서브섹션에서 설명되는 예외를
- 알아두세요). 또한, RELEASE+ACQUIRE 조합은 범용 메모리 배리어처럼 동작할
- 것을 보장하지 -않습니다-. 하지만, 어떤 변수에 대한 RELEASE 오퍼레이션을
- 앞서는 메모리 액세스들의 수행 결과는 이 RELEASE 오퍼레이션을 뒤이어 같은
- 변수에 대해 수행된 ACQUIRE 오퍼레이션을 뒤따르는 메모리 액세스에는 보여질
- 것이 보장됩니다. 다르게 말하자면, 주어진 변수의 크리티컬 섹션에서는, 해당
- 변수에 대한 앞의 크리티컬 섹션에서의 모든 액세스들이 완료되었을 것을
- 보장합니다.
+ 필요성을 없앱니다. 또한, RELEASE+ACQUIRE 조합은 범용 메모리 배리어처럼
+ 동작할 것을 보장하지 -않습니다-. 하지만, 어떤 변수에 대한 RELEASE
+ 오퍼레이션을 앞서는 메모리 액세스들의 수행 결과는 이 RELEASE 오퍼레이션을
+ 뒤이어 같은 변수에 대해 수행된 ACQUIRE 오퍼레이션을 뒤따르는 메모리
+ 액세스에는 보여질 것이 보장됩니다. 다르게 말하자면, 주어진 변수의
+ 크리티컬 섹션에서는, 해당 변수에 대한 앞의 크리티컬 섹션에서의 모든
+ 액세스들이 완료되었을 것을 보장합니다.
즉, ACQUIRE 는 최소한의 "취득" 동작처럼, 그리고 RELEASE 는 최소한의 "공개"
처럼 동작한다는 의미입니다.
@@ -1501,8 +1498,6 @@ u 로의 스토어를 cpu1() 의 v 로부터의 로드 뒤에 일어난 것으
(*) CPU 메모리 배리어.
- (*) MMIO 쓰기 배리어.
-
컴파일러 배리어
---------------
@@ -1909,6 +1904,19 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
"커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을
위해선 Documentation/core-api/dma-api.rst 문서를 참고하세요.
+ (*) pmem_wmb();
+
+ 이것은 persistent memory 를 위한 것으로, persistent 저장소에 가해진 변경
+ 사항이 플랫폼 연속성 도메인에 도달했을 것을 보장하기 위한 것입니다.
+
+ 예를 들어, 임시적이지 않은 pmem 영역으로의 쓰기 후, 우리는 쓰기가 플랫폼
+ 연속성 도메인에 도달했을 것을 보장하기 위해 pmem_wmb() 를 사용합니다.
+ 이는 쓰기가 뒤따르는 instruction 들이 유발하는 어떠한 데이터 액세스나
+ 데이터 전송의 시작 전에 persistent 저장소를 업데이트 했을 것을 보장합니다.
+ 이는 wmb() 에 의해 이뤄지는 순서 규칙을 포함합니다.
+
+ Persistent memory 에서의 로드를 위해선 현재의 읽기 메모리 배리어로도 읽기
+ 순서를 보장하는데 충분합니다.
=========================
암묵적 커널 메모리 배리어
diff --git a/Documentation/translations/zh_CN/arm64/amu.rst b/Documentation/translations/zh_CN/arm64/amu.rst
new file mode 100644
index 000000000000..bd875f221330
--- /dev/null
+++ b/Documentation/translations/zh_CN/arm64/amu.rst
@@ -0,0 +1,100 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/arm64/amu.rst <amu_index>`
+
+Translator: Bailu Lin <bailu.lin@vivo.com>
+
+=================================
+AArch64 Linux 中扩展的活动监控单元
+=================================
+
+作者: Ionela Voinescu <ionela.voinescu@arm.com>
+
+日期: 2019-09-10
+
+本文档简要描述了 AArch64 Linux 支持的活动监控单元的规范。
+
+
+架构总述
+--------
+
+活动监控是 ARMv8.4 CPU 架构引入的一个可选扩展特性。
+
+活动监控单元(在每个 CPU 中实现)为系统管理提供了性能计数器。既可以通
+过系统寄存器的方式访问计数器，同时也支持外部内存映射的方式访问计数器。
+
+AMUv1 架构实现了一个由4个固定的64位事件计数器组成的计数器组。
+
+ - CPU 周期计数器：同 CPU 的频率增长
+ - 常量计数器：同固定的系统时钟频率增长
+ - 淘汰指令计数器: 同每次架构指令执行增长
+ - 内存停顿周期计数器：计算由在时钟域内的最后一级缓存中未命中而引起
+ 的指令调度停顿周期数
+
+当处于 WFI 或者 WFE 状态时，计数器不会增长。
+
+AMU 架构提供了一个高达16位的事件计数器空间，未来新的 AMU 版本中可能
+用它来实现新增的事件计数器。
+
+另外，AMUv1 实现了一个多达16个64位辅助事件计数器的计数器组。
+
+冷复位时所有的计数器会清零。
+
+
+基本支持
+--------
+
+内核可以安全地运行在支持 AMU 和不支持 AMU 的 CPU 组合中。
+因此，当配置 CONFIG_ARM64_AMU_EXTN 后我们无条件使能后续
+(secondary or hotplugged) CPU 检测和使用这个特性。
+
+当在 CPU 上检测到该特性时，我们会标记为特性可用但是不能保证计数器的功能，
+仅表明有扩展属性。
+
+固件(代码运行在高异常级别，例如 arm-tf )需支持以下功能：
+
+ - 提供低异常级别(EL2 和 EL1)访问 AMU 寄存器的能力。
+ - 使能计数器。如果未使能，它的值应为 0。
+ - 在从电源关闭状态启动 CPU 前或后保存或者恢复计数器。
+
+当使用使能了该特性的内核启动但固件损坏时，访问计数器寄存器可能会遭遇
+panic 或者死锁。即使未发现这些症状，计数器寄存器返回的数据结果并不一
+定能反映真实情况。通常，计数器会返回 0，表明他们未被使能。
+
+如果固件没有提供适当的支持最好关闭 CONFIG_ARM64_AMU_EXTN。
+值得注意的是，出于安全原因，不要绕过 AMUSERRENR_EL0 设置而捕获从
+EL0(用户空间) 访问 EL1(内核空间)。 因此，固件应该确保访问 AMU寄存器
+不会困在 EL2或EL3。
+
+AMUv1 的固定计数器可以通过如下系统寄存器访问：
+
+ - SYS_AMEVCNTR0_CORE_EL0
+ - SYS_AMEVCNTR0_CONST_EL0
+ - SYS_AMEVCNTR0_INST_RET_EL0
+ - SYS_AMEVCNTR0_MEM_STALL_EL0
+
+特定辅助计数器可以通过 SYS_AMEVCNTR1_EL0(n) 访问，其中n介于0到15。
+
+详细信息定义在目录：arch/arm64/include/asm/sysreg.h。
+
+
+用户空间访问
+------------
+
+由于以下原因，当前禁止从用户空间访问 AMU 的寄存器：
+
+ - 安全因数：可能会暴露处于安全模式执行的代码信息。
+ - 意愿：AMU 是用于系统管理的。
+
+同样，该功能对用户空间不可见。
+
+
+虚拟化
+------
+
+由于以下原因，当前禁止从 KVM 客户端的用户空间(EL0)和内核空间(EL1)
+访问 AMU 的寄存器：
+
+ - 安全因数：可能会暴露给其他客户端或主机端执行的代码信息。
+
+任何试图访问 AMU 寄存器的行为都会触发一个注册在客户端的未定义异常。
diff --git a/Documentation/translations/zh_CN/arm64/index.rst b/Documentation/translations/zh_CN/arm64/index.rst
new file mode 100644
index 000000000000..646ed1f7aea3
--- /dev/null
+++ b/Documentation/translations/zh_CN/arm64/index.rst
@@ -0,0 +1,16 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/arm64/index.rst <arm64_index>`
+:Translator: Bailu Lin <bailu.lin@vivo.com>
+
+.. _cn_arm64_index:
+
+
+==========
+ARM64 架构
+==========
+
+.. toctree::
+ :maxdepth: 2
+
+ amu
diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt b/Documentation/translations/zh_CN/filesystems/sysfs.txt
index 9481e3ed2a06..046cc1d52058 100644
--- a/Documentation/translations/zh_CN/filesystems/sysfs.txt
+++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt
@@ -154,14 +154,13 @@ sysfs 会为这个类型调用适当的方法。当一个文件被读写时，
示例:
-#define to_dev(obj) container_of(obj, struct device, kobj)
 #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
 static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
                              char *buf)
 {
         struct device_attribute *dev_attr = to_dev_attr(attr);
-        struct device *dev = to_dev(kobj);
+        struct device *dev = kobj_to_dev(kobj);
         ssize_t ret = -EIO;
         if (dev_attr->show)
diff --git a/Documentation/translations/zh_CN/index.rst b/Documentation/translations/zh_CN/index.rst
index 85643e46e308..be6f11176200 100644
--- a/Documentation/translations/zh_CN/index.rst
+++ b/Documentation/translations/zh_CN/index.rst
@@ -19,6 +19,7 @@
admin-guide/index
process/index
filesystems/index
+ arm64/index
目录和表格
----------
diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
index de1ab81df958..d20490292642 100644
--- a/Documentation/virt/index.rst
+++ b/Documentation/virt/index.rst
@@ -8,7 +8,7 @@ Linux Virtualization Support
:maxdepth: 2
kvm/index
- uml/user_mode_linux
+ uml/user_mode_linux_howto_v2
paravirt_ops
guest-halt-polling
diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 2d44388438cc..09a8f2a34e39 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -53,11 +53,11 @@ key management interface to perform common hypervisor activities such as
encrypting bootstrap code, snapshot, migrating and debugging the guest. For more
information, see the SEV Key Management spec [api-spec]_
-The main ioctl to access SEV is KVM_MEM_ENCRYPT_OP. If the argument
-to KVM_MEM_ENCRYPT_OP is NULL, the ioctl returns 0 if SEV is enabled
+The main ioctl to access SEV is KVM_MEMORY_ENCRYPT_OP. If the argument
+to KVM_MEMORY_ENCRYPT_OP is NULL, the ioctl returns 0 if SEV is enabled
and ``ENOTTY` if it is disabled (on some older versions of Linux,
the ioctl runs normally even with a NULL argument, and therefore will
-likely return ``EFAULT``). If non-NULL, the argument to KVM_MEM_ENCRYPT_OP
+likely return ``EFAULT``). If non-NULL, the argument to KVM_MEMORY_ENCRYPT_OP
must be a struct kvm_sev_cmd::
struct kvm_sev_cmd {
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 51191b56e61c..1f26d83e6b16 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4211,7 +4211,7 @@ H_GET_CPU_CHARACTERISTICS hypercall.
:Capability: basic
:Architectures: x86
-:Type: system
+:Type: vm
:Parameters: an opaque platform specific structure (in/out)
:Returns: 0 on success; -1 on error
@@ -4343,7 +4343,7 @@ Errors:
   #define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001
   #define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002
-#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
+  #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
struct kvm_vmx_nested_state_hdr {
__u64 vmxon_pa;
diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst b/Documentation/virt/kvm/arm/hyp-abi.rst
index d9eba93aa364..83cadd8186fa 100644
--- a/Documentation/virt/kvm/arm/hyp-abi.rst
+++ b/Documentation/virt/kvm/arm/hyp-abi.rst
@@ -54,9 +54,9 @@ these functions (see arch/arm{,64}/include/asm/virt.h):
x3 = x1's value when entering the next payload (arm64)
x4 = x2's value when entering the next payload (arm64)
- Mask all exceptions, disable the MMU, move the arguments into place
- (arm64 only), and jump to the restart address while at HYP/EL2. This
- hypercall is not expected to return to its caller.
+ Mask all exceptions, disable the MMU, clear I+D bits, move the arguments
+ into place (arm64 only), and jump to the restart address while at HYP/EL2.
+ This hypercall is not expected to return to its caller.
Any other value of r0/x0 triggers a hypervisor-specific handling,
which is not documented here.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..9150e9d1c39b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -78,7 +78,7 @@ KVM_FEATURE_PV_SEND_IPI 11 guest checks this feature bit
before enabling paravirtualized
sebd IPIs
-KVM_FEATURE_PV_POLL_CONTROL 12 host-side polling on HLT can
+KVM_FEATURE_POLL_CONTROL 12 host-side polling on HLT can
be disabled by writing
to msr 0x4b564d05.
diff --git a/Documentation/virt/uml/user_mode_linux.rst b/Documentation/virt/uml/user_mode_linux.rst
deleted file mode 100644
index de0f0b2c9d5b..000000000000
--- a/Documentation/virt/uml/user_mode_linux.rst
+++ /dev/null
@@ -1,4403 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=====================
-User Mode Linux HOWTO
-=====================
-
-:Author: User Mode Linux Core Team
-:Last-updated: Sat Jan 25 16:07:55 CET 2020
-
-This document describes the use and abuse of Jeff Dike's User Mode
-Linux: a port of the Linux kernel as a normal Intel Linux process.
-
-
-.. Table of Contents
-
- 1. Introduction
-
- 1.1 How is User Mode Linux Different?
- 1.2 Why Would I Want User Mode Linux?
-
- 2. Compiling the kernel and modules
-
- 2.1 Compiling the kernel
- 2.2 Compiling and installing kernel modules
- 2.3 Compiling and installing uml_utilities
-
- 3. Running UML and logging in
-
- 3.1 Running UML
- 3.2 Logging in
- 3.3 Examples
-
- 4. UML on 2G/2G hosts
-
- 4.1 Introduction
- 4.2 The problem
- 4.3 The solution
-
- 5. Setting up serial lines and consoles
-
- 5.1 Specifying the device
- 5.2 Specifying the channel
- 5.3 Examples
-
- 6. Setting up the network
-
- 6.1 General setup
- 6.2 Userspace daemons
- 6.3 Specifying ethernet addresses
- 6.4 UML interface setup
- 6.5 Multicast
- 6.6 TUN/TAP with the uml_net helper
- 6.7 TUN/TAP with a preconfigured tap device
- 6.8 Ethertap
- 6.9 The switch daemon
- 6.10 Slip
- 6.11 Slirp
- 6.12 pcap
- 6.13 Setting up the host yourself
-
- 7. Sharing Filesystems between Virtual Machines
-
- 7.1 A warning
- 7.2 Using layered block devices
- 7.3 Note!
- 7.4 Another warning
- 7.5 uml_moo : Merging a COW file with its backing file
-
- 8. Creating filesystems
-
- 8.1 Create the filesystem file
- 8.2 Assign the file to a UML device
- 8.3 Creating and mounting the filesystem
-
- 9. Host file access
-
- 9.1 Using hostfs
- 9.2 hostfs as the root filesystem
- 9.3 Building hostfs
-
- 10. The Management Console
- 10.1 version
- 10.2 halt and reboot
- 10.3 config
- 10.4 remove
- 10.5 sysrq
- 10.6 help
- 10.7 cad
- 10.8 stop
- 10.9 go
-
- 11. Kernel debugging
-
- 11.1 Starting the kernel under gdb
- 11.2 Examining sleeping processes
- 11.3 Running ddd on UML
- 11.4 Debugging modules
- 11.5 Attaching gdb to the kernel
- 11.6 Using alternate debuggers
-
- 12. Kernel debugging examples
-
- 12.1 The case of the hung fsck
- 12.2 Episode 2: The case of the hung fsck
-
- 13. What to do when UML doesn't work
-
- 13.1 Strange compilation errors when you build from source
- 13.2 (obsolete)
- 13.3 A variety of panics and hangs with /tmp on a reiserfs filesystem
- 13.4 The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'
- 13.5 UML doesn't work when /tmp is an NFS filesystem
- 13.6 UML hangs on boot when compiled with gprof support
- 13.7 syslogd dies with a SIGTERM on startup
- 13.8 TUN/TAP networking doesn't work on a 2.4 host
- 13.9 You can network to the host but not to other machines on the net
- 13.10 I have no root and I want to scream
- 13.11 UML build conflict between ptrace.h and ucontext.h
- 13.12 The UML BogoMips is exactly half the host's BogoMips
- 13.13 When you run UML, it immediately segfaults
- 13.14 xterms appear, then immediately disappear
- 13.15 Any other panic, hang, or strange behavior
-
- 14. Diagnosing Problems
-
- 14.1 Case 1 : Normal kernel panics
- 14.2 Case 2 : Tracing thread panics
- 14.3 Case 3 : Tracing thread panics caused by other threads
- 14.4 Case 4 : Hangs
-
- 15. Thanks
-
- 15.1 Code and Documentation
- 15.2 Flushing out bugs
- 15.3 Buglets and clean-ups
- 15.4 Case Studies
- 15.5 Other contributions
-
-
-1. Introduction
-================
-
- Welcome to User Mode Linux. It's going to be fun.
-
-
-
-1.1. How is User Mode Linux Different?
----------------------------------------
-
- Normally, the Linux Kernel talks straight to your hardware (video
- card, keyboard, hard drives, etc), and any programs which run ask the
- kernel to operate the hardware, like so::
-
-
-
- +-----------+-----------+----+
- | Process 1 | Process 2 | ...|
- +-----------+-----------+----+
- | Linux Kernel |
- +----------------------------+
- | Hardware |
- +----------------------------+
-
-
-
-
- The User Mode Linux Kernel is different; instead of talking to the
- hardware, it talks to a `real` Linux kernel (called the `host kernel`
- from now on), like any other program. Programs can then run inside
- User-Mode Linux as if they were running under a normal kernel, like
- so::
-
-
-
- +----------------+
- | Process 2 | ...|
- +-----------+----------------+
- | Process 1 | User-Mode Linux|
- +----------------------------+
- | Linux Kernel |
- +----------------------------+
- | Hardware |
- +----------------------------+
-
-
-
-
-
-1.2. Why Would I Want User Mode Linux?
----------------------------------------
-
-
- 1. If User Mode Linux crashes, your host kernel is still fine.
-
- 2. You can run a usermode kernel as a non-root user.
-
- 3. You can debug the User Mode Linux like any normal process.
-
- 4. You can run gprof (profiling) and gcov (coverage testing).
-
- 5. You can play with your kernel without breaking things.
-
- 6. You can use it as a sandbox for testing new apps.
-
- 7. You can try new development kernels safely.
-
- 8. You can run different distributions simultaneously.
-
- 9. It's extremely fun.
-
-
-
-.. _Compiling_the_kernel_and_modules:
-
-2. Compiling the kernel and modules
-====================================
-
-
-
-
-2.1. Compiling the kernel
---------------------------
-
-
- Compiling the user mode kernel is just like compiling any other
- kernel.
-
-
- 1. Download the latest kernel from your favourite kernel mirror,
- such as:
-
- https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.4.14.tar.xz
-
- 2. Make a directory and unpack the kernel into it::
-
- host%
- mkdir ~/uml
-
- host%
- cd ~/uml
-
- host%
- tar xvf linux-5.4.14.tar.xz
-
-
- 3. Run your favorite config; ``make xconfig ARCH=um`` is the most
- convenient. ``make config ARCH=um`` and ``make menuconfig ARCH=um``
- will work as well. The defaults will give you a useful kernel. If
- you want to change something, go ahead, it probably won't hurt
- anything.
-
-
- Note: If the host is configured with a 2G/2G address space split
- rather than the usual 3G/1G split, then the packaged UML binaries
- will not run. They will immediately segfault. See
- :ref:`UML_on_2G/2G_hosts` for the scoop on running UML on your system.
-
-
-
- 4. Finish with ``make linux ARCH=um``: the result is a file called
- ``linux`` in the top directory of your source tree.
-
-
-2.2. Compiling and installing kernel modules
----------------------------------------------
-
- UML modules are built in the same way as the native kernel (with the
- exception of the 'ARCH=um' that you always need for UML)::
-
-
- host% make modules ARCH=um
-
-
-
-
- Any modules that you want to load into this kernel need to be built in
- the user-mode pool. Modules from the native kernel won't work.
-
- You can install them by using ftp or something to copy them into the
- virtual machine and dropping them into ``/lib/modules/$(uname -r)``.
-
- You can also get the kernel build process to install them as follows:
-
- 1. with the kernel not booted, mount the root filesystem in the top
- level of the kernel pool::
-
-
- host% mount root_fs mnt -o loop
-
-
-
-
-
-
- 2. run::
-
-
- host%
- make modules_install INSTALL_MOD_PATH=`pwd`/mnt ARCH=um
-
-
-
-
-
-
- 3. unmount the filesystem::
-
-
- host% umount mnt
-
-
-
-
-
-
- 4. boot the kernel on it
-
-
- When the system is booted, you can use insmod as usual to get the
- modules into the kernel. A number of things have been loaded into UML
- as modules, especially filesystems and network protocols and filters,
- so most symbols which need to be exported probably already are.
- However, if you do find symbols that need exporting, let us
- know at http://user-mode-linux.sourceforge.net/, and
- they'll be "taken care of".
-
-
-
-2.3. Compiling and installing uml_utilities
---------------------------------------------
-
- Many features of the UML kernel require a user-space helper program,
- so a uml_utilities package is distributed separately from the kernel
- patch which provides these helpers. Included within this is:
-
- - port-helper - Used by consoles which connect to xterms or ports
-
- - tunctl - Configuration tool to create and delete tap devices
-
- - uml_net - Setuid binary for automatic tap device configuration
-
- - uml_switch - User-space virtual switch required for daemon
- transport
-
- The uml_utilities tree is compiled with::
-
-
- host#
- make && make install
-
-
-
-
- Note that UML kernel patches may require a specific version of the
- uml_utilities distribution. If you don't keep up with the mailing
- lists, ensure that you have the latest release of uml_utilities if you
- are experiencing problems with your UML kernel, particularly when
- dealing with consoles or command-line switches to the helper programs
-
-
-
-
-
-
-
-
-3. Running UML and logging in
-==============================
-
-
-
-3.1. Running UML
------------------
-
- It runs on 2.2.15 or later, and all kernel versions since 2.4.
-
-
- Booting UML is straightforward. Simply run 'linux': it will try to
- mount the file ``root_fs`` in the current directory. You do not need to
- run it as root. If your root filesystem is not named ``root_fs``, then
- you need to put a ``ubd0=root_fs_whatever`` switch on the linux command
- line.
-
-
- You will need a filesystem to boot UML from. There are a number
- available for download from http://user-mode-linux.sourceforge.net.
- There are also several tools at
- http://user-mode-linux.sourceforge.net/ which can be
- used to generate UML-compatible filesystem images from media.
- The kernel will boot up and present you with a login prompt.
-
-
-Note:
- If the host is configured with a 2G/2G address space split
- rather than the usual 3G/1G split, then the packaged UML binaries will
- not run. They will immediately segfault. See :ref:`UML_on_2G/2G_hosts`
- for the scoop on running UML on your system.
-
-
-
-3.2. Logging in
-----------------
-
-
-
- The prepackaged filesystems have a root account with password 'root'
- and a user account with password 'user'. The login banner will
- generally tell you how to log in. So, you log in and you will find
- yourself inside a little virtual machine. Our filesystems have a
- variety of commands and utilities installed (and it is fairly easy to
- add more), so you will have a lot of tools with which to poke around
- the system.
-
- There are a couple of other ways to log in:
-
- - On a virtual console
-
-
-
- Each virtual console that is configured (i.e. the device exists in
- /dev and /etc/inittab runs a getty on it) will come up in its own
- xterm. If you get tired of the xterms, read
- :ref:`setting_up_serial_lines_and_consoles` to see how to attach
- the consoles to something else, like host ptys.
-
-
-
- - Over the serial line
-
-
- In the boot output, find a line that looks like::
-
-
-
- serial line 0 assigned pty /dev/ptyp1
-
-
-
-
- Attach your favorite terminal program to the corresponding tty. I.e.
- for minicom, the command would be::
-
-
- host% minicom -o -p /dev/ttyp1
-
-
-
-
-
-
- - Over the net
-
-
- If the network is running, then you can telnet to the virtual
- machine and log in to it. See :ref:`Setting_up_the_network` to learn
- about setting up a virtual network.
-
- When you're done using it, run halt, and the kernel will bring itself
- down and the process will exit.
-
-
-3.3. Examples
---------------
-
- Here are some examples of UML in action:
-
- - A login session http://user-mode-linux.sourceforge.net/old/login.html
-
- - A virtual network http://user-mode-linux.sourceforge.net/old/net.html
-
-
-
-
-
-.. _UML_on_2G/2G_hosts:
-
-4. UML on 2G/2G hosts
-======================
-
-
-
-
-4.1. Introduction
-------------------
-
-
- Most Linux machines are configured so that the kernel occupies the
- upper 1G (0xc0000000 - 0xffffffff) of the 4G address space and
- processes use the lower 3G (0x00000000 - 0xbfffffff). However, some
- machine are configured with a 2G/2G split, with the kernel occupying
- the upper 2G (0x80000000 - 0xffffffff) and processes using the lower
- 2G (0x00000000 - 0x7fffffff).
-
-
-
-
-4.2. The problem
------------------
-
-
- The prebuilt UML binaries on this site will not run on 2G/2G hosts
- because UML occupies the upper .5G of the 3G process address space
- (0xa0000000 - 0xbfffffff). Obviously, on 2G/2G hosts, this is right
- in the middle of the kernel address space, so UML won't even load - it
- will immediately segfault.
-
-
-
-
-4.3. The solution
-------------------
-
-
- The fix for this is to rebuild UML from source after enabling
- CONFIG_HOST_2G_2G (under 'General Setup'). This will cause UML to
- load itself in the top .5G of that smaller process address space,
- where it will run fine. See :ref:`Compiling_the_kernel_and_modules` if
- you need help building UML from source.
-
-
-
-
-
-
-
-.. _setting_up_serial_lines_and_consoles:
-
-
-5. Setting up serial lines and consoles
-========================================
-
-
- It is possible to attach UML serial lines and consoles to many types
- of host I/O channels by specifying them on the command line.
-
-
- You can attach them to host ptys, ttys, file descriptors, and ports.
- This allows you to do things like:
-
- - have a UML console appear on an unused host console,
-
- - hook two virtual machines together by having one attach to a pty
- and having the other attach to the corresponding tty
-
- - make a virtual machine accessible from the net by attaching a
- console to a port on the host.
-
-
- The general format of the command line option is ``device=channel``.
-
-
-
-5.1. Specifying the device
----------------------------
-
- Devices are specified with "con" or "ssl" (console or serial line,
- respectively), optionally with a device number if you are talking
- about a specific device.
-
-
- Using just "con" or "ssl" describes all of the consoles or serial
- lines. If you want to talk about console #3 or serial line #10, they
- would be "con3" and "ssl10", respectively.
-
-
- A specific device name will override a less general "con=" or "ssl=".
- So, for example, you can assign a pty to each of the serial lines
- except for the first two like this::
-
-
- ssl=pty ssl0=tty:/dev/tty0 ssl1=tty:/dev/tty1
-
-
-
-
- The specificity of the device name is all that matters; order on the
- command line is irrelevant.
-
-
-
-5.2. Specifying the channel
-----------------------------
-
- There are a number of different types of channels to attach a UML
- device to, each with a different way of specifying exactly what to
- attach to.
-
- - pseudo-terminals - device=pty pts terminals - device=pts
-
-
- This will cause UML to allocate a free host pseudo-terminal for the
- device. The terminal that it got will be announced in the boot
- log. You access it by attaching a terminal program to the
- corresponding tty:
-
- - screen /dev/pts/n
-
- - screen /dev/ttyxx
-
- - minicom -o -p /dev/ttyxx - minicom seems not able to handle pts
- devices
-
- - kermit - start it up, 'open' the device, then 'connect'
-
-
-
-
-
- - terminals - device=tty:tty device file
-
-
- This will make UML attach the device to the specified tty (i.e::
-
-
- con1=tty:/dev/tty3
-
-
-
-
- will attach UML's console 1 to the host's /dev/tty3). If the tty that
- you specify is the slave end of a tty/pty pair, something else must
- have already opened the corresponding pty in order for this to work.
-
-
-
-
-
- - xterms - device=xterm
-
-
- UML will run an xterm and the device will be attached to it.
-
-
-
-
-
- - Port - device=port:port number
-
-
- This will attach the UML devices to the specified host port.
- Attaching console 1 to the host's port 9000 would be done like
- this::
-
-
- con1=port:9000
-
-
-
-
- Attaching all the serial lines to that port would be done similarly::
-
-
- ssl=port:9000
-
-
-
-
- You access these devices by telnetting to that port. Each active
- telnet session gets a different device. If there are more telnets to a
- port than UML devices attached to it, then the extra telnet sessions
- will block until an existing telnet detaches, or until another device
- becomes active (i.e. by being activated in /etc/inittab).
-
- This channel has the advantage that you can both attach multiple UML
- devices to it and know how to access them without reading the UML boot
- log. It is also unique in allowing access to a UML from remote
- machines without requiring that the UML be networked. This could be
- useful in allowing public access to UMLs because they would be
- accessible from the net, but wouldn't need any kind of network
- filtering or access control because they would have no network access.
-
-
- If you attach the main console to a portal, then the UML boot will
- appear to hang. In reality, it's waiting for a telnet to connect, at
- which point the boot will proceed.
-
-
-
-
-
- - already-existing file descriptors - device=file descriptor
-
-
- If you set up a file descriptor on the UML command line, you can
- attach a UML device to it. This is most commonly used to put the
- main console back on stdin and stdout after assigning all the other
- consoles to something else::
-
-
- con0=fd:0,fd:1 con=pts
-
-
-
-
-
-
-
-
- - Nothing - device=null
-
-
- This allows the device to be opened, in contrast to 'none', but
- reads will block, and writes will succeed and the data will be
- thrown out.
-
-
-
-
-
- - None - device=none
-
-
- This causes the device to disappear.
-
-
-
- You can also specify different input and output channels for a device
- by putting a comma between them::
-
-
- ssl3=tty:/dev/tty2,xterm
-
-
-
-
- will cause serial line 3 to accept input on the host's /dev/tty2 and
- display output on an xterm. That's a silly example - the most common
- use of this syntax is to reattach the main console to stdin and stdout
- as shown above.
-
-
- If you decide to move the main console away from stdin/stdout, the
- initial boot output will appear in the terminal that you're running
- UML in. However, once the console driver has been officially
- initialized, then the boot output will start appearing wherever you
- specified that console 0 should be. That device will receive all
- subsequent output.
-
-
-
-5.3. Examples
---------------
-
- There are a number of interesting things you can do with this
- capability.
-
-
- First, this is how you get rid of those bleeding console xterms by
- attaching them to host ptys::
-
-
- con=pty con0=fd:0,fd:1
-
-
-
-
- This will make a UML console take over an unused host virtual console,
- so that when you switch to it, you will see the UML login prompt
- rather than the host login prompt::
-
-
- con1=tty:/dev/tty6
-
-
-
-
- You can attach two virtual machines together with what amounts to a
- serial line as follows:
-
- Run one UML with a serial line attached to a pty::
-
-
- ssl1=pty
-
-
-
-
- Look at the boot log to see what pty it got (this example will assume
- that it got /dev/ptyp1).
-
- Boot the other UML with a serial line attached to the corresponding
- tty::
-
-
- ssl1=tty:/dev/ttyp1
-
-
-
-
- Log in, make sure that it has no getty on that serial line, attach a
- terminal program like minicom to it, and you should see the login
- prompt of the other virtual machine.
-
-
-.. _setting_up_the_network:
-
-6. Setting up the network
-==========================
-
-
-
- This page describes how to set up the various transports and to
- provide a UML instance with network access to the host, other machines
- on the local net, and the rest of the net.
-
-
- As of 2.4.5, UML networking has been completely redone to make it much
- easier to set up, fix bugs, and add new features.
-
-
- There is a new helper, uml_net, which does the host setup that
- requires root privileges.
-
-
- There are currently seven transport types available for a UML virtual
- machine to exchange packets with other hosts:
-
- - ethertap
-
- - TUN/TAP
-
- - Multicast
-
- - a switch daemon
-
- - slip
-
- - slirp
-
- - pcap
-
- The TUN/TAP, ethertap, slip, and slirp transports allow a UML
- instance to exchange packets with the host. They may be directed
- to the host or the host may just act as a router to provide access
- to other physical or virtual machines.
-
-
- The pcap transport is a synthetic read-only interface, using the
- libpcap library to collect packets from interfaces on the host and
- filter them. This is useful for building preconfigured traffic
- monitors or sniffers.
-
-
- The daemon and multicast transports provide a completely virtual
- network to other virtual machines. This network is completely
- disconnected from the physical network unless one of the virtual
- machines on it is acting as a gateway.
-
-
- With so many host transports, which one should you use? Here's when
- you should use each one:
-
- - ethertap - if you want access to the host networking and it is
- running 2.2
-
- - TUN/TAP - if you want access to the host networking and it is
- running 2.4. Also, the TUN/TAP transport is able to use a
- preconfigured device, allowing it to avoid using the setuid uml_net
- helper, which is a security advantage.
-
- - Multicast - if you want a purely virtual network and you don't want
- to set up anything but the UML
-
- - a switch daemon - if you want a purely virtual network and you
- don't mind running the daemon in order to get somewhat better
- performance
-
- - slip - there is no particular reason to run the slip backend unless
- ethertap and TUN/TAP are just not available for some reason
-
- - slirp - if you don't have root access on the host to setup
- networking, or if you don't want to allocate an IP to your UML
-
- - pcap - not much use for actual network connectivity, but great for
- monitoring traffic on the host
-
- Ethertap is available on 2.4 and works fine. TUN/TAP is preferred
- to it because it has better performance and ethertap is officially
- considered obsolete in 2.4. Also, the root helper only needs to
- run occasionally for TUN/TAP, rather than handling every packet, as
- it does with ethertap. This is a slight security advantage since
- it provides fewer opportunities for a nasty UML user to somehow
- exploit the helper's root privileges.
-
-
-6.1. General setup
--------------------
-
- First, you must have the virtual network enabled in your UML. If you are
- running a prebuilt kernel from this site, everything is already
- enabled. If you build the kernel yourself, under the "Network device
- support" menu, enable "Network device support", and then the three
- transports.
-
-
- The next step is to provide a network device to the virtual machine.
- This is done by describing it on the kernel command line.
-
- The general format is::
-
-
- eth <n> = <transport> , <transport args>
-
-
-
-
- For example, a virtual ethernet device may be attached to a host
- ethertap device as follows::
-
-
- eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
-
-
-
-
- This sets up eth0 inside the virtual machine to attach itself to the
- host /dev/tap0, assigns it an ethernet address, and assigns the host
- tap0 interface an IP address.
-
-
-
- Note that the IP address you assign to the host end of the tap device
- must be different than the IP you assign to the eth device inside UML.
- If you are short on IPs and don't want to consume two per UML, then
- you can reuse the host's eth IP address for the host ends of the tap
- devices. Internally, the UMLs must still get unique IPs for their eth
- devices. You can also give the UMLs non-routable IPs (192.168.x.x or
- 10.x.x.x) and have the host masquerade them. This will let outgoing
- connections work, but incoming connections won't without more work,
- such as port forwarding from the host.
- Also note that when you configure the host side of an interface, it is
- only acting as a gateway. It will respond to pings sent to it
- locally, but doing so is not useful, since it's a host interface:
- you are not talking to the UML when you ping that interface and get a
- response.
-
-
- You can also add devices to a UML and remove them at runtime. See the
- :ref:`The_Management_Console` page for details.
-
-
- The sections below describe this in more detail.
-
-
- Once you've decided how you're going to set up the devices, you boot
- UML, log in, configure the UML side of the devices, and set up routes
- to the outside world. At that point, you will be able to talk to any
- other machines, physical or virtual, on the net.
-
-
- If ifconfig inside UML fails and the network refuses to come up, run
- 'dmesg' to see what ended up in the kernel log. That will usually
- tell you what went wrong.
-
-
-
-6.2. Userspace daemons
------------------------
-
- You will likely need the setuid helper, or the switch daemon, or both.
- They are both installed with the RPM and deb, so if you've installed
- either, you can skip the rest of this section.
-
-
- If not, then you need to check them out of CVS, build them, and
- install them. The helper is uml_net, in CVS /tools/uml_net, and the
- daemon is uml_switch, in CVS /tools/uml_router. They are both built
- with a plain 'make'. Both need to be installed in a directory that's
- in your path - /usr/bin is recommended. On top of that, uml_net needs
- to be setuid root.
-
-
-
-6.3. Specifying ethernet addresses
------------------------------------
-
- Below, you will see that the TUN/TAP, ethertap, and daemon interfaces
- allow you to specify hardware addresses for the virtual ethernet
- devices. This is generally not necessary. If you don't have a
- specific reason to do it, you probably shouldn't. If one is not
- specified on the command line, the driver will assign one based on the
- device IP address. It will provide the address fe:fd:nn:nn:nn:nn
- where nn.nn.nn.nn is the device IP address. This is nearly always
- sufficient to guarantee a unique hardware address for the device. A
- couple of exceptions are:
-
- - Another set of virtual ethernet devices are on the same network and
- they are assigned hardware addresses using a different scheme which
- may conflict with the UML IP address-based scheme
-
- - You aren't going to use the device for IP networking, so you don't
- assign the device an IP address
-
- If you let the driver provide the hardware address, you should make
- sure that the device IP address is known before the interface is
- brought up. So, inside UML, this will guarantee that::
-
-
-
- UML# ifconfig eth0 192.168.0.250 up
-
-
-
-
- If you decide to assign the hardware address yourself, make sure that
- the first byte of the address is even. Addresses with an odd first
- byte are multicast (group) addresses, which you don't want assigned
- to a device.
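-
- As a rough illustration of the address scheme described above, the
- following shell sketch derives the default hardware address from an IP
- address and checks that the first byte of an address is even. The
- function names are made up for this example, and rendering each byte
- as two lowercase hex digits is an assumption:
-
```shell
# Derive the default UML hardware address (fe:fd:nn:nn:nn:nn) from an
# IPv4 address. Assumption: each byte is printed as two hex digits.
uml_mac_from_ip() {
    local IFS=.
    # Word-split the dotted quad into four positional parameters.
    set -- $1
    printf 'fe:fd:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

# Succeed if the first byte of a hardware address is even.
mac_first_byte_is_even() {
    local first=${1%%:*}
    [ $(( 0x$first & 1 )) -eq 0 ]
}

uml_mac_from_ip 192.168.0.250
mac_first_byte_is_even fe:fd:c0:a8:00:fa && echo "first byte is even"
```
-
- The fe prefix used by the driver is itself even, so addresses
- generated this way always pass the check.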
-
-
-
-6.4. UML interface setup
--------------------------
-
- Once the network devices have been described on the command line, you
- should boot UML and log in.
-
-
- The first thing to do is bring the interface up::
-
-
- UML# ifconfig ethn ip-address up
-
-
-
-
- You should be able to ping the host at this point.
-
-
- To reach the rest of the world, you should set a default route to the
- host::
-
-
- UML# route add default gw host ip
-
-
-
-
- Again, with host ip of 192.168.0.4::
-
-
- UML# route add default gw 192.168.0.4
-
-
-
-
- This page used to recommend setting a network route to your local net.
- This is wrong, because it will cause UML to try to figure out hardware
- addresses of the local machines by arping on the interface to the
- host. Since that interface is basically a single strand of ethernet
- with two nodes on it (UML and the host) and arp requests don't cross
- networks, they will fail to elicit any responses. So, what you want
- is for UML to just blindly throw all packets at the host and let it
- figure out what to do with them, which is what leaving out the network
- route and adding the default route does.
-
-
- Note: If you can't communicate with other hosts on your physical
- ethernet, it's probably because of a network route that's
- automatically set up. If you run 'route -n' and see a route that
- looks like this::
-
-
-
-
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
-
-
-
-
- with a mask that's not 255.255.255.255, then replace it with a route
- to your host::
-
-
- UML# route del -net 192.168.0.0 dev eth0 netmask 255.255.255.0
-
- UML# route add -host 192.168.0.4 dev eth0
-
-
-
-
- This, plus the default route to the host, will allow UML to exchange
- packets with any machine on your ethernet.
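-
- To spot such a route without reading the table by eye, a small filter
- over the 'route -n' output can help. This is only a sketch: the
- function name is hypothetical, and it assumes the usual eight-column
- layout shown above and that awk is available inside the UML:
-
```shell
# Print "destination netmask" for every network (non-host) route on the
# given interface, reading `route -n` output on stdin.
find_net_routes() {
    awk -v dev="$1" '
        # Data rows have 8 fields; the header row fails the device match.
        # Skip the default route and host routes (mask 255.255.255.255).
        NF == 8 && $8 == dev && $1 != "0.0.0.0" && $3 != "255.255.255.255" {
            print $1, $3
        }'
}

# Example with canned output; inside UML use: route -n | find_net_routes eth0
find_net_routes eth0 <<'EOF'
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.4     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         192.168.0.4     0.0.0.0         UG    0      0        0 eth0
EOF
```
-
- Each line it prints names a route that is a candidate for the
- 'route del -net' command shown above.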
-
-
-
-6.5. Multicast
----------------
-
- The simplest way to set up a virtual network between multiple UMLs is
- to use the mcast transport. This was written by Harald Welte and is
- present in UML version 2.4.5-5um and later. Your system must have
- multicast enabled in the kernel and there must be a multicast-capable
- network device on the host. Normally, this is eth0, but if there is
- no ethernet card on the host, then you will likely get strange error
- messages when you bring the device up inside UML.
-
-
- To use it, run two UMLs with::
-
-
- eth0=mcast
-
-
-
-
- on their command lines. Log in, configure the ethernet device in each
- machine with different IP addresses::
-
-
- UML1# ifconfig eth0 192.168.0.254
-
-
- UML2# ifconfig eth0 192.168.0.253
-
-
-
-
- and they should be able to talk to each other.
-
- The full set of command line options for this transport is::
-
-
-
- ethn=mcast,ethernet address,multicast address,multicast port,ttl
-
-
-
- There is also a related point-to-point only "ucast" transport.
- This is useful when your network does not support multicast, and
- all network connections are simple point to point links.
-
- The full set of command line options for this transport is::
-
-
- ethn=ucast,ethernet address,remote address,listen port,remote port
-
-
-
-
-6.6. TUN/TAP with the uml_net helper
--------------------------------------
-
- TUN/TAP is the preferred mechanism on 2.4 to exchange packets with the
- host. The TUN/TAP backend has been in UML since 2.4.9-3um.
-
-
- The easiest way to get up and running is to let the setuid uml_net
- helper do the host setup for you. This involves insmod-ing the tun.o
- module if necessary, configuring the device, and setting up IP
- forwarding, routing, and proxy arp. If you are new to UML networking,
- do this first. If you're concerned about the security implications of
- the setuid helper, use it to get up and running, then read the next
- section to see how to have UML use a preconfigured tap device, which
- avoids the use of uml_net.
-
-
- If you specify an IP address for the host side of the device, the
- uml_net helper will do all necessary setup on the host - the only
- requirement is that TUN/TAP be available, either built in to the host
- kernel or as the tun.o module.
-
- The format of the command line switch to attach a device to a TUN/TAP
- device is::
-
-
- eth <n> =tuntap,,, <IP address>
-
-
-
-
- For example, this argument will attach the UML's eth0 to the next
- available tap device and assign an ethernet address to it based on its
- IP address::
-
-
- eth0=tuntap,,,192.168.0.254
-
-
-
-
-
-
- Note that the IP address that must be used for the eth device inside
- UML is fixed by the routing and proxy arp that is set up on the
- TUN/TAP device on the host. You can use a different one, but it won't
- work because reply packets won't reach the UML. This is a feature.
- It prevents a nasty UML user from doing things like setting the UML IP
- to the same as the network's nameserver or mail server.
-
-
- There are a couple of potential problems with running the TUN/TAP
- transport on a 2.4 host kernel:
-
- - TUN/TAP seems not to work on 2.4.3 and earlier. Upgrade the host
- kernel or use the ethertap transport.
-
- - With an upgraded kernel, TUN/TAP may fail with::
-
-
- File descriptor in bad state
-
-
-
-
- This is due to a header mismatch between the upgraded kernel and the
- kernel that was originally installed on the machine. The fix is to
- make sure that /usr/src/linux points to the headers for the running
- kernel.
-
- These were pointed out by Tim Robinson <timro at trkr dot net> in the past.
-
-
-
-6.7. TUN/TAP with a preconfigured tap device
----------------------------------------------
-
- If you prefer not to have UML use uml_net (which is somewhat
- insecure), then as of UML 2.4.17-11 you can set up a TUN/TAP device
- beforehand. The setup needs to be done as root, but once that's done,
- there is no need for root assistance. Setting up the device is done
- as follows:
-
- - Create the device with tunctl (available from the UML utilities
- tarball)::
-
-
-
-
- host# tunctl -u uid
-
-
-
-
- where uid is the user id or username that UML will be run as. This
- will tell you what device was created.
-
- - Configure the device IP (change IP addresses and device name to
- suit)::
-
-
-
-
- host# ifconfig tap0 192.168.0.254 up
-
-
-
-
-
- - Set up routing and arping if desired - this is my recipe, there are
- other ways of doing the same thing::
-
-
- host# bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
-
- host# route add -host 192.168.0.253 dev tap0
-
- host# bash -c 'echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp'
-
- host# arp -Ds 192.168.0.253 eth0 pub
-
-
-
-
- Note that this must be done every time the host boots - this
- configuration is not stored across host reboots. So, it's probably a good
- idea to stick it in an rc file. An even better idea would be a little
- utility which reads the information from a config file and sets up
- devices at boot time.
-
- - Rather than using up two IPs and ARPing for one of them, you can
- also provide direct access to your LAN by the UML by using a
- bridge::
-
-
- host# brctl addbr br0
-
- host# ifconfig eth0 0.0.0.0 promisc up
-
- host# ifconfig tap0 0.0.0.0 promisc up
-
- host# ifconfig br0 192.168.0.1 netmask 255.255.255.0 up
-
- host# brctl stp br0 off
-
- host# brctl setfd br0 1
-
- host# brctl sethello br0 1
-
- host# brctl addif br0 eth0
-
- host# brctl addif br0 tap0
-
-
-
-
- Note that 'br0' should be set up using ifconfig with the existing IP
- address of eth0, as eth0 no longer has its own IP.
-
- - Also, the /dev/net/tun device must be writable by the user running
- UML in order for the UML to use the device that's been configured
- for it. The simplest thing to do is::
-
-
- host# chmod 666 /dev/net/tun
-
-
-
-
- Making it world-writable looks bad, but it seems not to be
- exploitable as a security hole. However, it does allow anyone to
- create useless tap devices (useless because they can't configure them),
- which is a DoS attack. A somewhat more secure alternative would be
- to create a group containing all the users who have preconfigured tap
- devices and chgrp /dev/net/tun to that group with mode 664 or 660.
-
-
- - Once the device is set up, run UML with 'eth0=tuntap,device name'
- (e.g. 'eth0=tuntap,tap0') on the command line (or do it with the
- mconsole config command).
-
- - Bring the eth device up in UML and you're in business.
-
- If you don't want that tap device any more, you can make it
- non-persistent with::
-
-
- host# tunctl -d tap device
-
-
-
-
- Finally, tunctl has a -b (for brief mode) switch which causes it to
- output only the name of the tap device it created. This makes it
- suitable for capture by a script::
-
-
- host# TAP=`tunctl -u 1000 -b`
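-
- The suggestion above of keeping the host setup in an rc file can be
- sketched as a function that prints the commands from the routing and
- proxy arp recipe for one tap device. This is a hedged example rather
- than an existing tool: the function name and argument order are made
- up, it assumes the tap device has already been created with tunctl,
- and it only prints the commands so that they can be reviewed before
- being run as root:
-
```shell
# Print (do not run) the host commands for one preconfigured tap device.
# Arguments: tap device, host-side IP, UML IP, host ethernet device.
print_tap_setup() {
    local tap=$1 host_ip=$2 uml_ip=$3 lan=$4
    cat <<EOF
ifconfig $tap $host_ip up
echo 1 > /proc/sys/net/ipv4/ip_forward
route add -host $uml_ip dev $tap
echo 1 > /proc/sys/net/ipv4/conf/$tap/proxy_arp
arp -Ds $uml_ip $lan pub
EOF
}

print_tap_setup tap0 192.168.0.254 192.168.0.253 eth0
```
-
- Redirecting the output into a file gives an rc fragment that can be
- run at boot, once per tap device.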
-
-
-
-
-
-
-6.8. Ethertap
---------------
-
- Ethertap is the general mechanism on 2.2 for userspace processes to
- exchange packets with the kernel.
-
-
-
- To use this transport, you need to describe the virtual network device
- on the UML command line. The general format for this is::
-
-
- eth <n> =ethertap, <device> , <ethernet address> , <tap IP address>
-
-
-
-
- So, the previous example::
-
-
- eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
-
-
-
-
- attaches the UML eth0 device to the host /dev/tap0, assigns it the
- ethernet address fe:fd:0:0:0:1, and assigns the IP address
- 192.168.0.254 to the tap device.
-
-
-
- The tap device is mandatory, but the others are optional. If the
- ethernet address is omitted, one will be assigned to it.
-
-
- The presence of the tap IP address will cause the helper to run and do
- whatever host setup is needed to allow the virtual machine to
- communicate with the outside world. If you're not sure you know what
- you're doing, this is the way to go.
-
-
- If it is absent, then you must configure the tap device and whatever
- arping and routing you will need on the host. However, even in this
- case, the uml_net helper still needs to be in your path and it must be
- setuid root if you're not running UML as root. This is because the
- tap device doesn't support SIGIO, which UML needs in order to use
- something as a source of input. So, the helper is used as a
- convenient asynchronous IO thread.
-
- If you're using the uml_net helper, you can ignore the following host
- setup - uml_net will do it for you. You just need to make sure you
- have ethertap available, either built in to the host kernel or
- available as a module.
-
-
- If you want to set things up yourself, you need to make sure that the
- appropriate /dev entry exists. If it doesn't, become root and create
- it as follows::
-
-
- mknod /dev/tap<minor> c 36 <minor + 16>
-
-
-
-
- For example, this is how to create /dev/tap0 (minor number 0 + 16 = 16)::
-
-
- mknod /dev/tap0 c 36 16
-
-
-
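- mknod does not evaluate arithmetic, so the minor number has to be
- computed before it is passed in. A minimal sketch, using a
- hypothetical helper that only prints the command for a given tap unit:
-
```shell
# Print the mknod command that creates /dev/tap<unit>.
# Ethertap uses character major 36 and minor <unit> + 16.
ethertap_mknod_cmd() {
    local unit=$1
    echo "mknod /dev/tap$unit c 36 $(( unit + 16 ))"
}

ethertap_mknod_cmd 0
ethertap_mknod_cmd 1
```
-
- Run the printed command as root to create the device node.
-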
-
- You also need to make sure that the host kernel has ethertap support.
- If ethertap is enabled as a module, you apparently need to insmod
- ethertap once for each ethertap device you want to enable. So::
-
-
- host# insmod ethertap
-
-
-
-
- will give you the tap0 interface. To get the tap1 interface, you need
- to run::
-
-
- host# insmod ethertap unit=1 -o ethertap1
-
-
-
-
-
-
-
-6.9. The switch daemon
------------------------
-
- Note: This is the daemon formerly known as uml_router, but which was
- renamed so the network weenies of the world would stop growling at me.
-
-
- The switch daemon, uml_switch, provides a mechanism for creating a
- totally virtual network. By default, it provides no connection to the
- host network (but see -tap, below).
-
-
- The first thing you need to do is run the daemon. Running it with no
- arguments will make it listen on a default pair of unix domain
- sockets.
-
-
- If you want it to listen on a different pair of sockets, use::
-
-
- -unix control socket data socket
-
-
-
-
-
- If you want it to act as a hub rather than a switch, use::
-
-
- -hub
-
-
-
-
-
- If you want the switch to be connected to host networking (allowing
- the UMLs to get access to the outside world through the host), use::
-
-
- -tap tap0
-
-
-
-
-
- Note that the tap device must be preconfigured (see "TUN/TAP with a
- preconfigured tap device", above). If you're using a different tap
- device than tap0, specify that instead of tap0.
-
-
- uml_switch can be backgrounded as follows::
-
-
- host% uml_switch [ options ] < /dev/null > /dev/null
-
-
-
-
- The reason it doesn't background by default is that it listens to
- stdin for EOF. When it sees that, it exits.
-
-
- The general format of the kernel command line switch is::
-
-
-
- ethn=daemon,ethernet address,socket type,control socket,data socket
-
-
-
-
- You can leave off everything except the 'daemon'. You only need to
- specify the ethernet address if the one that will be assigned to it
- isn't acceptable for some reason. The rest of the arguments describe
- how to communicate with the daemon. You should only specify them if
- you told the daemon to use different sockets than the default. So, if
- you ran the daemon with no arguments, running the UML on the same
- machine with::
-
- eth0=daemon
-
-
-
-
- will cause the eth0 driver to attach itself to the daemon correctly.
-
-
-
-6.10. Slip
------------
-
- Slip is another, less general, mechanism for a process to communicate
- with the host networking. In contrast to the ethertap interface,
- which exchanges ethernet frames with the host and can be used to
- transport any higher-level protocol, it can only be used to transport
- IP.
-
-
- The general format of the command line switch is::
-
-
-
- ethn=slip,slip IP
-
-
-
-
- The slip IP argument is the IP address that will be assigned to the
- host end of the slip device. If it is specified, the helper will run
- and will set up the host so that the virtual machine can reach it and
- the rest of the network.
-
-
- There are some oddities with this interface that you should be aware
- of. You should only specify one slip device on a given virtual
- machine, and its name inside UML will be 'umn', not 'eth0' or whatever
- you specified on the command line. These problems will be fixed at
- some point.
-
-
-
-6.11. Slirp
-------------
-
- slirp uses an external program, usually /usr/bin/slirp, to provide IP
- only networking connectivity through the host. This is similar to IP
- masquerading with a firewall, although the translation is performed in
- user-space, rather than by the kernel. As slirp does not set up any
- interfaces on the host or change routing, it does not require
- root access or setuid binaries on the host.
-
-
- The general format of the command line switch for slirp is::
-
-
-
- ethn=slirp,ethernet address,slirp path
-
-
-
-
- The ethernet address is optional, as UML will set up the interface
- with an ethernet address based upon the initial IP address of the
- interface. The slirp path is generally /usr/bin/slirp, although it
- will depend on distribution.
-
-
- The slirp program can have a number of options passed to the command
- line and we can't add them to the UML command line, as they will be
- parsed incorrectly. Instead, a wrapper shell script can be written or
- the options inserted into the ~/.slirprc file. More information on
- all of the slirp options can be found in its man pages.
-
-
- The eth0 interface on UML should be set up with the IP 10.0.2.15,
- although you can use anything as long as it is not used by a network
- you will be connecting to. The default route on UML should be set to
- use::
-
-
- UML# route add default dev eth0
-
-
-
-
- slirp provides a number of useful IP addresses which can be used by
- UML, such as 10.0.2.3 which is an alias for the DNS server specified
- in /etc/resolv.conf on the host or the IP given in the 'dns' option
- for slirp.
-
-
- Even with a baudrate setting higher than 115200, the slirp connection
- is limited to 115200. If you need it to go faster, the slirp binary
- needs to be compiled with FULL_BOLT defined in config.h.
-
-
-
-6.12. pcap
------------
-
- The pcap transport is attached to a UML ethernet device on the command
- line or with uml_mconsole with the following syntax::
-
-
-
- ethn=pcap,host interface,filter expression,option1,option2
-
-
-
-
- The expression and options are optional.
-
-
- The interface is whatever network device on the host you want to
- sniff. The expression is a pcap filter expression, which is also what
- tcpdump uses, so if you know how to specify tcpdump filters, you will
- use the same expressions here. The options are up to two of
- 'promisc', 'nopromisc', 'optimize', and 'nooptimize'. 'promisc' and
- 'nopromisc' control whether pcap puts the host interface into
- promiscuous mode. 'optimize' and 'nooptimize' control whether the pcap
- expression optimizer is used.
-
-
- Example::
-
-
-
- eth0=pcap,eth0,tcp
-
- eth1=pcap,eth0,!tcp
-
-
-
- will cause the UML eth0 to emit all tcp packets on the host eth0 and
- the UML eth1 to emit all non-tcp packets on the host eth0.
-
-
-
-6.13. Setting up the host yourself
------------------------------------
-
- If you don't specify an address for the host side of the ethertap or
- slip device, UML won't do any setup on the host. So this is what is
- needed to get things working (the examples use a host-side IP of
- 192.168.0.251 and a UML-side IP of 192.168.0.250 - adjust to suit your
- own network):
-
- - The device needs to be configured with its IP address. Tap devices
- are also configured with an mtu of 1484. Slip devices are
- configured with a point-to-point address pointing at the UML IP
- address::
-
-
- host# ifconfig tap0 arp mtu 1484 192.168.0.251 up
-
-
- host# ifconfig sl0 192.168.0.251 pointopoint 192.168.0.250 up
-
-
-
-
-
- - If a tap device is being set up, a route is set to the UML IP::
-
-
- host# route add -host 192.168.0.250 gw 192.168.0.251
-
-
-
-
-
- - To allow other hosts on your network to see the virtual machine,
- proxy arp is set up for it::
-
-
- host# arp -Ds 192.168.0.250 eth0 pub
-
-
-
-
-
- - Finally, the host is set up to route packets::
-
-
- host# echo 1 > /proc/sys/net/ipv4/ip_forward
-
-
-
-
-
-
-
-
-
-
-7. Sharing Filesystems between Virtual Machines
-================================================
-
-
-
-
-7.1. A warning
----------------
-
- Don't attempt to share filesystems simply by booting two UMLs from the
- same file. That's the same thing as booting two physical machines
- from a shared disk. It will result in filesystem corruption.
-
-
-
-7.2. Using layered block devices
----------------------------------
-
- The way to share a filesystem between two virtual machines is to use
- the copy-on-write (COW) layering capability of the ubd block driver.
- As of 2.4.6-2um, the driver supports layering a read-write private
- device over a read-only shared device. A machine's writes are stored
- in the private device, while reads come from either device - the
- private one if the requested block is valid in it, the shared one if
- not. Using this scheme, the majority of data which is unchanged is
- shared between an arbitrary number of virtual machines, each of which
- has a much smaller file containing the changes that it has made. With
- a large number of UMLs booting from a large root filesystem, this
- leads to a huge disk space saving. It will also help performance,
- since the host will be able to cache the shared data using a much
- smaller amount of memory, so UML disk requests will be served from the
- host's memory rather than its disks.
-
-
-
-
- To add a copy-on-write layer to an existing block device file, simply
- add the name of the COW file to the appropriate ubd switch::
-
-
- ubd0=root_fs_cow,root_fs_debian_22
-
-
-
-
- where 'root_fs_cow' is the private COW file and 'root_fs_debian_22' is
- the existing shared filesystem. The COW file need not exist. If it
- doesn't, the driver will create and initialize it. Once the COW file
- has been initialized, it can be used on its own on the command line::
-
-
- ubd0=root_fs_cow
-
-
-
-
- The name of the backing file is stored in the COW file header, so it
- would be redundant to continue specifying it on the command line.
-
-
-
-7.3. Note!
------------
-
- When checking the size of the COW file in order to see the gobs of
- space that you're saving, make sure you use 'ls -ls' to see the actual
- disk consumption rather than the length of the file. The COW file is
- sparse, so the length will be very different from the disk usage.
- Here is a 'ls -l' of a COW file and backing file from one boot and
- shutdown::
-
- host% ls -l cow.debian debian2.2
- -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
- -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
-
-
-
-
- Doesn't look like much saved space, does it? Well, here's 'ls -ls'::
-
-
- host% ls -ls cow.debian debian2.2
- 880 -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
- 525832 -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
-
-
-
-
- Now, you can see that the COW file has less than a meg of disk, rather
- than 492 meg.
-
-
-
-7.4. Another warning
----------------------
-
- Once a filesystem is being used as a readonly backing file for a COW
- file, do not boot directly from it or modify it in any way. Doing so
- will invalidate any COW files that are using it. The mtime and size
- of the backing file are stored in the COW file header at its creation,
- and they must continue to match. If they don't, the driver will
- refuse to use the COW file.
-
-
-
-
- If you attempt to evade this restriction by changing either the
- backing file or the COW header by hand, you will get a corrupted
- filesystem.
-
-
-
-
- Among other things, this means that upgrading the distribution in a
- backing file and expecting that all of the COW files using it will see
- the upgrade will not work.
-
-
-
-
-7.5. uml_moo : Merging a COW file with its backing file
---------------------------------------------------------
-
- Depending on how you use UML and COW devices, it may be advisable to
- merge the changes in the COW file into the backing file every once in
- a while.
-
-
-
-
- The utility that does this is uml_moo. Its usage is::
-
-
- host% uml_moo COW file new backing file
-
-
-
-
- There's no need to specify the backing file since that information is
- already in the COW file header. If you're paranoid, boot the new
- merged file, and if you're happy with it, move it over the old backing
- file.
-
-
-
-
- uml_moo creates a new backing file by default as a safety measure. It
- also has a destructive merge option, -d, which will merge the COW file
- directly into its current backing file. This is really only usable
- when the backing file only has one COW file associated with it. If
- there are multiple COWs associated with a backing file, a -d merge of
- one of them will invalidate all of the others. However, it is
- convenient if you're short of disk space, and it should also be
- noticeably faster than a non-destructive merge.
-
-
-
-
- uml_moo is installed with the UML deb and RPM. If you didn't install
- UML from one of those packages, you can also get it from the UML
- utilities http://user-mode-linux.sourceforge.net/utilities tar file
- in tools/moo.
-
-
-
-
-
-
-
-
-8. Creating filesystems
-========================
-
-
- You may want to create and mount new UML filesystems, either because
- your root filesystem isn't large enough or because you want to use a
- filesystem other than ext2.
-
-
- This was written on the occasion of reiserfs being included in the
- 2.4.1 kernel pool, and therefore the 2.4.1 UML, so the examples will
- talk about reiserfs. This information is generic, and the examples
- should be easy to translate to the filesystem of your choice.
-
-
-8.1. Create the filesystem file
---------------------------------
-
- dd is your friend. All you need to do is tell dd to create an empty
- file of the appropriate size. I usually make it sparse to save time
- and to avoid allocating disk space until it's actually used. For
- example, the following command will create a sparse 100 meg file full
- of zeroes::
-
-
- host% dd if=/dev/zero of=new_filesystem seek=100 count=1 bs=1M
-
-
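- To see the effect of making the file sparse, compare its length with
- the disk space it actually occupies. The sketch below repeats the dd
- command in a scratch directory and prints both numbers. The function
- name is made up for this example, and the allocated size depends on
- the host filesystem. Note that the file is 101 MB long: dd skips 100
- one-megabyte blocks and then writes one more:
-
```shell
# Create a sparse file as above and print its length in bytes followed
# by the disk space actually allocated to it, in KiB.
sparse_file_sizes() {
    local dir file
    dir=$(mktemp -d) || return 1
    file=$dir/new_filesystem
    dd if=/dev/zero of="$file" seek=100 count=1 bs=1M 2>/dev/null
    # wc -c reports the file length; du -k reports allocated blocks.
    echo "$(wc -c < "$file" | tr -d ' ') $(du -k "$file" | cut -f1)"
    rm -rf "$dir"
}

sparse_file_sizes
```
-
- On a filesystem that supports sparse files, the second number should
- be far smaller than the first divided by 1024.
-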
-
-
-
-
-8.2. Assign the file to a UML device
--------------------------------------
-
- Add an argument like the following to the UML command line::
-
- ubd4=new_filesystem
-
-
-
-
- making sure that you use an unassigned ubd device number.
-
-
-
-8.3. Creating and mounting the filesystem
-==========================================
-
- Make sure that the filesystem is available, either by being built into
- the kernel, or available as a module, then boot up UML and log in. If
- the root filesystem doesn't have the filesystem utilities (mkfs, fsck,
- etc), then get them into UML by way of the net or hostfs.
-
-
- Make the new filesystem on the device assigned to the new file::
-
-
- UML# mkreiserfs /dev/ubd/4
-
-
- <----------- MKREISERFSv2 ----------->
-
- ReiserFS version 3.6.25
- Block size 4096 bytes
- Block count 25856
- Used blocks 8212
- Journal - 8192 blocks (18-8209), journal header is in block 8210
- Bitmaps: 17
- Root block 8211
- Hash function "r5"
- ATTENTION: ALL DATA WILL BE LOST ON '/dev/ubd/4'! (y/n)y
- journal size 8192 (from 18)
- Initializing journal - 0%....20%....40%....60%....80%....100%
- Syncing..done.
-
-
-
-
- Now, mount it::
-
-
- UML# mount /dev/ubd/4 /mnt
-
-
-
-
- and you're in business.
-
-
-
-
-
-
-
-
-
-9. Host file access
-====================
-
-
- If you want to access files on the host machine from inside UML, you
- can treat it as a separate machine and either nfs mount directories
- from the host or copy files into the virtual machine with scp or rcp.
- However, since UML is running on the host, it can access those
- files just like any other process and make them available inside the
- virtual machine without needing to use the network.
-
-
- This is now possible with the hostfs virtual filesystem. With it, you
- can mount a host directory into the UML filesystem and access the
- files contained in it just as you would on the host.
-
-
-9.1. Using hostfs
-------------------
-
- To begin with, make sure that hostfs is available inside the virtual
- machine with::
-
-
- UML# cat /proc/filesystems
-
-
-
- hostfs should be listed. If it's not, either rebuild the kernel
- with hostfs configured into it or make sure that hostfs is built as a
- module and available inside the virtual machine, and insmod it.
-
-
- Now all you need to do is run mount::
-
-
- UML# mount none /mnt/host -t hostfs
-
-
-
-
- will mount the host's / on the virtual machine's /mnt/host.
-
-
- If you don't want to mount the host root directory, then you can
- specify a subdirectory to mount with the -o switch to mount::
-
-
- UML# mount none /mnt/home -t hostfs -o /home
-
-
-
-
- will mount the host's /home on the virtual machine's /mnt/home.
-
-
-
-9.2. hostfs as the root filesystem
------------------------------------
-
- It's possible to boot from a directory hierarchy on the host using
- hostfs rather than using the standard filesystem in a file.
-
- To start, you need that hierarchy. The easiest way is to loop mount
- an existing root_fs file::
-
-
- host# mount root_fs uml_root_dir -o loop
-
-
-
-
- You need to change the filesystem type of / in etc/fstab to be
- 'hostfs', so that line looks like this::
-
- /dev/ubd/0 / hostfs defaults 1 1
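The fstab edit can also be scripted. The following is a minimal sketch, assuming a conventional whitespace-separated fstab. The function name and the sample file contents are made up for illustration.

```shell
#!/bin/sh
# Print an fstab with the filesystem type of the root entry set to hostfs.
# Comment lines and every other entry pass through untouched.
# Note: awk rebuilds the edited line with single spaces between fields.
fstab_root_to_hostfs() {
    awk '$1 !~ /^#/ && $2 == "/" { $3 = "hostfs" } { print }' "$1"
}

# Example on a throwaway file (the sample contents are invented).
sample=$(mktemp)
printf '%s\n' '# <device> <mount> <type> <opts> <dump> <pass>' \
              '/dev/ubd/0 / ext2 defaults 1 1' \
              'proc /proc proc defaults 0 0' > "$sample"
fstab_root_to_hostfs "$sample"
```

From inside the loop-mounted directory, one would write the output to a new file such as etc/fstab.new, inspect it, and only then move it over etc/fstab.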
-
-
-
-
- Then you need to chown to yourself all the files in that directory
- that are owned by root. This worked for me::
-
-
- host# find . -uid 0 -exec chown jdike {} \;
-
-
-
-
- Next, make sure that your UML kernel has hostfs compiled in, not as a
- module. Then run UML with the boot device pointing at that directory::
-
-
- ubd0=/path/to/uml/root/directory
-
-
-
-
- UML should then boot as it does normally.
-
-
-9.3. Building hostfs
----------------------
-
- If you need to build hostfs because it's not in your kernel, you have
- two choices:
-
-
-
- - Compiling hostfs into the kernel:
-
-
- Reconfigure the kernel and set the 'Host filesystem' option under
-
-
- - Compiling hostfs as a module:
-
-
- Reconfigure the kernel and set the 'Host filesystem' option under
- be in arch/um/fs/hostfs/hostfs.o. Install that in
- ``/lib/modules/$(uname -r)/fs`` in the virtual machine, boot it up, and::
-
-
- UML# insmod hostfs
-
-
-.. _The_Management_Console:
-
-10. The Management Console
-===========================
-
-
-
- The UML management console is a low-level interface to the kernel,
- somewhat like the i386 SysRq interface. Since there is a full-blown
- operating system under UML, there is much greater flexibility possible
- than with the SysRq mechanism.
-
-
- There are a number of things you can do with the mconsole interface:
-
- - get the kernel version
-
- - add and remove devices
-
- - halt or reboot the machine
-
- - Send SysRq commands
-
- - Pause and resume the UML
-
-
- You need the mconsole client (uml_mconsole) which is present in CVS
- (/tools/mconsole) in 2.4.5-9um and later, and will be in the RPM in
- 2.4.6.
-
-
- You also need CONFIG_MCONSOLE (under 'General Setup') enabled in UML.
- When you boot UML, you'll see a line like::
-
-
- mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
-
-
-
-
- If you specify a unique machine id on the UML command line, i.e.::
-
-
- umid=debian
-
-
-
-
- you'll see this::
-
-
- mconsole initialized on /home/jdike/.uml/debian/mconsole
-
-
-
-
- That file is the socket that uml_mconsole will use to communicate with
- UML. Run it with either the umid or the full path as its argument::
-
-
- host% uml_mconsole debian
-
-
-
-
- or::
-
-
- host% uml_mconsole /home/jdike/.uml/debian/mconsole
-
-
-
-
- You'll get a prompt, at which you can run one of these commands:
-
- - version
-
- - halt
-
- - reboot
-
- - config
-
- - remove
-
- - sysrq
-
- - help
-
- - cad
-
- - stop
-
- - go
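When scripting around uml_mconsole it can help to derive the socket path from the umid. This sketch assumes the default ~/.uml/<umid>/mconsole layout seen in the boot messages above. The function names are illustrative, and a socket that exists may still be stale, so treat the check as a hint only.

```shell
#!/bin/sh
# Build the mconsole socket path for a given umid, assuming the default
# ~/.uml/<umid>/mconsole layout shown in the boot messages.
mconsole_socket() {
    printf '%s/.uml/%s/mconsole\n' "$HOME" "$1"
}

# Succeeds if the path exists and is a socket. A leftover socket from a
# crashed UML would also pass, so this is only a first check.
uml_socket_present() {
    [ -S "$(mconsole_socket "$1")" ]
}

if uml_socket_present debian; then
    echo "socket found: $(mconsole_socket debian)"
else
    echo "no mconsole socket for umid 'debian'"
fi
```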
-
-
-10.1. version
---------------
-
- This takes no arguments. It prints the UML version::
-
-
- (mconsole) version
- OK Linux usermode 2.4.5-9um #1 Wed Jun 20 22:47:08 EDT 2001 i686
-
-
-
-
- There are a couple of actual uses for this. It's a simple no-op which
- can be used to check that a UML is running. It's also a way of
- sending an interrupt to the UML. This is sometimes useful on SMP
- hosts, where there's a bug which causes signals to UML to be lost,
- often causing it to appear to hang. Sending such a UML the mconsole
- version command is a good way to 'wake it up' before networking has
- been enabled, as it does not do anything to the function of the UML.
-
-
-
-10.2. halt and reboot
-----------------------
-
- These take no arguments. They shut the machine down immediately, with
- no syncing of disks and no clean shutdown of userspace. So, they are
- pretty close to crashing the machine::
-
-
- (mconsole) halt
- OK
-
-
-
-
-
-
-10.3. config
--------------
-
- "config" adds a new device to the virtual machine. Currently the ubd
- and network drivers support this. It takes one argument, which is the
- device to add, with the same syntax as the kernel command line::
-
-
-
-
- (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22
- OK
- (mconsole) config eth1=mcast
- OK
-
-
-
-
-
-
-10.4. remove
--------------
-
- "remove" deletes a device from the system. Its argument is just the
- name of the device to be removed. The device must be idle in whatever
- sense the driver considers necessary. In the case of the ubd driver,
- the removed block device must not be mounted, swapped on, or otherwise
- open, and in the case of the network driver, the device must be down::
-
-
- (mconsole) remove ubd3
- OK
- (mconsole) remove eth1
- OK
-
-
-
-
-
-
-10.5. sysrq
-------------
-
- This takes one argument, which is a single letter. It calls the
- generic kernel's SysRq driver, which does whatever is called for by
- that argument. See the SysRq documentation in
- Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
- see what letters are valid and what they do.
-
-
-
-10.6. help
------------
-
- "help" returns a string listing the valid commands and what each one
- does.
-
-
-
-10.7. cad
-----------
-
- This invokes the Ctl-Alt-Del action on init. What exactly this ends
- up doing is up to /etc/inittab. Normally, it reboots the machine.
- With UML, this is usually not desired, so if a halt would be better,
- then find the section of inittab that looks like this::
-
-
- # What to do when CTRL-ALT-DEL is pressed.
- ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -r now
-
-
-
-
- and change the command to halt.
-
-
-
-10.8. stop
------------
-
- This puts the UML in a loop reading mconsole requests until a 'go'
- mconsole command is received. This is very useful for making backups
- of UML filesystems, as the UML can be stopped, then synced via 'sysrq
- s', so that everything is written to the filesystem. You can then copy
- the filesystem and then send the UML 'go' via mconsole.
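The backup sequence described here could be wrapped in a small host-side script. This is a sketch under two assumptions that the text does not confirm: that uml_mconsole accepts a command after the umid on its command line, and that the filesystem image path is known on the host. The function name and the MCONSOLE override are hypothetical.

```shell
#!/bin/sh
# Client to use; overridable so the sequence can be rehearsed with a stub.
# Assumption: the client accepts "<umid> <command...>" as arguments.
MCONSOLE=${MCONSOLE:-uml_mconsole}

# backup_uml <umid> <image file on host> <destination file>
backup_uml() {
    umid=$1
    image=$2
    dest=$3

    # Pause the UML; give up if it cannot be stopped.
    "$MCONSOLE" "$umid" stop || return 1

    # Flush dirty data to the image, then copy it while the UML is paused.
    "$MCONSOLE" "$umid" sysrq s && cp "$image" "$dest"
    status=$?

    # Always resume the UML, even if the sync or the copy failed.
    "$MCONSOLE" "$umid" go
    return "$status"
}
```

Setting MCONSOLE=echo prints the three mconsole commands instead of sending them, which is a cheap way to check the ordering before pointing the script at a live UML.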
-
-
- Note that a UML running with more than one CPU will have problems
- after you send the 'stop' command, as only one CPU will be held in a
- mconsole loop and all others will continue as normal. This is a bug,
- and will be fixed.
-
-
-
-10.9. go
----------
-
- This resumes a UML after being paused by a 'stop' command. Note that
- when the UML has resumed, TCP connections may have timed out and if
- the UML is paused for a long period of time, crond might go a little
- crazy, running all the jobs it didn't do earlier.
-
-
-
-
-
-
-.. _Kernel_debugging:
-
-11. Kernel debugging
-=====================
-
-
- Note: The interface that makes debugging, as described here, possible
- is present in 2.4.0-test6 kernels and later.
-
-
- Since the user-mode kernel runs as a normal Linux process, it is
- possible to debug it with gdb almost like any other process. It is
- slightly different because the kernel's threads are already being
- ptraced for system call interception, so gdb can't ptrace them.
- However, a mechanism has been added to work around that problem.
-
-
- In order to debug the kernel, you need to build it from source. See
- :ref:`Compiling_the_kernel_and_modules` for information on doing that.
- Make sure that you enable CONFIG_DEBUGSYM and CONFIG_PT_PROXY during
- the config. These will compile the kernel with ``-g``, and enable the
- ptrace proxy so that gdb works with UML, respectively.
-
-
-
-
-11.1. Starting the kernel under gdb
-------------------------------------
-
- You can have the kernel running under the control of gdb from the
- beginning by putting 'debug' on the command line. You will get an
- xterm with gdb running inside it. The kernel will send some commands
- to gdb which will leave it stopped at the beginning of start_kernel.
- At this point, you can get things going with 'next', 'step', or
- 'cont'.
-
-
- There is a transcript of a debugging session here <debug-session.html>,
- with breakpoints being set in the scheduler and in an interrupt handler.
-
-
-11.2. Examining sleeping processes
------------------------------------
-
-
- Not every bug is evident in the currently running process. Sometimes,
- processes hang in the kernel when they shouldn't because they've
- deadlocked on a semaphore or something similar. In this case, when
- you ^C gdb and get a backtrace, you will see the idle thread, which
- isn't very relevant.
-
-
- What you want is the stack of whatever process is sleeping when it
- shouldn't be. You need to figure out which process that is, which is
- generally fairly easy. Then you need to get its host process id,
- which you can do either by looking at ps on the host or at
- task.thread.extern_pid in gdb.
-
-
- Now what you do is this:
-
- - detach from the current thread::
-
-
- (UML gdb) det
-
-
-
-
-
- - attach to the thread you are interested in::
-
-
- (UML gdb) att <host pid>
-
-
-
-
-
- - look at its stack and anything else of interest::
-
-
- (UML gdb) bt
-
-
-
-
- Note that you can't do anything at this point that requires that a
- process execute, e.g. calling a function.
-
- - when you're done looking at that process, reattach to the current
- thread and continue it::
-
-
- (UML gdb) att 1
- (UML gdb) c
-
-
-
-
- Here, specifying any pid which is not the process id of a UML thread
- will cause gdb to reattach to the current thread. I commonly use 1,
- but any other invalid pid would work.
-
-
-
-11.3. Running ddd on UML
--------------------------
-
- ddd works on UML, but requires a special kludge. The process goes
- like this:
-
- - Start ddd::
-
-
- host% ddd linux
-
-
-
-
-
- - With ps, get the pid of the gdb that ddd started. You can ask the
- gdb to tell you, but for some reason that confuses things and
- causes a hang.
-
- - run UML with 'debug=parent gdb-pid=<pid>' added to the command line
- - it will just sit there after you hit return
-
- - type 'att 1' to the ddd gdb and you will see something like::
-
-
- 0xa013dc51 in __kill ()
-
-
- (gdb)
-
-
-
-
-
- - At this point, type 'c', UML will boot up, and you can use ddd just
- as you do on any other process.
-
-
-
-11.4. Debugging modules
-------------------------
-
-
- gdb has support for debugging code which is dynamically loaded into
- the process. This support is what is needed to debug kernel modules
- under UML.
-
-
- Using that support is somewhat complicated. You have to tell gdb what
- object file you just loaded into UML and where in memory it is. Then,
- it can read the symbol table, and figure out where all the symbols are
- from the load address that you provided. It gets more interesting
- when you load the module again (i.e. after an rmmod). You have to
- tell gdb to forget about all its symbols, including the main UML ones
- for some reason, then load them all back in again.
-
-
- There's an easy way and a hard way to do this. The easy way is to use
- the umlgdb expect script written by Chandan Kudige. It basically
- automates the process for you.
-
-
- First, you must tell it where your modules are. There is a list in
- the script that looks like this::
-
- set MODULE_PATHS {
- "fat" "/usr/src/uml/linux-2.4.18/fs/fat/fat.o"
- "isofs" "/usr/src/uml/linux-2.4.18/fs/isofs/isofs.o"
- "minix" "/usr/src/uml/linux-2.4.18/fs/minix/minix.o"
- }
-
-
-
-
- You change that to list the names and paths of the modules that you
- are going to debug. Then you run it from the toplevel directory of
- your UML pool and it basically tells you what to do::
-
-
- ******** GDB pid is 21903 ********
- Start UML as: ./linux <kernel switches> debug gdb-pid=21903
-
-
-
- GNU gdb 5.0rh-5 Red Hat Linux 7.1
- Copyright 2001 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
- (gdb) b sys_init_module
- Breakpoint 1 at 0xa0011923: file module.c, line 349.
- (gdb) att 1
-
-
-
-
- After you run UML and it sits there doing nothing, you hit return at
- the 'att 1' and continue it::
-
-
- Attaching to program: /home/jdike/linux/2.4/um/./linux, process 1
- 0xa00f4221 in __kill ()
- (UML gdb) c
- Continuing.
-
-
-
-
- At this point, you debug normally. When you insmod something, the
- expect magic will kick in and you'll see something like::
-
-
- *** Module hostfs loaded ***
- Breakpoint 1, sys_init_module (name_user=0x805abb0 "hostfs",
- mod_user=0x8070e00) at module.c:349
- 349 char *name, *n_name, *name_tmp = NULL;
- (UML gdb) finish
- Run till exit from #0 sys_init_module (name_user=0x805abb0 "hostfs",
- mod_user=0x8070e00) at module.c:349
- 0xa00e2e23 in execute_syscall (r=0xa8140284) at syscall_kern.c:411
- 411 else res = EXECUTE_SYSCALL(syscall, regs);
- Value returned is $1 = 0
- (UML gdb) p/x (int)module_list + module_list->size_of_struct
- $2 = 0xa9021054
- (UML gdb) symbol-file ./linux
- Load new symbol table from "./linux"? (y or n) y
- Reading symbols from ./linux...
- done.
- (UML gdb) add-symbol-file /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o 0xa9021054
- add symbol table from file "/home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o" at
- .text_addr = 0xa9021054
- (y or n) y
- Reading symbols from /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o...
- done.
- (UML gdb) p *module_list
- $1 = {size_of_struct = 84, next = 0xa0178720, name = 0xa9022de0 "hostfs",
- size = 9016, uc = {usecount = {counter = 0}, pad = 0}, flags = 1,
- nsyms = 57, ndeps = 0, syms = 0xa9023170, deps = 0x0, refs = 0x0,
- init = 0xa90221f0 <init_hostfs>, cleanup = 0xa902222c <exit_hostfs>,
- ex_table_start = 0x0, ex_table_end = 0x0, persist_start = 0x0,
- persist_end = 0x0, can_unload = 0, runsize = 0, kallsyms_start = 0x0,
- kallsyms_end = 0x0,
- archdata_start = 0x1b855 <Address 0x1b855 out of bounds>,
- archdata_end = 0xe5890000 <Address 0xe5890000 out of bounds>,
- kernel_data = 0xf689c35d <Address 0xf689c35d out of bounds>}
- >> Finished loading symbols for hostfs ...
-
-
-
-
- That's the easy way. It's highly recommended. The hard way is
- described below in case you're interested in what's going on.
-
-
- Boot the kernel under the debugger and load the module with insmod or
- modprobe. With gdb, do::
-
-
- (UML gdb) p module_list
-
-
-
-
- This is a list of modules that have been loaded into the kernel, with
- the most recently loaded module first. Normally, the module you want
- is at module_list. If it's not, walk down the next links, looking at
- the name fields until you find the module you want to debug. Take the
- address of that structure, and add module.size_of_struct (which in
- 2.4.10 kernels is 96 (0x60)) to it. Gdb can make this hard addition
- for you :-)::
-
-
-
- (UML gdb) printf "%#x\n", (int)module_list + module_list->size_of_struct
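The same addition can be checked outside gdb with shell arithmetic. In this sketch the base address 0xa9021000 is not printed anywhere in the transcript. It is inferred by subtracting the size_of_struct value of 84 from the 0xa9021054 result shown earlier, so treat it as an example input. The function name is illustrative.

```shell
#!/bin/sh
# Address to hand to add-symbol-file: the module structure's address plus
# its size_of_struct field. Hex and decimal inputs both work.
module_text_addr() {
    printf '0x%x\n' $(( $1 + $2 ))
}

# Example input: 0xa9021000 is inferred from the earlier transcript
# (0xa9021054 minus the size_of_struct of 84), not read from gdb.
module_text_addr 0xa9021000 84
```

This relies on the shell doing at least 64-bit arithmetic, since these addresses do not fit in a signed 32-bit integer.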
-
-
-
-
- The offset from the module start occasionally changes (before 2.4.0,
- it was module.size_of_struct + 4), so it's a good idea to check the
- init and cleanup addresses once in a while, as described below. Now
- do::
-
-
- (UML gdb) add-symbol-file /path/to/module/on/host that_address
-
-
-
-
- Tell gdb you really want to do it, and you're in business.
-
-
- If there's any doubt that you got the offset right, like breakpoints
- appear not to work, or they're appearing in the wrong place, you can
- check it by looking at the module structure. The init and cleanup
- fields should look like::
-
-
- init = 0x588066b0 <init_hostfs>, cleanup = 0x588066c0 <exit_hostfs>
-
-
-
-
- with no offsets on the symbol names. If the names are right, but they
- are offset, then the offset tells you how much you need to add to the
- address you gave to add-symbol-file.
-
-
- When you want to load in a new version of the module, you need to get
- gdb to forget about the old one. The only way I've found to do that
- is to tell gdb to forget about all symbols that it knows about::
-
-
- (UML gdb) symbol-file
-
-
-
-
- Then reload the symbols from the kernel binary::
-
-
- (UML gdb) symbol-file /path/to/kernel
-
-
-
-
- and repeat the process above. You'll also need to re-enable
- breakpoints. They were disabled when you dumped all the symbols because
- gdb couldn't figure out where they should go.
-
-
-
-11.5. Attaching gdb to the kernel
-----------------------------------
-
- If you don't have the kernel running under gdb, you can attach gdb to
- it later by sending the tracing thread a SIGUSR1. The first line of
- the console output identifies its pid::
-
- tracing thread pid = 20093
-
-
-
-
- When you send it the signal::
-
-
- host% kill -USR1 20093
-
-
-
-
- you will get an xterm with gdb running in it.
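If the console output has been saved to a file, the pid can be extracted rather than read by eye. This is a sketch that assumes the line has exactly the form shown above. The log file name uml.log and the function name are hypothetical.

```shell
#!/bin/sh
# Pull the tracing thread pid out of saved UML console output.
# Reads the log on stdin and prints the first pid found, if any.
tracing_pid() {
    sed -n 's/^ *tracing thread pid = \([0-9][0-9]*\) *$/\1/p' | head -n 1
}

# uml.log is a hypothetical file holding the UML console output.
if [ -r uml.log ]; then
    pid=$(tracing_pid < uml.log)
    # Only signal when a pid was actually found.
    if [ -n "$pid" ]; then
        kill -USR1 "$pid"
    fi
fi
```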
-
-
- If you have the mconsole compiled into UML, then the mconsole client
- can be used to start gdb::
-
-
- (mconsole) config gdb=xterm
-
-
-
-
- will fire up an xterm with gdb running in it.
-
-
-
-11.6. Using alternate debuggers
---------------------------------
-
- UML has support for attaching to an already running debugger rather
- than starting gdb itself. This is present in CVS as of 17 Apr 2001.
- I sent it to Alan for inclusion in the ac tree, and it will be in my
- 2.4.4 release.
-
-
- This is useful when gdb is a subprocess of some UI, such as emacs or
- ddd. It can also be used to run debuggers other than gdb on UML.
- Below is an example of using strace as an alternate debugger.
-
-
- To do this, you need to get the pid of the debugger and pass it in
- with the
-
-
- If you are using gdb under some UI, then tell it to 'att 1', and
- you'll find yourself attached to UML.
-
-
- If you are using something other than gdb as your debugger, then
- you'll need to get it to do the equivalent of 'att 1' if it doesn't do
- it automatically.
-
-
- An example of an alternate debugger is strace. You can strace the
- actual kernel as follows:
-
- - Run the following in a shell::
-
-
- host% sh -c 'echo pid=$$; echo -n hit return; read x; exec strace -p 1 -o strace.out'
-
-
-
- - Run UML with 'debug' and 'gdb-pid=<pid>' with the pid printed out
- by the previous command
-
- - Hit return in the shell, and UML will start running, and strace
- output will start accumulating in the output file.
-
- Note that this is different from running::
-
-
- host% strace ./linux
-
-
-
-
- That will strace only the main UML thread, the tracing thread, which
- doesn't do any of the actual kernel work. It just oversees the
- virtual machine. In contrast, using strace as described above will show
- you the low-level activity of the virtual machine.
-
-
-
-
-
-12. Kernel debugging examples
-==============================
-
-12.1. The case of the hung fsck
---------------------------------
-
- When booting up the kernel, fsck failed, and dropped me into a shell
- to fix things up. I ran fsck -y, which hung::
-
-
- Setting hostname uml [ OK ]
- Checking root filesystem
- /dev/fhd0 was not cleanly unmounted, check forced.
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
-
- /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
- (i.e., without -a or -p options)
- [ FAILED ]
-
- *** An error occurred during the file system check.
- *** Dropping you to a shell; the system will reboot
- *** when you leave the shell.
- Give root password for maintenance
- (or type Control-D for normal startup):
-
- [root@uml /root]# fsck -y /dev/fhd0
- fsck -y /dev/fhd0
- Parallelizing fsck version 1.14 (9-Jan-1999)
- e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
- /dev/fhd0 contains a file system with errors, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
-
- Inode 19780, i_blocks is 1548, should be 540. Fix? yes
-
- Pass 2: Checking directory structure
- Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
-
- Directory inode 11858, block 0, offset 0: directory corrupted
- Salvage? yes
-
- Missing '.' in directory inode 11858.
- Fix? yes
-
- Missing '..' in directory inode 11858.
- Fix? yes
-
-
- The standard drill in this sort of situation is to fire up gdb on the
- signal thread, which, in this case, was pid 1935. In another window,
- I run gdb and attach pid 1935::
-
-
- ~/linux/2.3.26/um 1016: gdb linux
- GNU gdb 4.17.0.11 with Linux support
- Copyright 1998 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
-
- (gdb) att 1935
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1935
- 0x100756d9 in __wait4 ()
-
-
- Let's see what's currently running::
-
-
-
- (gdb) p current_task.pid
- $1 = 0
-
-
-
-
-
- It's the idle thread, which means that fsck went to sleep for some
- reason and never woke up.
-
-
- Let's guess that the last process in the process list is fsck::
-
-
-
- (gdb) p current_task.prev_task.comm
- $13 = "fsck.ext2\000\000\000\000\000\000"
-
-
-
-
-
- It is, so let's see what it thinks it's up to::
-
-
-
- (gdb) p current_task.prev_task.thread
- $14 = {extern_pid = 1980, tracing = 0, want_tracing = 0, forking = 0,
- kernel_stack_page = 0, signal_stack = 1342627840, syscall = {id = 4, args = {
- 3, 134973440, 1024, 0, 1024}, have_result = 0, result = 50590720},
- request = {op = 2, u = {exec = {ip = 1350467584, sp = 2952789424}, fork = {
- regs = {1350467584, 2952789424, 0 <repeats 15 times>}, sigstack = 0,
- pid = 0}, switch_to = 0x507e8000, thread = {proc = 0x507e8000,
- arg = 0xaffffdb0, flags = 0, new_pid = 0}, input_request = {
- op = 1350467584, fd = -1342177872, proc = 0, pid = 0}}}}
-
-
-
- The interesting things here are the fact that its .thread.syscall.id
- is __NR_write (see the big switch in arch/um/kernel/syscall_kern.c or
- the defines in include/asm-um/arch/unistd.h), and that it never
- returned. Also, its .request.op is OP_SWITCH (see
- arch/um/include/user_util.h). These mean that it went into a write,
- and, for some reason, called schedule().
-
-
- The fact that it never returned from write means that its stack should
- be fairly interesting. Its pid is 1980 (.thread.extern_pid). That
- process is being ptraced by the signal thread, so it must be detached
- before gdb can attach it::
-
-
-
- (gdb) call detach(1980)
-
- Program received signal SIGSEGV, Segmentation fault.
- <function called from gdb>
- The program being debugged stopped while in a function called from GDB.
- When the function (detach) is done executing, GDB will silently
- stop (instead of continuing to evaluate the expression containing
- the function call).
- (gdb) call detach(1980)
- $15 = 0
-
-
- The first detach segfaults for some reason, and the second one
- succeeds.
-
-
- Now I detach from the signal thread, attach to the fsck thread, and
- look at its stack::
-
-
- (gdb) det
- Detaching from program: /home/dike/linux/2.3.26/um/linux Pid 1935
- (gdb) att 1980
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1980
- 0x10070451 in __kill ()
- (gdb) bt
- #0 0x10070451 in __kill ()
- #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
- #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
- at process_kern.c:156
- #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
- at process_kern.c:161
- #4 0x10001d12 in schedule () at core.c:777
- #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
- #6 0x1006aa10 in __down_failed () at semaphore.c:157
- #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
- #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- #9 <signal handler called>
- #10 0x10155404 in errno ()
- #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
- #12 0x1006c5d8 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- #13 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- #14 <signal handler called>
- #15 0xc0fd in ?? ()
- #16 0x10016647 in sys_write (fd=3,
- buf=0x80b8800 <Address 0x80b8800 out of bounds>, count=1024)
- at read_write.c:159
- #17 0x1006d5b3 in execute_syscall (syscall=4, args=0x5006ef08)
- at syscall_kern.c:254
- #18 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
- #19 <signal handler called>
- #20 0x400dc8b0 in ?? ()
-
-
-
-
-
- The interesting things here are:
-
- - There are two segfaults on this stack (frames 9 and 14)
-
- - The first faulting address (frame 11) is 0x50000800::
-
- (gdb) p (void *)1342179328
- $16 = (void *) 0x50000800
-
-
-
-
-
- The initial faulting address is interesting because it is on the idle
- thread's stack. I had been seeing the idle thread segfault for no
- apparent reason, and the cause looked like stack corruption. In hopes
- of catching the culprit in the act, I had turned off all protections
- to that stack while the idle thread wasn't running. This apparently
- tripped that trap.
-
-
- However, the more immediate problem is that second segfault and I'm
- going to concentrate on that. First, I want to see where the fault
- happened, so I have to go look at the sigcontext struct in frame 8::
-
-
-
- (gdb) up
- #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
- 30 kill(pid, SIGUSR1);
- (gdb)
- #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
- at process_kern.c:156
- 156 usr1_pid(getpid());
- (gdb)
- #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
- at process_kern.c:161
- 161 _switch_to(prev, next);
- (gdb)
- #4 0x10001d12 in schedule () at core.c:777
- 777 switch_to(prev, next, prev);
- (gdb)
- #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
- 71 schedule();
- (gdb)
- #6 0x1006aa10 in __down_failed () at semaphore.c:157
- 157 }
- (gdb)
- #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
- 174 segv(sc->cr2, sc->err & 2);
- (gdb)
- #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- 182 segv_handler(sc);
- (gdb) p *sc
- Cannot access memory at address 0x0.
-
-
-
-
- That's not very useful, so I'll try a more manual method::
-
-
- (gdb) p *((struct sigcontext *) (&sig + 1))
- $19 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
- __dsh = 0, edi = 1342179328, esi = 1350378548, ebp = 1342630440,
- esp = 1342630420, ebx = 1348150624, edx = 1280, ecx = 0, eax = 0,
- trapno = 14, err = 4, eip = 268480945, cs = 35, __csh = 0, eflags = 66118,
- esp_at_signal = 1342630420, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
- cr2 = 1280}
-
-
-
- The ip is in handle_mm_fault::
-
-
- (gdb) p (void *)268480945
- $20 = (void *) 0x1000b1b1
- (gdb) i sym $20
- handle_mm_fault + 57 in section .text
-
-
-
-
-
- Specifically, it's in pte_alloc::
-
-
- (gdb) i line *$20
- Line 124 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1b1 <handle_mm_fault+57>
- and ends at 0x1000b1b7 <handle_mm_fault+63>.
-
-
-
-
-
- To find where in handle_mm_fault this is, I'll jump forward in the
- code until I see an address in that procedure::
-
-
-
- (gdb) i line *0x1000b1c0
- Line 126 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1b7 <handle_mm_fault+63>
- and ends at 0x1000b1c3 <handle_mm_fault+75>.
- (gdb) i line *0x1000b1d0
- Line 131 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1d0 <handle_mm_fault+88>
- and ends at 0x1000b1da <handle_mm_fault+98>.
- (gdb) i line *0x1000b1e0
- Line 61 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1da <handle_mm_fault+98>
- and ends at 0x1000b1e1 <handle_mm_fault+105>.
- (gdb) i line *0x1000b1f0
- Line 134 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1f0 <handle_mm_fault+120>
- and ends at 0x1000b200 <handle_mm_fault+136>.
- (gdb) i line *0x1000b200
- Line 135 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b200 <handle_mm_fault+136>
- and ends at 0x1000b208 <handle_mm_fault+144>.
- (gdb) i line *0x1000b210
- Line 139 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b210 <handle_mm_fault+152>
- and ends at 0x1000b219 <handle_mm_fault+161>.
- (gdb) i line *0x1000b220
- Line 1168 of "memory.c" starts at address 0x1000b21e <handle_mm_fault+166>
- and ends at 0x1000b222 <handle_mm_fault+170>.
-
-
-
-
-
- Something is apparently wrong with the page tables or vma_structs, so
- let's go back to frame 11 and have a look at them::
-
-
-
- #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
- 50 handle_mm_fault(current, vma, address, is_write);
- (gdb) call pgd_offset_proc(vma->vm_mm, address)
- $22 = (pgd_t *) 0x80a548c
-
-
-
-
-
- That's pretty bogus. Page tables aren't supposed to be in process
- text or data areas. Let's see what's in the vma::
-
-
- (gdb) p *vma
- $23 = {vm_mm = 0x507d2434, vm_start = 0, vm_end = 134512640,
- vm_next = 0x80a4f8c, vm_page_prot = {pgprot = 0}, vm_flags = 31200,
- vm_avl_height = 2058, vm_avl_left = 0x80a8c94, vm_avl_right = 0x80d1000,
- vm_next_share = 0xaffffdb0, vm_pprev_share = 0xaffffe63,
- vm_ops = 0xaffffe7a, vm_pgoff = 2952789626, vm_file = 0xafffffec,
- vm_private_data = 0x62}
- (gdb) p *vma.vm_mm
- $24 = {mmap = 0x507d2434, mmap_avl = 0x0, mmap_cache = 0x8048000,
- pgd = 0x80a4f8c, mm_users = {counter = 0}, mm_count = {counter = 134904288},
- map_count = 134909076, mmap_sem = {count = {counter = 135073792},
- sleepers = -1342177872, wait = {lock = <optimized out or zero length>,
- task_list = {next = 0xaffffe63, prev = 0xaffffe7a},
- __magic = -1342177670, __creator = -1342177300}, __magic = 98},
- page_table_lock = {}, context = 138, start_code = 0, end_code = 0,
- start_data = 0, end_data = 0, start_brk = 0, brk = 0, start_stack = 0,
- arg_start = 0, arg_end = 0, env_start = 0, env_end = 0, rss = 1350381536,
- total_vm = 0, locked_vm = 0, def_flags = 0, cpu_vm_mask = 0, swap_cnt = 0,
- swap_address = 0, segments = 0x0}
-
-
-
- This is also pretty bogus. With all of the 0x80xxxxx and 0xaffffxxx
- addresses, this is looking like a stack was plonked down on top of
- these structures. Maybe it's a stack overflow from the next page::
-
-
- (gdb) p vma
- $25 = (struct vm_area_struct *) 0x507d2434
-
-
-
- That's towards the lower quarter of the page, so that would have to
- have been pretty heavy stack overflow::
-
-
- (gdb) x/100x $25
- 0x507d2434: 0x507d2434 0x00000000 0x08048000 0x080a4f8c
- 0x507d2444: 0x00000000 0x080a79e0 0x080a8c94 0x080d1000
- 0x507d2454: 0xaffffdb0 0xaffffe63 0xaffffe7a 0xaffffe7a
- 0x507d2464: 0xafffffec 0x00000062 0x0000008a 0x00000000
- 0x507d2474: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2484: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2494: 0x00000000 0x00000000 0x507d2fe0 0x00000000
- 0x507d24a4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24b4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24c4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24d4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24e4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24f4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2504: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2514: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2524: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2534: 0x00000000 0x00000000 0x507d25dc 0x00000000
- 0x507d2544: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2554: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2564: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2574: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2584: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2594: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d25a4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d25b4: 0x00000000 0x00000000 0x00000000 0x00000000
-
-
-
- It's not stack overflow. The only "stack-like" piece of this data is
- the vma_struct itself.
-
-
- At this point, I don't see any avenues to pursue, so I just have to
- admit that I have no idea what's going on. What I will do, though, is
- stick a trap on the segfault handler which will stop if it sees any
- writes to the idle thread's stack. That was the thing that happened
- first, and it may be that if I can catch it immediately, what's going
- on will be somewhat clearer.
-
-
-12.2. Episode 2: The case of the hung fsck
--------------------------------------------
-
- After setting a trap in the SEGV handler for accesses to the signal
- thread's stack, I reran the kernel.
-
-
- fsck hung again, this time by hitting the trap::
-
-
-
- Setting hostname uml [ OK ]
- Checking root filesystem
- /dev/fhd0 contains a file system with errors, check forced.
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
-
- /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
- (i.e., without -a or -p options)
- [ FAILED ]
-
- *** An error occurred during the file system check.
- *** Dropping you to a shell; the system will reboot
- *** when you leave the shell.
- Give root password for maintenance
- (or type Control-D for normal startup):
-
- [root@uml /root]# fsck -y /dev/fhd0
- fsck -y /dev/fhd0
- Parallelizing fsck version 1.14 (9-Jan-1999)
- e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
- /dev/fhd0 contains a file system with errors, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
-
- Pass 2: Checking directory structure
- Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
-
- Directory inode 11858, block 0, offset 0: directory corrupted
- Salvage? yes
-
- Missing '.' in directory inode 11858.
- Fix? yes
-
- Missing '..' in directory inode 11858.
- Fix? yes
-
- Untested (4127) [100fe44c]: trap_kern.c line 31
-
-
-
-
-
- I need to get the signal thread to detach from pid 4127 so that I can
- attach to it with gdb. This is done by sending it a SIGUSR1, which is
- caught by the signal thread, which detaches the process::
-
-
- kill -USR1 4127
-
-
-
-
-
- Now I can run gdb on it::
-
-
- ~/linux/2.3.26/um 1034: gdb linux
- GNU gdb 4.17.0.11 with Linux support
- Copyright 1998 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
- (gdb) att 4127
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 4127
- 0x10075891 in __libc_nanosleep ()
-
-
-
-
-
- The backtrace shows that it was in a write and that the fault address
- (address in frame 3) is 0x50000800, which is right in the middle of
- the signal thread's stack page::
-
-
- (gdb) bt
- #0 0x10075891 in __libc_nanosleep ()
- #1 0x1007584d in __sleep (seconds=1000000)
- at ../sysdeps/unix/sysv/linux/sleep.c:78
- #2 0x1006ce9a in stop () at user_util.c:191
- #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
- #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- #5 0x1006c63c in kern_segv_handler (sig=11) at trap_user.c:182
- #6 <signal handler called>
- #7 0xc0fd in ?? ()
- #8 0x10016647 in sys_write (fd=3, buf=0x80b8800 "R.", count=1024)
- at read_write.c:159
- #9 0x1006d603 in execute_syscall (syscall=4, args=0x5006ef08)
- at syscall_kern.c:254
- #10 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
- #11 <signal handler called>
- #12 0x400dc8b0 in ?? ()
- #13 <signal handler called>
- #14 0x400dc8b0 in ?? ()
- #15 0x80545fd in ?? ()
- #16 0x804daae in ?? ()
- #17 0x8054334 in ?? ()
- #18 0x804d23e in ?? ()
- #19 0x8049632 in ?? ()
- #20 0x80491d2 in ?? ()
- #21 0x80596b5 in ?? ()
- (gdb) p (void *)1342179328
- $3 = (void *) 0x50000800
-
-
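For illustration only (this is not part of the original gdb session), the claim that 0x50000800 sits in the middle of a page can be checked with a little arithmetic. The sketch below assumes the usual 4096-byte i386 page size, which the text does not state explicitly:

```python
# Split the faulting address into page base and offset.
# PAGE_SIZE = 4096 is an assumption (standard i386 page size).
PAGE_SIZE = 4096

fault_address = 1342179328  # "address" argument of segv() in frame 3

page_base = fault_address & ~(PAGE_SIZE - 1)   # start of the containing page
page_offset = fault_address & (PAGE_SIZE - 1)  # distance into that page

print(hex(fault_address), hex(page_base), page_offset)
```

Under that assumption the offset is exactly half the page size, which matches the "right in the middle" description.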
-
- Going up the stack to the segv_handler frame and looking at where in
- the code the access happened shows that it happened near line 110 of
- block_dev.c::
-
-
-
- (gdb) up
- #1 0x1007584d in __sleep (seconds=1000000)
- at ../sysdeps/unix/sysv/linux/sleep.c:78
- ../sysdeps/unix/sysv/linux/sleep.c:78: No such file or directory.
- (gdb)
- #2 0x1006ce9a in stop () at user_util.c:191
- 191 while(1) sleep(1000000);
- (gdb)
- #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
- 31 KERN_UNTESTED();
- (gdb)
- #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- 174 segv(sc->cr2, sc->err & 2);
- (gdb) p *sc
- $1 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
- __dsh = 0, edi = 1342179328, esi = 134973440, ebp = 1342631484,
- esp = 1342630864, ebx = 256, edx = 0, ecx = 256, eax = 1024, trapno = 14,
- err = 6, eip = 268550834, cs = 35, __csh = 0, eflags = 66070,
- esp_at_signal = 1342630864, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
- cr2 = 1342179328}
- (gdb) p (void *)268550834
- $2 = (void *) 0x1001c2b2
- (gdb) i sym $2
- block_write + 1090 in section .text
- (gdb) i line *$2
- Line 209 of "/home/dike/linux/2.3.26/um/include/asm/arch/string.h"
- starts at address 0x1001c2a1 <block_write+1073>
- and ends at 0x1001c2bf <block_write+1103>.
- (gdb) i line *0x1001c2c0
- Line 110 of "block_dev.c" starts at address 0x1001c2bf <block_write+1103>
- and ends at 0x1001c2e3 <block_write+1139>.
-
-
-
- Looking at the source shows that the fault happened during a call to
- copy_from_user to copy the data into the kernel::
-
-
- 107 count -= chars;
- 108 copy_from_user(p,buf,chars);
- 109 p += chars;
- 110 buf += chars;
-
-
-
- p is the pointer which must contain 0x50000800, since buf contains
- 0x80b8800 (frame 8 above). It is defined as::
-
-
- p = offset + bh->b_data;
-
-
-
-
-
- I need to figure out what bh is, and it just so happens that bh is
- passed as an argument to mark_buffer_uptodate and mark_buffer_dirty a
- few lines later, so I do a little disassembly::
-
-
- (gdb) disas 0x1001c2bf 0x1001c2e0
- Dump of assembler code from 0x1001c2bf to 0x1001c2d0:
- 0x1001c2bf <block_write+1103>: addl %eax,0xc(%ebp)
- 0x1001c2c2 <block_write+1106>: movl 0xfffffdd4(%ebp),%edx
- 0x1001c2c8 <block_write+1112>: btsl $0x0,0x18(%edx)
- 0x1001c2cd <block_write+1117>: btsl $0x1,0x18(%edx)
- 0x1001c2d2 <block_write+1122>: sbbl %ecx,%ecx
- 0x1001c2d4 <block_write+1124>: testl %ecx,%ecx
- 0x1001c2d6 <block_write+1126>: jne 0x1001c2e3 <block_write+1139>
- 0x1001c2d8 <block_write+1128>: pushl $0x0
- 0x1001c2da <block_write+1130>: pushl %edx
- 0x1001c2db <block_write+1131>: call 0x1001819c <__mark_buffer_dirty>
- End of assembler dump.
-
-
-
-
-
- At that point, bh is in %edx (address 0x1001c2da), which is loaded at
- 0x1001c2c2 from the stack slot at %ebp + 0xfffffdd4, so I figure out
- exactly where that is, taking %ebp from the sigcontext_struct above::
-
-
- (gdb) p (void *)1342631484
- $5 = (void *) 0x5006ee3c
- (gdb) p 0x5006ee3c+0xfffffdd4
- $6 = 1342630928
- (gdb) p (void *)$6
- $7 = (void *) 0x5006ec10
- (gdb) p *((void **)$7)
- $8 = (void *) 0x50100200
-
-
-
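The displacement 0xfffffdd4 only makes sense as a 32-bit two's-complement value, that is, a small negative offset from %ebp. The following sketch (an illustration, not something from the original session) redoes the address computation with explicit 32-bit wraparound:

```python
# Recompute the stack slot address that holds bh, using 32-bit arithmetic.
MASK32 = 0xFFFFFFFF

ebp = 1342631484            # ebp from the sigcontext dump (0x5006ee3c)
displacement = 0xFFFFFDD4   # displacement in "movl 0xfffffdd4(%ebp),%edx"

# Interpreted as a signed 32-bit value, the displacement is negative.
signed_disp = displacement - (1 << 32)

# Adding the unsigned value and truncating to 32 bits gives the same slot.
slot = (ebp + displacement) & MASK32

print(signed_disp, hex(slot))
```

The result agrees with the value gdb printed for $7, and shows that the slot lies 556 bytes below %ebp.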
-
-
- Now, I look at the structure to see what's in it, and particularly,
- what its b_data field contains::
-
-
- (gdb) p *((struct buffer_head *)0x50100200)
- $13 = {b_next = 0x50289380, b_blocknr = 49405, b_size = 1024, b_list = 0,
- b_dev = 15872, b_count = {counter = 1}, b_rdev = 15872, b_state = 24,
- b_flushtime = 0, b_next_free = 0x501001a0, b_prev_free = 0x50100260,
- b_this_page = 0x501001a0, b_reqnext = 0x0, b_pprev = 0x507fcf58,
- b_data = 0x50000800 "", b_page = 0x50004000,
- b_end_io = 0x10017f60 <end_buffer_io_sync>, b_dev_id = 0x0,
- b_rsector = 98810, b_wait = {lock = <optimized out or zero length>,
- task_list = {next = 0x50100248, prev = 0x50100248}, __magic = 1343226448,
- __creator = 0}, b_kiobuf = 0x0}
-
-
-
-
-
- The b_data field is indeed 0x50000800, so the question becomes how
- that happened. The rest of the structure looks fine, so this probably
- is not a case of data corruption. It happened on purpose somehow.
-
-
- The b_page field is a pointer to the page_struct representing the
- 0x50000000 page. Looking at it shows the kernel's idea of the state
- of that page::
-
-
-
- (gdb) p *$13.b_page
- $17 = {list = {next = 0x50004a5c, prev = 0x100c5174}, mapping = 0x0,
- index = 0, next_hash = 0x0, count = {counter = 1}, flags = 132, lru = {
- next = 0x50008460, prev = 0x50019350}, wait = {
- lock = <optimized out or zero length>, task_list = {next = 0x50004024,
- prev = 0x50004024}, __magic = 1342193708, __creator = 0},
- pprev_hash = 0x0, buffers = 0x501002c0, virtual = 1342177280,
- zone = 0x100c5160}
-
-
-
-
-
- Some sanity-checking: the virtual field shows the "virtual" address of
- this page, which in this kernel is the same as its "physical" address,
- and the page_struct itself should be mem_map[0], since it represents
- the first page of memory::
-
-
-
- (gdb) p (void *)1342177280
- $18 = (void *) 0x50000000
- (gdb) p mem_map
- $19 = (mem_map_t *) 0x50004000
-
-
-
-
-
- These check out fine.
-
-
- Now to check out the page_struct itself. In particular, the flags
- field shows whether the page is considered free or not::
-
-
- (gdb) p (void *)132
- $21 = (void *) 0x84
-
-
-
-
-
- The "reserved" bit is the high bit, which is definitely not set, so
- the kernel considers the signal stack page to be free and available to
- be used.
-
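As a rough check of the flags value, the sketch below lists which bits are set in 0x84. It assumes a 32-bit flags word with the "reserved" bit at position 31, which is how "the high bit" is read here; the name PG_RESERVED_BIT is illustrative, not taken from the text:

```python
# Decode the page flags value seen in gdb.
flags = 132  # 0x84, from "p *$13.b_page"

PG_RESERVED_BIT = 31  # assumption: "the high bit" of a 32-bit word

is_reserved = bool(flags & (1 << PG_RESERVED_BIT))
set_bits = [bit for bit in range(32) if flags & (1 << bit)]

print(is_reserved, set_bits)
```

Only two low bits are set, so under this assumption the page is not marked reserved.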
-
- At this point, I jump to conclusions and start looking at my early
- boot code, because that's where that page is supposed to be reserved.
-
-
- In my setup_arch procedure, I have the following code which looks just
- fine::
-
-
-
- bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn);
- free_bootmem(__pa(low_physmem) + bootmap_size, high_physmem - low_physmem);
-
-
-
-
-
- Two stack pages have already been allocated, and low_physmem points to
- the third page, which is the beginning of free memory.
- The init_bootmem call declares the entire memory to the boot memory
- manager, which marks it all reserved. The free_bootmem call frees up
- all of it, except for the first two pages. This looks correct to me.
-
-
- So, I decide to see init_bootmem run and make sure that it is marking
- those first two pages as reserved. I never get that far.
-
-
- Stepping into init_bootmem, and looking at bootmem_map before looking
- at what it contains shows the following::
-
-
-
- (gdb) p bootmem_map
- $3 = (void *) 0x50000000
-
-
-
-
-
- Aha! The light dawns. That first page is doing double duty as a
- stack and as the boot memory map. The last thing that the boot memory
- manager does is to free the pages used by its memory map, so this page
- is getting freed even though it's marked as reserved.
-
-
- The fix was to initialize the boot memory manager before allocating
- those two stack pages, and then allocate them through the boot memory
- manager. After doing this, and fixing a couple of subsequent buglets,
- the stack corruption problem disappeared.
-
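The bug and its fix can be pictured with a deliberately simplified toy model. This is only a sketch of the reasoning above, not the kernel's boot memory manager: the function name, the page counts, and the set-based bookkeeping are all hypothetical.

```python
def pages_free_after_boot(n_pages, stack_pages, bootmap_page):
    """Toy model: return the page numbers considered free after boot."""
    all_pages = set(range(n_pages))
    # Step 1 (like init_bootmem): every page starts out reserved.
    reserved = set(all_pages)
    # Step 2 (like free_bootmem): free everything except the stack pages.
    reserved -= all_pages - set(stack_pages)
    free = all_pages - reserved
    # Step 3: the manager finally frees the page holding its own map,
    # regardless of whether that page was marked reserved.
    free.add(bootmap_page)
    return free

# Buggy layout: page 0 is both a stack page and the boot memory map.
buggy = pages_free_after_boot(8, stack_pages={0, 1}, bootmap_page=0)

# Fixed layout: the map gets page 0 first, stacks are allocated afterwards.
fixed = pages_free_after_boot(8, stack_pages={1, 2}, bootmap_page=0)

print(sorted(buggy), sorted(fixed))
```

In the buggy layout a stack page ends up on the free list; in the fixed layout freeing the map page no longer touches either stack.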
-
-
-
-
-13. What to do when UML doesn't work
-=====================================
-
-
-
-
-13.1. Strange compilation errors when you build from source
-------------------------------------------------------------
-
- As of test11, it is necessary to have "ARCH=um" in the environment or
- on the make command line for all steps in building UML, including
- clean, distclean, or mrproper, config, menuconfig, or xconfig, dep,
- and linux. If you forget for any of them, the i386 build seems to
- contaminate the UML build. If this happens, start from scratch with::
-
-
- host% make mrproper ARCH=um
-
-
-
-
- and repeat the build process with ARCH=um on all the steps.
-
-
- See :ref:`Compiling_the_kernel_and_modules` for more details.
-
-
- Another cause of strange compilation errors is building UML in
- /usr/src/linux. If you do this, the first thing you need to do is
- clean up the mess you made. The /usr/src/linux/asm link will now
- point to /usr/src/linux/asm-um. Make it point back to
- /usr/src/linux/asm-i386. Then, move your UML pool someplace else and
- build it there. Also see below, where a more specific set of symptoms
- is described.
-
-
-
-13.3. A variety of panics and hangs with /tmp on a reiserfs filesystem
------------------------------------------------------------------------
-
- I saw this on reiserfs 3.5.21 and it seems to be fixed in 3.5.27.
- Panics preceded by::
-
-
- Detaching pid nnnn
-
-
-
- are diagnostic of this problem. This is a reiserfs bug which causes a
- thread to occasionally read stale data from a mmapped page shared with
- another thread. The fix is to upgrade the filesystem or to have /tmp
- be an ext2 filesystem.
-
-
-
-13.4. The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'
----------------------------------------------------------------------------------------------
-
- This happens when you build in /usr/src/linux. The UML build makes
- the include/asm link point to include/asm-um. /usr/include/asm points
- to /usr/src/linux/include/asm, so when that link gets moved, files
- which need to include the asm-i386 versions of headers get the
- incompatible asm-um versions. The fix is to move the include/asm link
- back to include/asm-i386 and to do UML builds someplace else.
-
-
-
-13.5. UML doesn't work when /tmp is an NFS filesystem
-------------------------------------------------------
-
- This seems to be a similar situation to the ReiserFS problem above.
- Some versions of NFS seem not to handle mmap correctly, which UML
- depends on. The workaround is to have /tmp be a non-NFS directory.
-
-
-13.6. UML hangs on boot when compiled with gprof support
----------------------------------------------------------
-
- If you build UML with gprof support and, early in the boot, it does
- this::
-
-
- kernel BUG at page_alloc.c:100!
-
-
-
-
- you have a buggy gcc. You can work around the problem by removing
- UM_FASTCALL from CFLAGS in arch/um/Makefile-i386. This will open up
- another bug, but that one is fairly hard to reproduce.
-
-
-
-13.7. syslogd dies with a SIGTERM on startup
----------------------------------------------
-
- The exact boot error depends on the distribution that you're booting,
- but Debian produces this::
-
-
- /etc/rc2.d/S10sysklogd: line 49: 93 Terminated
- start-stop-daemon --start --quiet --exec /sbin/syslogd -- $SYSLOGD
-
-
-
-
- This is a syslogd bug. There's a race between a parent process
- installing a signal handler and its child sending the signal.
-
-
-
-13.8. TUN/TAP networking doesn't work on a 2.4 host
-----------------------------------------------------
-
- There are a couple of problems which were reported by
- Tim Robinson <timro at trkr dot net>:
-
- - It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier.
- The fix is to upgrade to something more recent and then read the
- next item.
-
- - If you see::
-
-
- File descriptor in bad state
-
-
-
- when you bring up the device inside UML, you have a header mismatch
- between the original kernel and the upgraded one. Make /usr/src/linux
- point at the new headers. This will only be a problem if you build
- uml_net yourself.
-
-
-
-13.9. You can network to the host but not to other machines on the net
------------------------------------------------------------------------
-
- If you can connect to the host, and the host can connect to UML, but
- you cannot connect to any other machines, then you may need to enable
- IP Masquerading on the host. Usually this is only experienced when
- using private IP addresses (192.168.x.x or 10.x.x.x) for host/UML
- networking, rather than the public address space that your host is
- connected to. UML does not enable IP Masquerading, so you will need
- to create a static rule to enable it::
-
-
- host% iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
-
-
-
-
- Replace eth0 with the interface that you use to talk to the rest of
- the world.
-
-
- Documentation on IP Masquerading, and SNAT, can be found at
- http://www.netfilter.org.
-
-
- If you can reach the local net, but not the outside Internet, then
- that is usually a routing problem. The UML needs a default route::
-
-
- UML# route add default gw gateway IP
-
-
-
-
- The gateway IP can be any machine on the local net that knows how to
- reach the outside world. Usually, this is the host or the local
- network's gateway.
-
-
- Occasionally, we hear from someone who can reach some machines, but
- not others on the same net, or who can reach some ports on other
- machines, but not others. These are usually caused by strange
- firewalling somewhere between the UML and the other box. You track
- this down by running tcpdump on every interface the packets travel
- over and seeing where they disappear. When you find a machine that takes
- the packets in, but does not send them onward, that's the culprit.
-
-
-
-13.10. I have no root and I want to scream
--------------------------------------------
-
- Thanks to Birgit Wahlich for telling me about this strange one. It
- turns out that there's a limit of six environment variables on the
- kernel command line. When that limit is reached or exceeded, argument
- processing stops, which means that the 'root=' argument that UML
- usually adds is not seen. So, the filesystem has no idea what the
- root device is, so it panics.
-
-
- The fix is to put less stuff on the command line. Glomming all your
- setup variables into one is probably the best way to go.
-
-
-
-13.11. UML build conflict between ptrace.h and ucontext.h
-----------------------------------------------------------
-
- On some older systems, /usr/include/asm/ptrace.h and
- /usr/include/sys/ucontext.h define the same names. So, when they're
- included together, the defines from one completely mess up the parsing
- of the other, producing errors like::
-
- /usr/include/sys/ucontext.h:47: parse error before `10`
-
-
-
-
- plus a pile of warnings.
-
-
- This is a libc botch, which has since been fixed, and I don't see any
- way around it besides upgrading.
-
-
-
-13.12. The UML BogoMips is exactly half the host's BogoMips
-------------------------------------------------------------
-
- On i386 kernels, there are two ways of running the loop that is used
- to calculate the BogoMips rating, using the TSC if it's there or using
- a one-instruction loop. The TSC produces twice the BogoMips of the
- loop. UML uses the loop, since it has nothing resembling a TSC, and
- will get almost exactly the same BogoMips as a host using the loop.
- However, on a host with a TSC, its BogoMips will be double the loop
- BogoMips, and therefore double the UML BogoMips.
-
-
-
-13.13. When you run UML, it immediately segfaults
---------------------------------------------------
-
- If the host is configured with the 2G/2G address space split, that's
- why. See :ref:`UML_on_2G/2G_hosts` for the details on getting UML to
- run on your host.
-
-
-
-13.14. xterms appear, then immediately disappear
--------------------------------------------------
-
- If you're running an up-to-date kernel with an old release of
- uml_utilities, the port-helper program will not work properly, so
- xterms will exit straight after they appear. The solution is to
- upgrade to the latest release of uml_utilities. Usually this problem
- occurs when you have installed a packaged release of UML then compiled
- your own development kernel without upgrading the uml_utilities from
- the source distribution.
-
-
-
-13.15. Any other panic, hang, or strange behavior
---------------------------------------------------
-
- If you're seeing truly strange behavior, such as hangs or panics that
- happen in random places, or you try running the debugger to see what's
- happening and it acts strangely, then it could be a problem in the
- host kernel. If you're not running a stock Linus or -ac kernel, then
- try that. An early version of the preemption patch and a 2.4.10 SuSE
- kernel have caused very strange problems in UML.
-
-
- Otherwise, let me know about it. Send a message to one of the UML
- mailing lists - either the developer list - user-mode-linux-devel at
- lists dot sourceforge dot net (subscription info) or the user list -
- user-mode-linux-user at lists dot sourceforge dot net (subscription
- info), whichever you prefer. Don't assume that everyone knows about
- it and that a fix is imminent.
-
-
- If you want to be super-helpful, read :ref:`Diagnosing_Problems` and
- follow the instructions contained therein.
-
-.. _Diagnosing_Problems:
-
-14. Diagnosing Problems
-========================
-
-
- If you get UML to crash, hang, or otherwise misbehave, you should
- report this on one of the project mailing lists, either the developer
- list - user-mode-linux-devel at lists dot sourceforge dot net
- (subscription info) or the user list - user-mode-linux-user at lists
- dot sourceforge dot net (subscription info). When you do, it is
- likely that I will want more information. So, it would be helpful to
- read the stuff below, do whatever is applicable in your case, and
- report the results to the list.
-
-
- For any diagnosis, you're going to need to build a debugging kernel.
- The binaries from this site aren't debuggable. If you haven't done
- this before, read about :ref:`Compiling_the_kernel_and_modules` and
- :ref:`Kernel_debugging` UML first.
-
-
-14.1. Case 1 : Normal kernel panics
-------------------------------------
-
- The most common case is for a normal thread to panic. To debug this,
- you will need to run it under the debugger (add 'debug' to the command
- line). An xterm will start up with gdb running inside it. Continue
- it when it stops in start_kernel and make it crash. Now ``^C gdb`` and
-
-
- If the panic was a "Kernel mode fault", then there will be a segv
- frame on the stack and I'm going to want some more information. The
- stack might look something like this::
-
-
- (UML gdb) backtrace
- #0 0x1009bf76 in __sigprocmask (how=1, set=0x5f347940, oset=0x0)
- at ../sysdeps/unix/sysv/linux/sigprocmask.c:49
- #1 0x10091411 in change_sig (signal=10, on=1) at process.c:218
- #2 0x10094785 in timer_handler (sig=26) at time_kern.c:32
- #3 0x1009bf38 in __restore ()
- at ../sysdeps/unix/sysv/linux/i386/sigaction.c:125
- #4 0x1009534c in segv (address=8, ip=268849158, is_write=2, is_user=0)
- at trap_kern.c:66
- #5 0x10095c04 in segv_handler (sig=11) at trap_user.c:285
- #6 0x1009bf38 in __restore ()
-
-
-
-
- I'm going to want to see the symbol and line information for the value
- of ip in the segv frame. In this case, you would do the following::
-
-
- (UML gdb) i sym 268849158
-
-
-
-
- and::
-
-
- (UML gdb) i line *268849158
-
-
-
-
- The reason for this is that the __restore frame right above the
- segv_handler frame is hiding the frame that actually segfaulted. So, I
- have to get that information from the faulting ip.
-
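The ip in the segv frame is printed in decimal, while addresses elsewhere in a backtrace are hexadecimal. As a small convenience sketch (not part of the original instructions), the value can be converted so that it is easier to compare against other frames; gdb accepts either form:

```python
# Convert the decimal ip from the segv frame into hex and build the
# two gdb commands requested above.
ip = 268849158  # "ip=" value from frame #4 in the example backtrace

ip_hex = f"{ip:#x}"

commands = [f"i sym {ip_hex}", f"i line *{ip_hex}"]
for command in commands:
    print(command)
```

The hex form lands in the same 0x100xxxxx range as the other kernel addresses in the backtrace, which is a quick sanity check that the right value was copied.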
-
-14.2. Case 2 : Tracing thread panics
--------------------------------------
-
- The less common and more painful case is when the tracing thread
- panics. In this case, the kernel debugger will be useless because it
- needs a healthy tracing thread in order to work. The first thing to
- do is get a backtrace from the tracing thread. This is done by
- figuring out what its pid is, firing up gdb, and attaching it to that
- pid. You can figure out the tracing thread pid by looking at the
- first line of the console output, which will look like this::
-
-
- tracing thread pid = 15851
-
-
-
-
- or by running ps on the host and finding the line that looks like
- this::
-
-
- jdike 15851 4.5 0.4 132568 1104 pts/0 S 21:34 0:05 ./linux [(tracing thread)]
-
-
-
-
- If the panic was 'segfault in signals', then follow the instructions
- above for collecting information about the location of the seg fault.
-
-
- If the tracing thread flaked out all by itself, then send that
- backtrace in and wait for our crack debugging team to fix the problem.
-
-
-14.3. Case 3 : Tracing thread panics caused by other threads
--------------------------------------------------------------
-
- However, there are cases where the misbehavior of another thread
- caused the problem. The most common panic of this type is::
-
-
- wait_for_stop failed to wait for <pid> to stop with <signal number>
-
-
-
-
- In this case, you'll need to get a backtrace from the process
- mentioned in the panic, which is complicated by the fact that the kernel
- debugger is defunct and without some fancy footwork, another gdb can't
- attach to it. So, this is how the fancy footwork goes:
-
- In a shell::
-
-
- host% kill -STOP pid
-
-
-
-
- Run gdb on the tracing thread as described in case 2 and do::
-
-
- (host gdb) call detach(pid)
-
-
- If you get a segfault, do it again. It always works the second time.
-
- Detach from the tracing thread and attach to that other thread::
-
-
- (host gdb) detach
-
-
-
-
-
-
- (host gdb) attach pid
-
-
-
-
- If gdb hangs when attaching to that process, go back to a shell and
- do::
-
-
- host% kill -CONT pid
-
-
-
-
- And then get the backtrace::
-
-
- (host gdb) backtrace
-
-
-
-
-
-14.4. Case 4 : Hangs
----------------------
-
- Hangs seem to be fairly rare, but they sometimes happen. When a hang
- happens, we need a backtrace from the offending process. Run the
- kernel debugger as described in case 1 and get a backtrace. If the
- current process is not the idle thread, then send in the backtrace.
- You can tell that it's the idle thread if the stack looks like this::
-
-
- #0 0x100b1401 in __libc_nanosleep ()
- #1 0x100a2885 in idle_sleep (secs=10) at time.c:122
- #2 0x100a546f in do_idle () at process_kern.c:445
- #3 0x100a5508 in cpu_idle () at process_kern.c:471
- #4 0x100ec18f in start_kernel () at init/main.c:592
- #5 0x100a3e10 in start_kernel_proc (unused=0x0) at um_arch.c:71
- #6 0x100a383f in signal_tramp (arg=0x100a3dd8) at trap_user.c:50
-
-
-
-
- If this is the case, then some other process is at fault, and went to
- sleep when it shouldn't have. Run ps on the host and figure out which
- process should not have gone to sleep and stayed asleep. Then attach
- to it with gdb and get a backtrace as described in case 3.
-
-
-
-
-
-
-15. Thanks
-===========
-
-
- A number of people have helped this project in various ways, and this
- page gives recognition where recognition is due.
-
-
- If you're listed here and you would prefer a real link on your name,
- or no link at all, instead of the despammed email address pseudo-link,
- let me know.
-
-
- If you're not listed here and you think maybe you should be, please
- let me know that as well. I try to get everyone, but sometimes my
- bookkeeping lapses and I forget about contributions.
-
-
-15.1. Code and Documentation
------------------------------
-
- Rusty Russell <rusty at linuxcare.com.au> -
-
- - wrote the HOWTO
- http://user-mode-linux.sourceforge.net/old/UserModeLinux-HOWTO.html
-
- - prodded me into making this project official and putting it on
- SourceForge
-
- - came up with the way cool UML logo
- http://user-mode-linux.sourceforge.net/uml-small.png
-
- - redid the config process
-
-
- Peter Moulder <reiter at netspace.net.au> - Fixed my config and build
- processes, and added some useful code to the block driver
-
-
- Bill Stearns <wstearns at pobox.com> -
-
- - HOWTO updates
-
- - lots of bug reports
-
- - lots of testing
-
- - dedicated a box (uml.ists.dartmouth.edu) to support UML development
-
- - wrote the mkrootfs script, which allows bootable filesystems of
- RPM-based distributions to be cranked out
-
- - cranked out a large number of filesystems with said script
-
-
- Jim Leu <jleu at mindspring.com> - Wrote the virtual ethernet driver
- and associated usermode tools
-
- Lars Brinkhoff http://lars.nocrew.org/ - Contributed the ptrace
- proxy from his own project to allow easier kernel debugging
-
-
- Andrea Arcangeli <andrea at suse.de> - Redid some of the early boot
- code so that it would work on machines with Large File Support
-
-
- Chris Emerson - Did the first UML port to Linux/ppc
-
-
- Harald Welte <laforge at gnumonks.org> - Wrote the multicast
- transport for the network driver
-
-
- Jorgen Cederlof - Added special file support to hostfs
-
-
- Greg Lonnon <glonnon at ridgerun dot com> - Changed the ubd driver
- to allow it to layer a COW file on a shared read-only filesystem and
- wrote the iomem emulation support
-
-
- Henrik Nordstrom http://hem.passagen.se/hno/ - Provided a variety
- of patches, fixes, and clues
-
-
- Lennert Buytenhek - Contributed various patches, a rewrite of the
- network driver, the first implementation of the mconsole driver, and
- did the bulk of the work needed to get SMP working again.
-
-
- Yon Uriarte - Fixed the TUN/TAP network backend while I slept.
-
-
- Adam Heath - Made a bunch of nice cleanups to the initialization code,
- plus various other small patches.
-
-
- Matt Zimmerman - Matt volunteered to be the UML Debian maintainer and
- is doing a real nice job of it. He also noticed and fixed a number of
- actually and potentially exploitable security holes in uml_net. Plus
- the occasional patch. I like patches.
-
-
- James McMechan - James seems to have taken over maintenance of the ubd
- driver and is doing a nice job of it.
-
-
- Chandan Kudige - wrote the umlgdb script which automates the reloading
- of module symbols.
-
-
- Steve Schmidtke - wrote the UML slirp transport and hostaudio drivers,
- enabling UML processes to access audio devices on the host. He also
- submitted patches for the slip transport and lots of other things.
-
-
- David Coulson http://davidcoulson.net -
-
- - Set up the http://usermodelinux.org site,
- which is a great way of keeping the UML user community on top of
- UML goings-on.
-
- - Site documentation and updates
-
- - Nifty little UML management daemon UMLd
-
- - Lots of testing and bug reports
-
-
-
-
-15.2. Flushing out bugs
-------------------------
-
-
-
- - Yuri Pudgorodsky
-
- - Gerald Britton
-
- - Ian Wehrman
-
- - Gord Lamb
-
- - Eugene Koontz
-
- - John H. Hartman
-
- - Anders Karlsson
-
- - Daniel Phillips
-
- - John Fremlin
-
- - Rainer Burgstaller
-
- - James Stevenson
-
- - Matt Clay
-
- - Cliff Jefferies
-
- - Geoff Hoff
-
- - Lennert Buytenhek
-
- - Al Viro
-
- - Frank Klingenhoefer
-
- - Livio Baldini Soares
-
- - Jon Burgess
-
- - Petru Paler
-
- - Paul
-
- - Chris Reahard
-
- - Sverker Nilsson
-
- - Gong Su
-
- - johan verrept
-
- - Bjorn Eriksson
-
- - Lorenzo Allegrucci
-
- - Muli Ben-Yehuda
-
- - David Mansfield
-
- - Howard Goff
-
- - Mike Anderson
-
- - John Byrne
-
- - Sapan J. Batia
-
- - Iris Huang
-
- - Jan Hudec
-
- - Voluspa
-
-
-
-
-15.3. Buglets and clean-ups
-----------------------------
-
-
-
- - Dave Zarzycki
-
- - Adam Lazur
-
- - Boria Feigin
-
- - Brian J. Murrell
-
- - JS
-
- - Roman Zippel
-
- - Wil Cooley
-
- - Ayelet Shemesh
-
- - Will Dyson
-
- - Sverker Nilsson
-
- - dvorak
-
- - v.naga srinivas
-
- - Shlomi Fish
-
- - Roger Binns
-
- - johan verrept
-
- - MrChuoi
-
- - Peter Cleve
-
- - Vincent Guffens
-
- - Nathan Scott
-
- - Patrick Caulfield
-
- - jbearce
-
- - Catalin Marinas
-
- - Shane Spencer
-
- - Zou Min
-
-
- - Ryan Boder
-
- - Lorenzo Colitti
-
- - Gwendal Grignou
-
- - Andre' Breiler
-
- - Tsutomu Yasuda
-
-
-
-15.4. Case Studies
--------------------
-
-
- - Jon Wright
-
- - William McEwan
-
- - Michael Richardson
-
-
-
-15.5. Other contributions
---------------------------
-
-
- Bill Carr <Bill.Carr at compaq.com> made the Red Hat mkrootfs script
- work with RH 6.2.
-
- Michael Jennings <mikejen at hevanet.com> sent in some material which
- is now gracing the top of the index page
- http://user-mode-linux.sourceforge.net/ of this site.
-
- SGI (and more specifically Ralf Baechle <ralf at
- uni-koblenz.de> ) gave me an account on oss.sgi.com.
- The bandwidth there made it possible to
- produce most of the filesystems available on the project download
- page.
-
- Laurent Bonnaud <Laurent.Bonnaud at inpg.fr> took the old grotty
- Debian filesystem that I've been distributing and updated it to 2.2.
- It is now available by itself here.
-
- Rik van Riel gave me some ftp space on ftp.nl.linux.org so I can make
- releases even when Sourceforge is broken.
-
- Rodrigo de Castro looked at my broken pte code and told me what was
- wrong with it, letting me fix a long-standing (several weeks) and
- serious set of bugs.
-
- Chris Reahard built a specialized root filesystem for running a DNS
- server jailed inside UML. It's available from the download
- http://user-mode-linux.sourceforge.net/old/dl-sf.html page in the Jail
- Filesystems section.
diff --git a/Documentation/virt/uml/user_mode_linux_howto_v2.rst b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
new file mode 100644
index 000000000000..f70e6f5873c6
--- /dev/null
+++ b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
@@ -0,0 +1,1208 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+#########
+UML HowTo
+#########
+
+.. contents:: :local:
+
+************
+Introduction
+************
+
+Welcome to User Mode Linux
+
+User Mode Linux is the first Open Source virtualization platform (first
+release date 1999) and second virtualization platform for an x86 PC.
+
+How is UML Different from a VM using Virtualization package X?
+==============================================================
+
+We have come to assume that virtualization also means some level of
+hardware emulation. In fact, it does not. As long as a virtualization
+package provides the OS with devices which the OS can recognize and
+has a driver for, the devices do not need to emulate real hardware.
+Most OSes today have built-in support for a number of "fake"
+devices used only under virtualization.
+User Mode Linux takes this concept to the ultimate extreme - there
+is not a single real device in sight. It is 100% artificial or if
+we use the correct term 100% paravirtual. All UML devices are abstract
+concepts which map onto something provided by the host - files, sockets,
+pipes, etc.
+
+The other major difference between UML and various virtualization
+packages is that there is a distinct difference between the way the UML
+kernel and the UML programs operate.
+The UML kernel is just a process running on Linux - same as any other
+program. It can be run by an unprivileged user and it does not require
+anything in terms of special CPU features.
+The UML userspace, however, is a bit different. The Linux kernel on the
+host machine assists UML in intercepting everything the program running
+on a UML instance is trying to do and making the UML kernel handle all
+of its requests.
+This is different from other virtualization packages which do not
+distinguish between the guest kernel and guest programs. This difference
+results in a number of advantages and disadvantages of UML over, say,
+QEMU, which we will cover later in this document.
+
+
+Why Would I Want User Mode Linux?
+=================================
+
+
+* If the User Mode Linux kernel crashes, your host kernel is still fine. It
+ is not accelerated in any way (vhost, kvm, etc) and it is not trying to
+ access any devices directly. It is, in fact, a process like any other.
+
+* You can run a usermode kernel as a non-root user (you may need to
+ arrange appropriate permissions for some devices).
+
+* You can run a very small VM with a minimal footprint for a specific
+ task (for example 32M or less).
+
+* You can get extremely high performance for anything which is a "kernel
+ specific task" such as forwarding, firewalling, etc while still being
+ isolated from the host kernel.
+
+* You can play with kernel concepts without breaking things.
+
+* You are not bound by "emulating" hardware, so you can try weird and
+ wonderful concepts which are very difficult to support when emulating
+ real hardware such as time travel and making your system clock
+ dependent on what UML does (very useful for things like tests).
+
+* It's fun.
+
+Why not to run UML
+==================
+
+* The syscall interception technique used by UML makes it inherently
+ slower for any userspace applications. While it can do kernel tasks
+ on par with most other virtualization packages, its userspace is
+ **slow**. The root cause is that UML has a very high cost of creating
+ new processes and threads (something most Unix/Linux applications
+ take for granted).
+
+* UML is strictly uniprocessor at present. If you want to run an
+ application which needs many CPUs to function, it is clearly the
+ wrong choice.
+
+***********************
+Building a UML instance
+***********************
+
+There is no UML installer in any distribution. While you can use off
+the shelf install media to install into a blank VM using a virtualization
+package, there is no UML equivalent. You have to use appropriate tools on
+your host to build a viable filesystem image.
+
+This is extremely easy on Debian - you can do it using debootstrap. It is
+also easy on OpenWRT - the build process can build UML images. All other
+distros - YMMV.
+
+Creating an image
+=================
+
+Create a sparse raw disk image::
+
+ # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G
+
+This will create a 16G disk image. The OS will initially allocate only one
+block and will allocate more as they are written by UML. As of kernel
+version 4.19 UML fully supports TRIM (as usually used by flash drives).
+Using TRIM inside the UML image by specifying discard as a mount option
+or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
+return any unused blocks to the OS.
+
+Create a filesystem on the disk image and mount it::
+
+ # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt
+
+This example uses ext4, any other filesystem such as ext3, btrfs, xfs,
+jfs, etc will work too.
+
+Create a minimal OS installation on the mounted filesystem::
+
+ # debootstrap buster /mnt http://deb.debian.org/debian
+
+debootstrap does not set up the root password, fstab, hostname or
+anything related to networking. It is up to the user to do that.
+
+Set the root password - the easiest way to do that is to chroot into the
+mounted image::
+
+ # chroot /mnt
+ # passwd
+ # exit
+
+Edit key system files
+=====================
+
+UML block devices are called ubds. The fstab created by debootstrap
+will be empty and it needs an entry for the root file system::
+
+ /dev/ubd0 / ext4 discard,errors=remount-ro 0 1
+
+The image hostname will be set to the same as the host on which you
+are creating the image. It is a good idea to change that to avoid
+"Oh, bummer, I rebooted the wrong machine".
+
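+As a minimal sketch, assuming the image is still mounted on /mnt and
+using ``uml-guest`` as a purely illustrative name, the hostname can be
+changed from the host like this::
+
+ # echo uml-guest > /mnt/etc/hostname
+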
+UML supports two classes of network devices. The older uml_net ones,
+which are scheduled for obsoletion, are called ethX. The newer vector
+IO devices are significantly faster and support some standard virtual
+network encapsulations like Ethernet over GRE and Ethernet over L2TPv3.
+These are called vecX.
+
+Depending on which one is in use, ``/etc/network/interfaces`` will
+need entries like::
+
+ # legacy UML network devices
+ auto eth0
+ iface eth0 inet dhcp
+
+ # vector UML network devices
+ auto vec0
+ iface vec0 inet dhcp
+
+We now have a UML image which is nearly ready to run, all we need is a
+UML kernel and modules for it.
+
+Most distributions have a UML package. Even if you intend to use your own
+kernel, testing the image with a stock one is always a good start. These
+packages come with a set of modules which should be copied to the target
+filesystem. The location is distribution dependent. For Debian these
+reside under /usr/lib/uml/modules. Copy recursively the content of this
+directory to the mounted UML filesystem::
+
+ # cp -rax /usr/lib/uml/modules /mnt/lib/modules
+
+If you have compiled your own kernel, you need to use the usual "install
+modules to a location" procedure by running::
+
+ # make ARCH=um modules_install INSTALL_MOD_PATH=/mnt
+
+At this point the image is ready to be brought up.
+
+*************************
+Setting Up UML Networking
+*************************
+
+UML networking is designed to emulate an Ethernet connection. This
+connection may be either a point-to-point (similar to a connection
+between machines using a back-to-back cable) or a connection to a
+switch. UML supports a wide variety of means to build these
+connections to all of: local machine, remote machine(s), local and
+remote UML and other VM instances.
+
+
++-----------+--------+------------------------------------+------------+
+| Transport | Type   | Capabilities                       | Throughput |
++===========+========+====================================+============+
+| tap       | vector | checksum, tso                      | > 8Gbit    |
++-----------+--------+------------------------------------+------------+
+| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
++-----------+--------+------------------------------------+------------+
+| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
++-----------+--------+------------------------------------+------------+
+| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
++-----------+--------+------------------------------------+------------+
+| Eol2tpv3  | vector | multipacket rx, tx                 | > 3Gbit    |
++-----------+--------+------------------------------------+------------+
+| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
++-----------+--------+------------------------------------+------------+
+| fd        | vector | dependent on fd type               | varies     |
++-----------+--------+------------------------------------+------------+
+| tuntap    | legacy | none                               | ~ 500Mbit  |
++-----------+--------+------------------------------------+------------+
+| daemon    | legacy | none                               | ~ 450Mbit  |
++-----------+--------+------------------------------------+------------+
+| socket    | legacy | none                               | ~ 450Mbit  |
++-----------+--------+------------------------------------+------------+
+| pcap      | legacy | rx only                            | ~ 450Mbit  |
++-----------+--------+------------------------------------+------------+
+| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
++-----------+--------+------------------------------------+------------+
+| vde       | legacy | obsolete                           | ~ 500Mbit  |
++-----------+--------+------------------------------------+------------+
+
+* All transports which have tso and checksum offloads can deliver speeds
+ approaching 10G on TCP streams.
+
+* All transports which have multi-packet rx and/or tx can deliver pps
+ rates of 1Mpps or more.
+
+* All legacy transports are generally limited to ~600-700Mbit and 0.05Mpps.
+
+* GRE and L2TPv3 allow connections to all of: local machine, remote
+ machines, remote network devices and remote UML instances.
+
+* Socket allows connections only between UML instances.
+
+* Daemon and bess require running a local switch. This switch may be
+ connected to the host as well.
+
+
+Network configuration privileges
+================================
+
+The majority of the supported networking modes need ``root`` privileges.
+For example, in the legacy tuntap networking mode, users were required
+to be part of the group associated with the tunnel device.
+
+For newer network drivers like the vector transports, ``root`` privilege
+is required to fire an ioctl to set up the tun interface and/or use
+raw sockets where needed.
+
+This can be achieved by granting the user a particular capability instead
+of running UML as root. In case of vector transport, a user can add the
+capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the uml binary.
+Thenceforth, UML can be run with normal user privileges, along with
+full networking.
+
+For example::
+
+ # sudo setcap cap_net_raw,cap_net_admin+ep linux
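+
+To confirm that the capabilities were applied, the binary can be
+inspected with ``getcap`` (a sketch which assumes the libcap tools are
+installed and that ``linux`` is the UML binary in the current directory)::
+
+ # getcap linux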
+
+Configuring vector transports
+===============================
+
+All vector transports support a similar syntax:
+
+If X is the interface number as in vec0, vec1, vec2, etc, the general
+syntax for options is::
+
+ vecX:transport="Transport Name",option=value,option=value,...,option=value
+
+Common options
+--------------
+
+These options are common for all transports:
+
+* ``depth=int`` - sets the queue depth for vector IO. This is the
+ number of packets UML will attempt to read or write in a single
+ system call. The default number is 64 and is generally sufficient
+ for most applications that need throughput in the 2-4 Gbit range.
+ Higher speeds may require larger values.
+
+* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.
+
+* ``gro=[0,1]`` - sets GRO on or off. Enables receive/transmit offloads.
+ The effect of this option depends on the host side support in the transport
+ which is being configured. In most cases it will enable TCP segmentation and
+ RX/TX checksumming offloads. The setting must be identical on the host side
+ and the UML side. The UML kernel will produce warnings if it is not.
+ For example, GRO is enabled by default on local machine interfaces
+ (e.g. veth pairs, bridge, etc), so it should be enabled in UML in the
+ corresponding UML transports (raw, tap, hybrid) in order for networking to
+ operate correctly.
+
+* ``mtu=int`` - sets the interface MTU
+
+* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
+ in case a packet needs to be re-encapsulated into, for instance, VXLAN.
+
+* ``vec=0`` - disable multipacket io and fall back to packet at a
+ time mode
+
+Shared Options
+--------------
+
+* ``ifname=str`` Transports which bind to a local network interface
+ have a shared option - the name of the interface to bind to.
+
+* ``src, dst, src_port, dst_port`` - all transports which use sockets
+ which have the notion of source and destination and/or source port
+ and destination port use these to specify them.
+
+* ``v6=[0,1]`` to specify if a v6 connection is desired for all
+ transports which operate over IP. Additionally, for transports that
+ have some differences in the way they operate over v4 and v6 (for example
+ EoL2TPv3), sets the correct mode of operation. In the absence of this
+ option, the socket type is determined based on what the src and dst
+ arguments resolve/parse to.
+
+tap transport
+-------------
+
+Example::
+
+ vecX:transport=tap,ifname=tap0,depth=128,gro=1
+
+This will connect vecX to tap0 on the host. Tap0 must already exist (for example
+created using tunctl) and be UP.
+
+tap0 can be configured as a point-to-point interface and given an ip
+address so that UML can talk to the host. Alternatively, it is possible
+to connect UML to a tap interface which is connected to a bridge.
+
+While tap relies on the vector infrastructure, it is not a true vector
+transport at this point, because Linux does not support multi-packet
+IO on tap file descriptors for normal userspace apps like UML. This
+is a privilege which is offered only to something which can hook up
+to it at kernel level via specialized interfaces like vhost-net. A
+vhost-net like helper for UML is planned at some point in the future.
+
+Privileges required: tap transport requires either:
+
+* tap interface to exist and be created persistent and owned by the
+ UML user using tunctl. Example ``tunctl -u uml-user -t tap0``
+
+* binary to have ``CAP_NET_ADMIN`` privilege
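+
+On hosts without tunctl, a persistent tap interface can usually be
+created with iproute2 instead. This is a sketch which assumes that the
+UML user is called ``uml-user``, as in the tunctl example above::
+
+ # ip tuntap add dev tap0 mode tap user uml-user
+ # ip link set dev tap0 up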
+
+hybrid transport
+----------------
+
+Example::
+
+ vecX:transport=hybrid,ifname=tap0,depth=128,gro=1
+
+This is an experimental/demo transport which couples tap for transmit
+and a raw socket for receive. The raw socket allows multi-packet
+receive, resulting in significantly higher packet rates than normal tap.
+
+Privileges required: hybrid requires ``CAP_NET_RAW`` capability by
+the UML user as well as the requirements for the tap transport.
+
+raw socket transport
+--------------------
+
+Example::
+
+ vecX:transport=raw,ifname=p-veth0,depth=128,gro=1
+
+
+This transport uses vector IO on raw sockets. While you can bind to any
+interface including a physical one, the most common use is to bind to
+the "peer" side of a veth pair with the other side configured on the
+host.
+
+Example host configuration for Debian:
+
+**/etc/network/interfaces**::
+
+ auto veth0
+ iface veth0 inet static
+ address 192.168.4.1
+ netmask 255.255.255.252
+ broadcast 192.168.4.3
+ pre-up ip link add veth0 type veth peer name p-veth0 && \
+ ifconfig p-veth0 up
+
+UML can now bind to p-veth0 like this::
+
+ vec0:transport=raw,ifname=p-veth0,depth=128,gro=1
+
+
+If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.252
+it can talk to the host on 192.168.4.1.
+
+The raw transport also provides some support for offloading some of the
+filtering to the host. The two options to control it are:
+
+* ``bpffile=str`` filename of raw bpf code to be loaded as a socket filter
+
+* ``bpfflash=int`` 0/1 allow loading of bpf from inside User Mode Linux.
+ This option allows the use of the ethtool load firmware command to
+ load bpf code.
+
+In either case the bpf code is loaded into the host kernel. While this is
+presently limited to legacy bpf syntax (not ebpf), it is still a security
+risk. It is not recommended to allow this unless the User Mode Linux
+instance is considered trusted.
+
+Privileges required: raw socket transport requires ``CAP_NET_RAW``
+capability.
+
+GRE socket transport
+--------------------
+
+Example::
+
+ vecX:transport=gre,src=$src_host,dst=$dst_host
+
+
+This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
+``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
+endpoint at host dst_host. ``GRE`` supports the following additional
+options:
+
+* ``rx_key=int`` - GRE 32 bit integer key for rx packets. If set,
+ ``tx_key`` must be set too
+
+* ``tx_key=int`` - GRE 32 bit integer key for tx packets. If set,
+ ``rx_key`` must be set too
+
+* ``sequence=[0,1]`` - enable GRE sequence
+
+* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
+ on each packet (needed to interoperate with some really broken
+ implementations)
+
+* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively
+
+* GRE checksum is not presently supported
+
+GRE has a number of caveats:
+
+* You can use only one GRE connection per ip address. There is no way to
+ multiplex connections as each GRE tunnel is terminated directly on
+ the UML instance.
+
+* The key is not really a security feature. While it was intended as such
+ its "security" is laughable. It is, however, a useful feature to
+ ensure that the tunnel is not misconfigured.
+
+An example configuration for a Linux host with a local address of
+192.168.128.1 to connect to a UML instance at 192.168.129.1:
+
+**/etc/network/interfaces**::
+
+ auto gt0
+ iface gt0 inet static
+ address 10.0.0.1
+ netmask 255.255.255.0
+ broadcast 10.0.0.255
+ mtu 1500
+ pre-up ip link add gt0 type gretap local 192.168.128.1 \
+ remote 192.168.129.1 || true
+ down ip link del gt0 || true
+
+Additionally, GRE has been tested versus a variety of network equipment.
+
+Privileges required: GRE requires ``CAP_NET_RAW``
+
+l2tpv3 socket transport
+-----------------------
+
+_Warning_. L2TPv3 has a "bug". It is the "bug" known as "has more
+options than GNU ls". While it has some advantages, there are usually
+easier (and less verbose) ways to connect a UML instance to something.
+For example, most devices which support L2TPv3 also support GRE.
+
+Example::
+
+ vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff
+
+This will configure an Ethernet over L2TPv3 fixed tunnel which will
+connect the UML instance to a L2TPv3 endpoint at host $dst_host using
+the L2TPv3 UDP flavour and UDP destination port $dst_port.
+
+L2TPv3 always requires the following additional options:
+
+* ``rx_session=int`` - l2tpv3 32 bit integer session for rx packets
+
+* ``tx_session=int`` - l2tpv3 32 bit integer session for tx packets
+
+As the tunnel is fixed these are not negotiated and they are
+preconfigured on both ends.
+
+Additionally, L2TPv3 supports the following optional parameters:
+
+* ``rx_cookie=int`` - l2tpv3 32 bit integer cookie for rx packets - same
+ functionality as GRE key, more to prevent misconfiguration than provide
+ actual security
+
+* ``tx_cookie=int`` - l2tpv3 32 bit integer cookie for tx packets
+
+* ``cookie64=[0,1]`` - use 64 bit cookies instead of 32 bit.
+
+* ``counter=[0,1]`` - enable l2tpv3 counter
+
+* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
+ each packet (needed to interoperate with some really broken
+ implementations)
+
+* ``v6=[0,1]`` - force v6 sockets
+
+* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol
+
+L2TPv3 has a number of caveats:
+
+* you can use only one connection per ip address in raw mode. There is
+ no way to multiplex connections as each L2TPv3 tunnel is terminated
+ directly on the UML instance. UDP mode can use different ports for
+ this purpose.
+
+Here is an example of how to configure a Linux host to connect to UML
+via L2TPv3:
+
+**/etc/network/interfaces**::
+
+ auto l2tp1
+ iface l2tp1 inet static
+ address 192.168.126.1
+ netmask 255.255.255.0
+ broadcast 192.168.126.255
+ mtu 1500
+ pre-up ip l2tp add tunnel remote 127.0.0.1 \
+ local 127.0.0.1 encap udp tunnel_id 2 \
+ peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
+ ip l2tp add session name l2tp1 tunnel_id 2 \
+ session_id 0xffffffff peer_session_id 0xffffffff
+ down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
+ ip l2tp del tunnel tunnel_id 2
+
+
+Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
+no special privileges for the UDP mode.
+
+BESS socket transport
+---------------------
+
+BESS is a high performance modular network switch.
+
+https://github.com/NetSys/bess
+
+It has support for a simple sequential packet socket mode which in the
+more recent versions is using vector IO for high performance.
+
+Example::
+
+ vecX:transport=bess,src=$unix_src,dst=$unix_dst
+
+This will configure a BESS transport using the unix_src Unix domain
+socket address as source and unix_dst socket address as destination.
+
+For BESS configuration and how to allocate a BESS Unix domain socket port
+please see the BESS documentation.
+
+https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports
+
+BESS transport does not require any special privileges.
+
+Configuring Legacy transports
+=============================
+
+Legacy transports are now considered obsolete. Please use the vector
+versions.
+
+***********
+Running UML
+***********
+
+This section assumes that either the user-mode-linux package from the
+distribution or a custom built kernel has been installed on the host.
+
+These add an executable called linux to the system. This is the UML
+kernel. It can be run just like any other executable.
+It will take most normal linux kernel arguments as command line
+arguments. Additionally, it will need some UML specific arguments
+in order to do something useful.
+
+Arguments
+=========
+
+Mandatory Arguments:
+--------------------
+
+* ``mem=int[K,M,G]`` - amount of memory. By default the value is in
+ bytes. It will also accept K, M or G qualifiers.
+
+* ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
+ mandatory, but it is likely to be needed in nearly all cases so we can
+ specify a root file system.
+ The simplest possible image specification is the name of the image
+ file for the filesystem (created using one of the methods described
+ in `Creating an image`_)
+
+ * UBD devices support copy on write (COW). The changes are kept in
+ a separate file which can be discarded allowing a rollback to the
+ original pristine image. If COW is desired, the UBD image is
+ specified as: ``cow_file,master_image``.
+ Example: ``ubd0=Filesystem.cow,Filesystem.img``
+
+ * UBD devices can be set to use synchronous IO. Any writes are
+ immediately flushed to disk. This is done by adding ``s`` after
+ the ``ubdX`` specification
+
+ * UBD performs some heuristics on devices specified as a single
+ filename to make sure that a COW file has not been specified as
+ the image. To turn them off, use the ``d`` flag after ``ubdX``
+
+ * UBD supports TRIM - asking the Host OS to reclaim any unused
+ blocks in the image. To turn it off, specify the ``t`` flag after
+ ``ubdX``
+
+* ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
+ filesystem image)
+
+Important Optional Arguments
+----------------------------
+
+If UML is run as "linux" with no extra arguments, it will try to start an
+xterm for every console configured inside the image (up to 6 in most
+Linux distributions). This makes it nice and easy to use UML on a host
+with a GUI. It is, however, the wrong approach if UML is to be used as a
+testing harness or run in a text-only environment.
+
+In order to change this behaviour we need to specify an alternative console
+and wire it to one of the supported "line" channels. For this we need to map a
+console to use something different from the default xterm.
+
+Example which will divert console number 1 to stdin/stdout::
+
+ con1=fd:0,fd:1
+
+UML supports a wide variety of serial line channels which are specified using
+the following syntax::
+
+ conX=channel_type:options[,channel_type:options]
+
+
+If the channel specification contains two parts separated by comma, the first
+one is input, the second one output.
+
+* The null channel - Discard all input or output. Example: ``con=null`` will set
+ all consoles to null by default.
+
+* The fd channel - use file descriptor numbers for input/output. Example:
+ ``con1=fd:0,fd:1``.
+
+* The port channel - listen on tcp port number. Example: ``con1=port:4321``
+
+* The pty and pts channels - use system pty/pts.
+
+* The tty channel - bind to an existing system tty. Example: ``con1=tty:/dev/tty8``
+ will make UML use the host 8th console (usually unused).
+
+* The xterm channel - this is the default - bring up an xterm on this channel
+ and direct IO to it. Note, that in order for xterm to work, the host must
+ have the UML distribution package installed. This usually contains the
+ port-helper and other utilities needed for UML to communicate with the xterm.
+ Alternatively, these need to be compiled and installed from source. All
+ options applicable to consoles also apply to UML serial lines which are
+ presented as ttyS inside UML.
+
+Starting UML
+============
+
+We can now run UML::
+
+ # linux mem=2048M umid=TEST \
+ ubd0=Filesystem.img \
+ vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
+ root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1
+
+This will run an instance with ``2048M`` of RAM and try to use the image file
+called ``Filesystem.img`` as root. It will connect to the host using tap0.
+All consoles except ``con1`` will be disabled and console 1 will
+use standard input/output, making it appear in the same terminal in which it
+was started.
+
+Logging in
+============
+
+If you have not set up a password when generating the image, you will have to
+shut down the UML instance, mount the image, chroot into it and set it - as
+described in the `Creating an image`_ section. If the password is already set,
+you can just log in.
+
+The UML Management Console
+============================
+
+In addition to managing the image from "the inside" using normal sysadmin tools,
+it is possible to perform a number of low level operations using the UML
+management console. The UML management console is a low-level interface to the
+kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
+there is a full-blown operating system under UML, there is much greater
+flexibility possible than with the SysRq mechanism.
+
+There are a number of things you can do with the mconsole interface:
+
+* get the kernel version
+* add and remove devices
+* halt or reboot the machine
+* Send SysRq commands
+* Pause and resume the UML
+* Inspect processes running inside UML
+* Inspect UML internal /proc state
+
+You need the mconsole client (uml\_mconsole) which is a part of the UML
+tools package available in most Linux distributions.
+
+You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
+kernel. When you boot UML, you'll see a line like::
+
+ mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
+
+If you specify a unique machine id on the UML command line, e.g.
+``umid=debian``, you'll see this::
+
+ mconsole initialized on /home/jdike/.uml/debian/mconsole
+
+
+That file is the socket that uml_mconsole will use to communicate with
+UML. Run it with either the umid or the full path as its argument::
+
+ # uml_mconsole debian
+
+or::
+
+ # uml_mconsole /home/jdike/.uml/debian/mconsole
+
+
+You'll get a prompt, at which you can run one of these commands:
+
+* version
+* help
+* halt
+* reboot
+* config
+* remove
+* sysrq
+* cad
+* stop
+* go
+* proc
+* stack
+
+version
+-------
+
+This command takes no arguments. It prints the UML version::
+
+ (mconsole) version
+ OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64
+
+
+There are a couple of actual uses for this. It's a simple no-op which
+can be used to check that a UML is running. It's also a way of
+sending a device interrupt to the UML. UML mconsole is treated internally as
+a UML device.
+
+help
+----
+
+This command takes no arguments. It prints a short help screen with the
+supported mconsole commands.
+
+
+halt and reboot
+---------------
+
+These commands take no arguments. They shut the machine down immediately, with
+no syncing of disks and no clean shutdown of userspace. So, they are
+pretty close to crashing the machine::
+
+ (mconsole) halt
+ OK
+
+config
+------
+
+"config" adds a new device to the virtual machine. This is supported
+by most UML device drivers. It takes one argument, which is the
+device to add, with the same syntax as the kernel command line::
+
+ (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22
+
+remove
+------
+
+"remove" deletes a device from the system. Its argument is just the
+name of the device to be removed. The device must be idle in whatever
+sense the driver considers necessary. In the case of the ubd driver,
+the removed block device must not be mounted, swapped on, or otherwise
+open, and in the case of the network driver, the device must be down::
+
+ (mconsole) remove ubd3
+
+sysrq
+-----
+
+This command takes one argument, which is a single letter. It calls the
+generic kernel's SysRq driver, which does whatever is called for by
+that argument. See the SysRq documentation in
+Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
+see what letters are valid and what they do.
+
+cad
+---
+
+This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
+this ends up doing is up to init, systemd, etc. Normally, it reboots the
+machine.
+
+stop
+----
+
+This puts the UML in a loop reading mconsole requests until a 'go'
+mconsole command is received. This is very useful as a
+debugging/snapshotting tool.
+
+go
+--
+
+This resumes a UML after being paused by a 'stop' command. Note that
+when the UML has resumed, TCP connections may have timed out and if
+the UML is paused for a long period of time, crond might go a little
+crazy, running all the jobs it didn't do earlier.
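+
+As a hedged illustration of the snapshotting use case, ``stop`` and ``go``
+can bracket a copy of a private COW file. This sketch assumes that
+``uml_mconsole`` accepts a command after the umid, that the instance was
+started with ``umid=debian`` and that its COW file is ``root_fs_cow``;
+the snapshot file name is hypothetical. Anything the guest has not yet
+written to its block device is not captured, so the copy is crash
+consistent at best::
+
+    # pause the instance so that nothing writes to the COW file
+    uml_mconsole debian stop
+    # copy the COW file, keeping holes in the copy (GNU cp)
+    cp --sparse=always root_fs_cow root_fs_cow.snapshot
+    # let the instance continue
+    uml_mconsole debian go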
+
+proc
+----
+
+This takes one argument - the name of a file in /proc which is printed
+to the mconsole standard output.
+
+stack
+-----
+
+This takes one argument - the pid number of a process. Its stack is
+printed to standard output.
+
+*******************
+Advanced UML Topics
+*******************
+
+Sharing Filesystems between Virtual Machines
+============================================
+
+Don't attempt to share filesystems simply by booting two UMLs from the
+same file. That's the same thing as booting two physical machines
+from a shared disk. It will result in filesystem corruption.
+
+Using layered block devices
+---------------------------
+
+The way to share a filesystem between two virtual machines is to use
+the copy-on-write (COW) layering capability of the ubd block driver.
+Any changed blocks are stored in the private COW file, while reads come
+from either device - the private one if the requested block is valid in
+it, the shared one if not. Using this scheme, the majority of data
+which is unchanged is shared between an arbitrary number of virtual
+machines, each of which has a much smaller file containing the changes
+that it has made. With a large number of UMLs booting from a large root
+filesystem, this leads to a huge disk space saving.
+
+Sharing file system data will also help performance, since the host will
+be able to cache the shared data using a much smaller amount of memory,
+so UML disk requests will be served from the host's memory rather than
+its disks. There is a major caveat in doing this on multisocket NUMA
+machines. On such hardware, running many UML instances with a shared
+master image and COW changes may cause issues like NMIs from excess
+inter-socket traffic.
+
+If you are running UML on high end hardware like this, make sure to
+bind UML to a set of logical cpus residing on the same socket using the
+``taskset`` command or have a look at the "tuning" section.
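+
+For example, the following sketch pins UML to the logical CPUs which
+share a socket with CPU 0. It assumes that the host exposes the
+``core_siblings_list`` topology file in sysfs, that the UML binary is
+``./linux`` and that the ubd arguments are the ones from the COW example
+in this section; adjust all three as needed::
+
+    # logical CPUs in the same physical package as CPU 0
+    cpus=$(cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list)
+    # start UML restricted to those CPUs
+    taskset -c "$cpus" ./linux ubd0=root_fs_cow,root_fs_debian_22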
+
+To add a copy-on-write layer to an existing block device file, simply
+add the name of the COW file to the appropriate ubd switch::
+
+ ubd0=root_fs_cow,root_fs_debian_22
+
+where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
+the existing shared filesystem. The COW file need not exist. If it
+doesn't, the driver will create and initialize it.
+
+Disk Usage
+----------
+
+UML has TRIM support which will release any unused space in its disk
+image files to the underlying OS. Since the released space leaves holes
+in the files, use either ``ls -ls`` or ``du`` to verify the actual file size.
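+
+The difference between the nominal and the allocated size can be seen
+with any sparse file. The following is a minimal sketch which assumes
+GNU coreutils on the host and a host filesystem that supports sparse
+files; ``sparse_demo.img`` is a placeholder and not a real UML image::
+
+    # create a 64 MiB file without allocating any data blocks
+    truncate -s 64M sparse_demo.img
+    # length the file claims to have, in bytes
+    apparent=$(du --apparent-size --block-size=1 sparse_demo.img | cut -f1)
+    # space actually allocated on the host filesystem, in bytes
+    allocated=$(du --block-size=1 sparse_demo.img | cut -f1)
+    echo "apparent=$apparent allocated=$allocated"
+    rm -f sparse_demo.img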
+
+COW validity
+------------
+
+Any changes to the master image will invalidate all COW files. If this
+happens, UML will *NOT* automatically delete any of the COW files and
+will refuse to boot. In this case the only solution is to either
+restore the old image (including its last modified timestamp) or remove
+all COW files which will result in their recreation. Any changes in
+the COW files will be lost.
+
+Cows can moo - uml_moo : Merging a COW file with its backing file
+-----------------------------------------------------------------
+
+Depending on how you use UML and COW devices, it may be advisable to
+merge the changes in the COW file into the backing file every once in
+a while.
+
+The utility that does this is uml_moo. Its usage is::
+
+ uml_moo COW_file new_backing_file
+
+
+There's no need to specify the backing file since that information is
+already in the COW file header. If you're paranoid, boot the new
+merged file, and if you're happy with it, move it over the old backing
+file.
+
+``uml_moo`` creates a new backing file by default as a safety measure.
+It also has a destructive merge option (``-d``) which will merge the COW file
+directly into its current backing file. This is really only usable
+when the backing file only has one COW file associated with it. If
+there are multiple COWs associated with a backing file, a -d merge of
+one of them will invalidate all of the others. However, it is
+convenient if you're short of disk space, and it should also be
+noticeably faster than a non-destructive merge.
+
+``uml_moo`` is installed with the UML distribution packages and is
+available as a part of UML utilities.
+
+Host file access
+==================
+
+If you want to access files on the host machine from inside UML, you
+can treat it as a separate machine and either nfs mount directories
+from the host or copy files into the virtual machine with scp.
+However, since UML is running on the host, it can access those
+files just like any other process and make them available inside the
+virtual machine without the need to use the network.
+This is possible with the hostfs virtual filesystem. With it, you
+can mount a host directory into the UML filesystem and access the
+files contained in it just as you would on the host.
+
+*SECURITY WARNING*
+
+Hostfs without any parameters to the UML Image will allow the image
+to mount any part of the host filesystem and write to it. Always
+confine hostfs to a specific "harmless" directory (for example ``/var/tmp``)
+if running UML. This is especially important if UML is being run as root.
+
+Using hostfs
+------------
+
+To begin with, make sure that hostfs is available inside the virtual
+machine with::
+
+ # cat /proc/filesystems
+
+``hostfs`` should be listed. If it's not, either rebuild the kernel
+with hostfs configured into it or make sure that hostfs is built as a
+module and available inside the virtual machine, and insmod it.
+
+
+Now all you need to do is run mount::
+
+ # mount none /mnt/host -t hostfs
+
+will mount the host's ``/`` on the virtual machine's ``/mnt/host``.
+If you don't want to mount the host root directory, then you can
+specify a subdirectory to mount with the -o switch to mount::
+
+ # mount none /mnt/home -t hostfs -o /home
+
+will mount the host's ``/home`` on the virtual machine's ``/mnt/home``.
+
+hostfs as the root filesystem
+-----------------------------
+
+It's possible to boot from a directory hierarchy on the host using
+hostfs rather than using the standard filesystem in a file.
+To start, you need that hierarchy. The easiest way is to loop mount
+an existing root_fs file::
+
+ # mount root_fs uml_root_dir -o loop
+
+
+You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
+'hostfs', so that line looks like this::
+
+ /dev/ubd/0 / hostfs defaults 1 1
+
+Then you need to chown to yourself all the files in that directory
+that are owned by root. This worked for me::
+
+ # find . -uid 0 -exec chown jdike {} \;
+
+Next, make sure that your UML kernel has hostfs compiled in, not as a
+module. Then run UML with the boot device pointing at that directory::
+
+ ubd0=/path/to/uml/root/directory
+
+UML should then boot as it does normally.
+
+Hostfs Caveats
+--------------
+
+Hostfs does not support keeping track of host filesystem changes on the
+host (outside UML). As a result, if a file is changed without UML's
+knowledge, UML will not know about it and its own in-memory cache of
+the file may be corrupt. While it is possible to fix this, it is not
+something which is being worked on at present.
+
+Tuning UML
+============
+
+UML at present is strictly uniprocessor. It will, however, spin up a
+number of threads to handle various functions.
+
+The UBD driver, SIGIO and the MMU emulation do that. If the system is
+idle, these threads will be migrated to other processors on an SMP host.
+This, unfortunately, will usually result in LOWER performance because of
+all of the cache/memory synchronization traffic between cores. As a
+result, UML will usually benefit from being pinned on a single CPU
+especially on a large system. This can result in performance differences
+of 5 times or higher on some benchmarks.
+
+Similarly, on large multi-node NUMA systems UML will benefit if all of
+its memory is allocated from the same NUMA node it will run on. The
+OS will *NOT* do that by default. In order to do that, the sysadmin
+needs to create a suitable tmpfs ramdisk bound to a particular node
+and use that as the source for UML RAM allocation by specifying it
+in the ``TMPDIR``, ``TMP`` or ``TEMP`` environment variables, which UML
+examines for that purpose. If that fails, it will look for shmfs
+mounted under ``/dev/shm``. If everything else fails, it will use
+``/tmp/`` regardless of the filesystem type used for it::
+
+ mount -t tmpfs -ompol=bind:X none /mnt/tmpfs-nodeX
+ TEMP=/mnt/tmpfs-nodeX taskset -cX linux options options options..
+
+*******************************************
+Contributing to UML and Developing with UML
+*******************************************
+
+UML is an excellent platform to develop new Linux kernel concepts -
+filesystems, devices, virtualization, etc. It provides unrivalled
+opportunities to create and test them without being constrained to
+emulating specific hardware.
+
+Example - want to try how Linux will work with 4096 "proper" network
+devices?
+
+Not an issue with UML. At the same time, this is something which
+is difficult with other virtualization packages - they are
+constrained by the number of devices allowed on the hardware bus
+they are trying to emulate (for example 16 on a PCI bus in qemu).
+
+If you have something to contribute such as a patch, a bugfix or a
+new feature, please send it to ``linux-um@lists.infradead.org``.
+
+Please follow all standard Linux patch guidelines such as cc-ing
+relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
+For more details see ``Documentation/process/submitting-patches.rst``.
+
+Note - the list does not accept HTML or attachments, all emails must
+be formatted as plain text.
+
+Developing always goes hand in hand with debugging. First of all,
+you can always run UML under gdb, and a later section describes how
+to do that. That, however, is not the only way to debug a Linux
+kernel. Quite often adding tracing statements and/or using UML
+specific approaches such as ptracing the UML kernel process are
+significantly more informative.
+
+Tracing UML
+=============
+
+When running, UML consists of a main kernel thread and a number of
+helper threads. The ones of interest for tracing are NOT the ones
+that are already ptraced by UML as a part of its MMU emulation.
+
+These are usually the first three threads visible in a ps display.
+The one with the lowest PID number and using most CPU is usually the
+kernel thread. The other threads are the disk
+(ubd) device helper thread and the sigio helper thread.
+Running strace on the kernel thread usually results in the following picture::
+
+ host$ strace -p 16566
+ --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
+ epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
+ epoll_wait(4, [], 64, 0) = 0
+ rt_sigreturn({mask=[PIPE]}) = 16967
+ ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
+ ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
+ ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
+ ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
+ ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
+ ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
+ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
+ wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
+ ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
+ ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
+ ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
+ timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
+ getpid() = 16566
+ clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
+ --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
+ rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)
+
+This is a typical picture from a mostly idle UML instance:
+
+* UML interrupt controller uses epoll - this is UML waiting for IO
+ interrupts::
+
+ epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
+
+* The sequence of ptrace calls is part of MMU emulation and running the
+ UML userspace.
+* ``timer_settime`` is part of the UML high res timer subsystem mapping
+ timer requests from inside UML onto the host high resolution timers.
+* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
+ will execute an ACPI idle).
+
+As you can see UML will generate quite a bit of output even in idle. The output
+can be very informative when observing IO. It shows the actual IO calls, their
+arguments and return values.
+
+Kernel debugging
+================
+
+You can run UML under gdb now, though it will not necessarily agree to
+be started under it. If you are trying to track a runtime bug, it is
+much better to attach gdb to a running UML instance and let UML run.
+
+Assuming the same PID number as in the previous example, this would be::
+
+ # gdb -p 16566
+
+This will STOP the UML instance, so you must enter ``cont`` at the GDB
+command line to request it to continue. It may be a good idea to make
+this into a gdb script and pass it to gdb as an argument.
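+
+A minimal sketch of such a script follows. It assumes the PID from the
+example above; the file name ``uml.gdb`` is an arbitrary choice::
+
+    # gdb command file containing the single command that resumes UML
+    printf 'cont\n' > uml.gdb
+    # attach to the running instance, then run the commands from the file
+    gdb -p 16566 -x uml.gdb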
+
+Developing Device Drivers
+=========================
+
+Nearly all UML drivers are monolithic. While it is possible to build a
+UML driver as a kernel module, that limits the possible functionality
+to in-kernel only and non-UML specific. The reason for this is that
+in order to really leverage UML, one needs to write a piece of
+userspace code which maps driver concepts onto actual userspace host
+calls.
+
+This forms the so called "user" portion of the driver. While it can
+reuse a lot of kernel concepts, it is generally just another piece of
+userspace code. This portion needs some matching "kernel" code which
+resides inside the UML image and which implements the Linux kernel part.
+
+*Note: There are very few limitations in the way "kernel" and "user" interact*.
+
+UML does not have a strictly defined kernel to host API. It does not
+try to emulate a specific architecture or bus. UML's "kernel" and
+"user" can share memory, code and interact as needed to implement
+whatever design the software developer has in mind. The only
+limitations are purely technical. Due to a lot of functions and
+variables having the same names, the developer should be careful
+which includes and libraries they are trying to refer to.
+
+As a result a lot of userspace code consists of simple wrappers.
+For example, ``os_close_file()`` is just a wrapper around ``close()``
+which ensures that the userspace function close does not clash
+with similarly named function(s) in the kernel part.
+
+Security Considerations
+-----------------------
+
+Drivers or any new functionality should default to not
+accepting arbitrary filenames, BPF code or other parameters
+which can affect the host from inside the UML instance.
+For example, specifying the socket used for IPC communication
+between a driver and the host at the UML command line is OK
+security-wise. Allowing it as a loadable module parameter
+isn't.
+
+If such functionality is desirable for a particular application
+(e.g. loading BPF "firmware" for raw socket network transports),
+it should be off by default and should be explicitly turned on
+as a command line parameter at startup.
+
+Even with this in mind, the level of isolation between UML
+and the host is relatively weak. If the UML userspace is
+allowed to load arbitrary kernel drivers, an attacker can
+use this to break out of UML. Thus, if UML is used in
+a production application, it is recommended that all modules
+are loaded at boot and kernel module loading is disabled
+afterwards.
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 6f9e000757fa..dd9f76a4ef29 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -1,4 +1,4 @@
-.. hmm:
+.. _hmm:
=====================================
Heterogeneous Memory Management (HMM)
@@ -271,10 +271,139 @@ map those pages from the CPU side.
Migration to and from device memory
===================================
-Because the CPU cannot access device memory, migration must use the device DMA
-engine to perform copy from and to device memory. For this we need to use
-migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize() helpers.
-
+Because the CPU cannot access device memory directly, the device driver must
+use hardware DMA or device specific load/store instructions to migrate data.
+The migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize()
+functions are designed to make drivers easier to write and to centralize common
+code across drivers.
+
+Before migrating pages to device private memory, special device private
+``struct page`` need to be created. These will be used as special "swap"
+page table entries so that a CPU process will fault if it tries to access
+a page that has been migrated to device private memory.
+
+These can be allocated and freed with::
+
+ struct resource *res;
+ struct dev_pagemap pagemap;
+
+ res = request_free_mem_region(&iomem_resource, /* number of bytes */,
+ "name of driver resource");
+ pagemap.type = MEMORY_DEVICE_PRIVATE;
+ pagemap.range.start = res->start;
+ pagemap.range.end = res->end;
+ pagemap.nr_range = 1;
+ pagemap.ops = &device_devmem_ops;
+ memremap_pages(&pagemap, numa_node_id());
+
+ memunmap_pages(&pagemap);
+ release_mem_region(pagemap.range.start, range_len(&pagemap.range));
+
+There are also devm_request_free_mem_region(), devm_memremap_pages(),
+devm_memunmap_pages(), and devm_release_mem_region() when the resources can
+be tied to a ``struct device``.
+
+The overall migration steps are similar to migrating NUMA pages within system
+memory (see :ref:`Page migration <page_migration>`) but the steps are split
+between device driver specific code and shared common code:
+
+1. ``mmap_read_lock()``
+
+ The device driver has to pass a ``struct vm_area_struct`` to
+ migrate_vma_setup() so the mmap_read_lock() or mmap_write_lock() needs to
+ be held for the duration of the migration.
+
+2. ``migrate_vma_setup(struct migrate_vma *args)``
+
+ The device driver initializes the ``struct migrate_vma`` fields and passes
+ the pointer to migrate_vma_setup(). The ``args->flags`` field is used to
+ filter which source pages should be migrated. For example, setting
+ ``MIGRATE_VMA_SELECT_SYSTEM`` will only migrate system memory and
+ ``MIGRATE_VMA_SELECT_DEVICE_PRIVATE`` will only migrate pages residing in
+ device private memory. If the latter flag is set, the ``args->pgmap_owner``
+ field is used to identify device private pages owned by the driver. This
+ avoids trying to migrate device private pages residing in other devices.
+ Currently only anonymous private VMA ranges can be migrated to or from
+ system memory and device private memory.
+
+ One of the first steps migrate_vma_setup() does is to invalidate other
+ devices' MMUs with the ``mmu_notifier_invalidate_range_start()`` and
+ ``mmu_notifier_invalidate_range_end()`` calls around the page table
+ walks to fill in the ``args->src`` array with PFNs to be migrated.
+ The ``invalidate_range_start()`` callback is passed a
+ ``struct mmu_notifier_range`` with the ``event`` field set to
+ ``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
+ the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This
+ allows the device driver to skip the invalidation callback and only
+ invalidate device private MMU mappings that are actually migrating.
+ This is explained more in the next section.
+
+ While walking the page tables, a ``pte_none()`` or ``is_zero_pfn()``
+ entry results in a valid "zero" PFN stored in the ``args->src`` array.
+ This lets the driver allocate device private memory and clear it instead
+ of copying a page of zeros. Valid PTE entries to system memory or
+ device private struct pages will be locked with ``lock_page()``, isolated
+ from the LRU (if system memory since device private pages are not on
+ the LRU), unmapped from the process, and a special migration PTE is
+ inserted in place of the original PTE.
+ migrate_vma_setup() also clears the ``args->dst`` array.
+
+3. The device driver allocates destination pages and copies source pages to
+ destination pages.
+
+ The driver checks each ``src`` entry to see if the ``MIGRATE_PFN_MIGRATE``
+ bit is set and skips entries that are not migrating. The device driver
+ can also choose to skip migrating a page by not filling in the ``dst``
+ array for that page.
+
+ The driver then allocates either a device private struct page or a
+ system memory page, locks the page with ``lock_page()``, and fills in the
+ ``dst`` array entry with::
+
+ dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+
+ Now that the driver knows that this page is being migrated, it can
+ invalidate device private MMU mappings and copy device private memory
+ to system memory or another device private page. The core Linux kernel
+ handles CPU page table invalidations so the device driver only has to
+ invalidate its own MMU mappings.
+
+ The driver can use ``migrate_pfn_to_page(src[i])`` to get the
+ ``struct page`` of the source and either copy the source page to the
+ destination or clear the destination device private memory if the pointer
+ is ``NULL`` meaning the source page was not populated in system memory.
+
+4. ``migrate_vma_pages()``
+
+ This step is where the migration is actually "committed".
+
+ If the source page was a ``pte_none()`` or ``is_zero_pfn()`` page, this
+ is where the newly allocated page is inserted into the CPU's page table.
+ This can fail if a CPU thread faults on the same page. However, the page
+ table is locked and only one of the new pages will be inserted.
+ The device driver will see that the ``MIGRATE_PFN_MIGRATE`` bit is cleared
+ if it loses the race.
+
+ If the source page was locked, isolated, etc. the source ``struct page``
+ information is now copied to destination ``struct page`` finalizing the
+ migration on the CPU side.
+
+5. Device driver updates device MMU page tables for pages still migrating,
+ rolling back pages not migrating.
+
+ If the ``src`` entry still has ``MIGRATE_PFN_MIGRATE`` bit set, the device
+ driver can update the device MMU and set the write enable bit if the
+ ``MIGRATE_PFN_WRITE`` bit is set.
+
+6. ``migrate_vma_finalize()``
+
+ This step replaces the special migration page table entry with the new
+ page's page table entry and releases the reference to the source and
+ destination ``struct page``.
+
+7. ``mmap_read_unlock()``
+
+ The lock can now be released.
Memory cgroup (memcg) and rss accounting
========================================
diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index 611140ffef7e..eff5fbd492d0 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -29,6 +29,7 @@ descriptions of data structures and algorithms.
:maxdepth: 1
active_mm
+ arch_pgtable_helpers
balance
cleancache
free_page_reporting
diff --git a/Documentation/vm/page_migration.rst b/Documentation/vm/page_migration.rst
index 68883ac485fa..91a98a6b43bb 100644
--- a/Documentation/vm/page_migration.rst
+++ b/Documentation/vm/page_migration.rst
@@ -4,25 +4,28 @@
Page migration
==============
-Page migration allows the moving of the physical location of pages between
-nodes in a numa system while the process is running. This means that the
+Page migration allows moving the physical location of pages between
+nodes in a NUMA system while the process is running. This means that the
virtual addresses that the process sees do not change. However, the
system rearranges the physical location of those pages.
-The main intend of page migration is to reduce the latency of memory access
+Also see :ref:`Heterogeneous Memory Management (HMM) <hmm>`
+for migrating pages to or from device private memory.
+
+The main intent of page migration is to reduce the latency of memory accesses
by moving pages near to the processor where the process accessing that memory
is running.
Page migration allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while setting
-a new memory policy via mbind(). The pages of process can also be relocated
+a new memory policy via mbind(). The pages of a process can also be relocated
from another process using the sys_migrate_pages() function call. The
-migrate_pages function call takes two sets of nodes and moves pages of a
+migrate_pages() function call takes two sets of nodes and moves pages of a
process that are located on the from nodes to the destination nodes.
Page migration functions are provided by the numactl package by Andi Kleen
(a version later than 0.9.3 is required. Get it from
-ftp://oss.sgi.com/www/projects/libnuma/download/). numactl provides libnuma
-which provides an interface similar to other numa functionality for page
+https://github.com/numactl/numactl.git). numactl provides libnuma
+which provides an interface similar to other NUMA functionality for page
migration. cat ``/proc/<pid>/numa_maps`` allows an easy review of where the
pages of a process are located. See also the numa_maps documentation in the
proc(5) man page.
@@ -30,19 +33,19 @@ proc(5) man page.
Manual migration is useful if for example the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
-nearer to the new processor. The kernel itself does only provide
+nearer to the new processor. The kernel itself only provides
manual page migration support. Automatic page migration may be implemented
through user space processes that move pages. A special function call
"move_pages" allows the moving of individual pages within a process.
-A NUMA profiler may f.e. obtain a log showing frequent off node
+For example, a NUMA profiler may obtain a log showing frequent off-node
accesses and may use the result to move pages to more advantageous
locations.
Larger installations usually partition the system using cpusets into
sections of nodes. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset (See
-Documentation/admin-guide/cgroup-v1/cpusets.rst).
-Cpusets allows the automation of process locality. If a task is moved to
+:ref:`CPUSETS <cpusets>`).
+Cpusets allow the automation of process locality. If a task is moved to
a new cpuset then also all its pages are moved with it so that the
performance of the process does not sink dramatically. Also the pages
of processes in a cpuset are moved if the allowed memory nodes of a
@@ -67,9 +70,9 @@ In kernel use of migrate_pages()
Lists of pages to be migrated are generated by scanning over
pages and moving them into lists. This is done by
calling isolate_lru_page().
- Calling isolate_lru_page increases the references to the page
+ Calling isolate_lru_page() increases the references to the page
so that it cannot vanish while the page migration occurs.
- It also prevents the swapper or other scans to encounter
+ It also prevents the swapper or other scans from encountering
the page.
2. We need to have a function of type new_page_t that can be
@@ -91,23 +94,24 @@ is increased so that the page cannot be freed while page migration occurs.
Steps:
-1. Lock the page to be migrated
+1. Lock the page to be migrated.
2. Ensure that writeback is complete.
3. Lock the new page that we want to move to. It is locked so that accesses to
- this (not yet uptodate) page immediately lock while the move is in progress.
+ this (not yet uptodate) page immediately block while the move is in progress.
4. All the page table references to the page are converted to migration
entries. This decreases the mapcount of a page. If the resulting
mapcount is not zero then we do not migrate the page. All user space
- processes that attempt to access the page will now wait on the page lock.
+ processes that attempt to access the page will now wait on the page lock
+ or wait for the migration page table entry to be removed.
5. The i_pages lock is taken. This will cause all processes trying
to access the page via the mapping to block on the spinlock.
-6. The refcount of the page is examined and we back out if references remain
- otherwise we know that we are the only one referencing this page.
+6. The refcount of the page is examined and we back out if references remain.
+ Otherwise, we know that we are the only one referencing this page.
7. The radix tree is checked and if it does not contain the pointer to this
page then we back out because someone else modified the radix tree.
@@ -134,124 +138,124 @@ Steps:
15. Queued up writeback on the new page is triggered.
-16. If migration entries were page then replace them with real ptes. Doing
- so will enable access for user space processes not already waiting for
- the page lock.
+16. If migration entries were inserted into the page table, then replace them
+ with real ptes. Doing so will enable access for user space processes not
+ already waiting for the page lock.
-19. The page locks are dropped from the old and new page.
+17. The page locks are dropped from the old and new page.
Processes waiting on the page lock will redo their page faults
and will reach the new page.
-20. The new page is moved to the LRU and can be scanned by the swapper
- etc again.
+18. The new page is moved to the LRU and can be scanned by the swapper,
+ etc. again.
Non-LRU page migration
======================
-Although original migration aimed for reducing the latency of memory access
-for NUMA, compaction who want to create high-order page is also main customer.
+Although migration originally aimed for reducing the latency of memory accesses
+for NUMA, compaction also uses migration to create high-order pages.
Current problem of the implementation is that it is designed to migrate only
-*LRU* pages. However, there are potential non-lru pages which can be migrated
+*LRU* pages. However, there are potential non-LRU pages which can be migrated
in drivers, for example, zsmalloc, virtio-balloon pages.
For virtio-balloon pages, some parts of migration code path have been hooked
up and added virtio-balloon specific functions to intercept migration logics.
It's too specific to a driver so other drivers who want to make their pages
-movable would have to add own specific hooks in migration path.
+movable would have to add their own specific hooks in the migration path.
-To overclome the problem, VM supports non-LRU page migration which provides
+To overcome the problem, VM supports non-LRU page migration which provides
generic functions for non-LRU movable pages without driver specific hooks
-migration path.
+in the migration path.
-If a driver want to make own pages movable, it should define three functions
+If a driver wants to make its pages movable, it should define three functions
which are function pointers of struct address_space_operations.
1. ``bool (*isolate_page) (struct page *page, isolate_mode_t mode);``
- What VM expects on isolate_page function of driver is to return *true*
- if driver isolates page successfully. On returing true, VM marks the page
+ What VM expects from isolate_page() function of driver is to return *true*
+ if driver isolates the page successfully. On returning true, VM marks the page
as PG_isolated so concurrent isolation in several CPUs skip the page
for isolation. If a driver cannot isolate the page, it should return *false*.
Once page is successfully isolated, VM uses page.lru fields so driver
- shouldn't expect to preserve values in that fields.
+ shouldn't expect to preserve values in those fields.
2. ``int (*migratepage) (struct address_space *mapping,``
| ``struct page *newpage, struct page *oldpage, enum migrate_mode);``
- After isolation, VM calls migratepage of driver with isolated page.
- The function of migratepage is to move content of the old page to new page
+ After isolation, VM calls migratepage() of driver with the isolated page.
+ The function of migratepage() is to move the contents of the old page to the
+ new page
and set up fields of struct page newpage. Keep in mind that you should
indicate to the VM the oldpage is no longer movable via __ClearPageMovable()
- under page_lock if you migrated the oldpage successfully and returns
+ under page_lock if you migrated the oldpage successfully and returned
MIGRATEPAGE_SUCCESS. If driver cannot migrate the page at the moment, driver
can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time
- because VM interprets -EAGAIN as "temporal migration failure". On returning
- any error except -EAGAIN, VM will give up the page migration without retrying
- in this time.
+ because VM interprets -EAGAIN as "temporary migration failure". On returning
+ any error except -EAGAIN, VM will give up the page migration without
+ retrying.
- Driver shouldn't touch page.lru field VM using in the functions.
+ Driver shouldn't touch the page.lru field while in the migratepage() function.
3. ``void (*putback_page)(struct page *);``
- If migration fails on isolated page, VM should return the isolated page
- to the driver so VM calls driver's putback_page with migration failed page.
- In this function, driver should put the isolated page back to the own data
+ If migration fails on the isolated page, VM should return the isolated page
+ to the driver so VM calls the driver's putback_page() with the isolated page.
+ In this function, the driver should put the isolated page back into its own data
structure.
-4. non-lru movable page flags
+4. non-LRU movable page flags
- There are two page flags for supporting non-lru movable page.
+ There are two page flags for supporting non-LRU movable page.
* PG_movable
- Driver should use the below function to make page movable under page_lock::
+ Driver should use the function below to make page movable under page_lock::
void __SetPageMovable(struct page *page, struct address_space *mapping)
It needs argument of address_space for registering migration
family functions which will be called by VM. Exactly speaking,
- PG_movable is not a real flag of struct page. Rather than, VM
- reuses page->mapping's lower bits to represent it.
+ PG_movable is not a real flag of struct page. Rather, VM
+ reuses the page->mapping's lower bits to represent it::
-::
#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
so driver shouldn't access page->mapping directly. Instead, driver should
- use page_mapping which mask off the low two bits of page->mapping under
- page lock so it can get right struct address_space.
-
- For testing of non-lru movable page, VM supports __PageMovable function.
- However, it doesn't guarantee to identify non-lru movable page because
- page->mapping field is unified with other variables in struct page.
- As well, if driver releases the page after isolation by VM, page->mapping
- doesn't have stable value although it has PAGE_MAPPING_MOVABLE
- (Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
- page is LRU or non-lru movable once the page has been isolated. Because
- LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
- good for just peeking to test non-lru movable pages before more expensive
- checking with lock_page in pfn scanning to select victim.
-
- For guaranteeing non-lru movable page, VM provides PageMovable function.
- Unlike __PageMovable, PageMovable functions validates page->mapping and
- mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
- destroying of page->mapping.
-
- Driver using __SetPageMovable should clear the flag via __ClearMovablePage
- under page_lock before the releasing the page.
+ use page_mapping() which masks off the low two bits of page->mapping under
+ page lock so it can get the right struct address_space.
+
+ For testing of non-LRU movable pages, VM supports __PageMovable() function.
+ However, it doesn't guarantee to identify non-LRU movable pages because
+ the page->mapping field is unified with other variables in struct page.
+ If the driver releases the page after isolation by VM, page->mapping
+ doesn't have a stable value although it has PAGE_MAPPING_MOVABLE set
+ (look at __ClearPageMovable). But __PageMovable() is cheap to call whether
+ page is LRU or non-LRU movable once the page has been isolated because LRU
+ pages can never have PAGE_MAPPING_MOVABLE set in page->mapping. It is also
+ good for just peeking to test non-LRU movable pages before more expensive
+ checking with lock_page() in pfn scanning to select a victim.
+
+ For guaranteeing non-LRU movable page, VM provides PageMovable() function.
+ Unlike __PageMovable(), PageMovable() validates page->mapping and
+ mapping->a_ops->isolate_page under lock_page(). The lock_page() prevents
+ sudden destroying of page->mapping.
+
+ Drivers using __SetPageMovable() should clear the flag via
+ __ClearPageMovable() under page_lock() before releasing the page.
* PG_isolated
To prevent concurrent isolation among several CPUs, VM marks isolated page
- as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
- movable page, it can skip it. Driver doesn't need to manipulate the flag
- because VM will set/clear it automatically. Keep in mind that if driver
- sees PG_isolated page, it means the page have been isolated by VM so it
- shouldn't touch page.lru field.
- PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
- for own purpose.
+ as PG_isolated under lock_page(). So if a CPU encounters PG_isolated
+ non-LRU movable page, it can skip it. Driver doesn't need to manipulate the
+ flag because VM will set/clear it automatically. Keep in mind that if the
+ driver sees a PG_isolated page, it means the page has been isolated by the
+ VM so it shouldn't touch the page.lru field.
+ The PG_isolated flag is aliased with the PG_reclaim flag so drivers
+ shouldn't use PG_isolated for their own purposes.
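
The low-bit tagging of page->mapping described under PG_movable above can be sketched in plain userspace C. This is only an illustration of the technique, not kernel code: ``struct toy_page``, ``struct toy_mapping`` and the ``toy_*`` helpers are hypothetical stand-ins, and only the value 0x2 is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as PAGE_MAPPING_MOVABLE in the text above. */
#define TOY_MAPPING_MOVABLE	((uintptr_t)0x2)
/* The low two bits that are masked off to recover the pointer. */
#define TOY_MAPPING_FLAGS	((uintptr_t)0x3)

struct toy_mapping { int id; };		/* stand-in for struct address_space */
struct toy_page { uintptr_t mapping; };	/* stand-in for struct page */

/* Store the mapping pointer with the "movable" tag in its low bits. */
static void toy_set_movable(struct toy_page *page, struct toy_mapping *m)
{
	/* The pointer must be 4-byte aligned so the low two bits are free. */
	assert(((uintptr_t)m & TOY_MAPPING_FLAGS) == 0);
	page->mapping = (uintptr_t)m | TOY_MAPPING_MOVABLE;
}

/* Cheap test: is the tag present in the low bits? */
static bool toy_is_movable(const struct toy_page *page)
{
	return (page->mapping & TOY_MAPPING_FLAGS) == TOY_MAPPING_MOVABLE;
}

/* Recover the real pointer by masking off the low two bits. */
static struct toy_mapping *toy_page_mapping(const struct toy_page *page)
{
	return (struct toy_mapping *)(page->mapping & ~TOY_MAPPING_FLAGS);
}
```

The raw ``mapping`` value differs from the original pointer once the tag is set, which is why the text tells drivers to go through an accessor instead of reading page->mapping directly.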
Monitoring Migration
=====================
@@ -266,8 +270,8 @@ The following events (counters) can be used to monitor page migration.
512.
2. PGMIGRATE_FAIL: Normal page migration failure. Same counting rules as for
- _SUCCESS, above: this will be increased by the number of subpages, if it was
- a THP.
+ PGMIGRATE_SUCCESS, above: this will be increased by the number of subpages,
+ if it was a THP.
3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.
diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 849fad6893ef..54f13ad5fc17 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -103,8 +103,10 @@ watch that specific key).
To manage a watch list, the following functions are provided:
- * ``void init_watch_list(struct watch_list *wlist,
- void (*release_watch)(struct watch *wlist));``
+ * ::
+
+ void init_watch_list(struct watch_list *wlist,
+ void (*release_watch)(struct watch *wlist));
Initialise a watch list. If ``release_watch`` is not NULL, then this
indicates a function that should be called when the watch_list object is
@@ -179,9 +181,11 @@ The following functions are provided to manage watches:
driver-settable fields in the watch struct must have been set before this
is called.
- * ``int remove_watch_from_object(struct watch_list *wlist,
- struct watch_queue *wqueue,
- u64 id, false);``
+ * ::
+
+ int remove_watch_from_object(struct watch_list *wlist,
+ struct watch_queue *wqueue,
+ u64 id, false);
Remove a watch from a watch list, where the watch must match the specified
watch queue (``wqueue``) and object identifier (``id``). A notification
diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index 7fafc7ac00d7..abb9fc164657 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1342,8 +1342,8 @@ follow::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
-also fill the additional fields of the struct boot_params as that
-described in zero-page.txt.
+also fill the additional fields of the struct boot_params as
+described in chapter :doc:`zero-page`.
After setting up the struct boot_params, the boot loader can load the
32/64-bit kernel in the same way as that of 16-bit boot protocol.
@@ -1379,7 +1379,7 @@ can be calculated as follows::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
also fill the additional fields of the struct boot_params as described
-in zero-page.txt.
+in chapter :doc:`zero-page`.
After setting up the struct boot_params, the boot loader can load
64-bit kernel in the same way as that of 16-bit boot protocol, but
diff --git a/Documentation/x86/cpuinfo.rst b/Documentation/x86/cpuinfo.rst
new file mode 100644
index 000000000000..5d54c39a063f
--- /dev/null
+++ b/Documentation/x86/cpuinfo.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+x86 Feature Flags
+=================
+
+Introduction
+============
+
+On x86, flags appearing in /proc/cpuinfo have an X86_FEATURE definition
+in arch/x86/include/asm/cpufeatures.h. If the kernel cares about a feature
+or KVM wants to expose the feature to a KVM guest, it can and should have
+an X86_FEATURE_* defined. These flags represent hardware features as
+well as software features.
+
+If users want to know if a feature is available on a given system, they
+try to find the flag in /proc/cpuinfo. If a given flag is present, it
+means that the kernel supports it and is currently making it available.
+If such flag represents a hardware feature, it also means that the
+hardware supports it.
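
As a rough illustration of the lookup described above, the following userspace C helper checks whether a flag appears as a whole token in a "flags" line copied from /proc/cpuinfo. The helper name ``cpuinfo_line_has_flag()`` is hypothetical, and the sketch assumes the usual "key : value value ..." layout of that line.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole, whitespace-separated token
 * after the ':' in `line`. Matching whole tokens keeps "avx" from
 * matching "avx2".
 */
static bool cpuinfo_line_has_flag(const char *line, const char *flag)
{
	const char *p;
	size_t len;

	assert(line && flag);
	p = strchr(line, ':');
	len = strlen(flag);
	if (!p || len == 0)
		return false;
	p++;	/* skip the ':' separating the key from the flag list */

	while (*p) {
		size_t tok;

		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (!*p)
			break;
		/* Measure the current token and compare it as a whole. */
		tok = strcspn(p, " \t\n");
		if (tok == len && strncmp(p, flag, len) == 0)
			return true;
		p += tok;
	}
	return false;
}
```

A caller would typically read /proc/cpuinfo line by line, pick a line starting with "flags", and pass it to this helper together with the flag name of interest, for example "avx2".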
+
+If the expected flag does not appear in /proc/cpuinfo, things are murkier.
+Users need to find out why the flag is missing and how to enable it,
+which is not always easy. There are several factors that
+can explain missing flags: the expected feature failed to enable, the feature
+is missing in hardware, platform firmware did not enable it, the feature is
+disabled at build or run time, an old kernel is in use, or the kernel does
+not support the feature and thus has not enabled it. In general, /proc/cpuinfo
+shows features which the kernel supports. For a full list of CPUID flags
+which the CPU supports, use tools/arch/x86/kcpuid.
+
+How are feature flags created?
+==============================
+
+a: Feature flags can be derived from the contents of CPUID leaves.
+------------------------------------------------------------------
+These feature definitions are organized mirroring the layout of CPUID
+leaves and grouped in words with offsets as mapped in enum cpuid_leafs
+in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
+If a feature is defined with an X86_FEATURE_<name> definition in
+cpufeatures.h, and if it is detected at run time, the flags will be
+displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
+comes from X86_FEATURE_AVX2 in cpufeatures.h.
+
+b: Flags can be from scattered CPUID-based features.
+----------------------------------------------------
+Hardware features enumerated in sparsely populated CPUID leaves get
+software-defined values. Still, CPUID needs to be queried to determine
+if a given feature is present. This is done in init_scattered_cpuid_features().
+For instance, X86_FEATURE_CQM_LLC is defined as 11*32 + 0 and its presence is
+checked at runtime in the respective CPUID leaf [EAX=f, ECX=0] bit EDX[1].
+
+The intent of scattering CPUID leaves is to not bloat struct
+cpuinfo_x86.x86_capability[] unnecessarily. For instance, the CPUID leaf
+[EAX=7, ECX=0] has 30 features and is dense, but the CPUID leaf [EAX=7, ECX=1]
+has only one feature and would waste 31 bits of space in the x86_capability[]
+array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
+memory is not trivial.
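
The word-and-bit layout discussed above can be modelled with a small bitmap in userspace C. This is a simplified sketch, not the kernel's implementation: ``struct cap_model``, the ``cap_*`` helpers and the array size are hypothetical, and only the value 11*32 + 0 is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_MODEL_WORDS	20	/* hypothetical size, for illustration only */

/* Feature number quoted in the text for X86_FEATURE_CQM_LLC. */
#define EXAMPLE_CQM_LLC	(11 * 32 + 0)

/* One 32-bit word per group of features, as in a capability array. */
struct cap_model {
	uint32_t words[CAP_MODEL_WORDS];
};

/* Mark a feature as present: word = number / 32, bit = number % 32. */
static void cap_set(struct cap_model *c, unsigned int feature)
{
	assert(feature / 32 < CAP_MODEL_WORDS);
	c->words[feature / 32] |= UINT32_C(1) << (feature % 32);
}

/* Mark a feature as absent. */
static void cap_clear(struct cap_model *c, unsigned int feature)
{
	assert(feature / 32 < CAP_MODEL_WORDS);
	c->words[feature / 32] &= ~(UINT32_C(1) << (feature % 32));
}

/* Test whether a feature bit is set. */
static bool cap_has(const struct cap_model *c, unsigned int feature)
{
	assert(feature / 32 < CAP_MODEL_WORDS);
	return (c->words[feature / 32] >> (feature % 32)) & 1u;
}
```

In this model a leaf with a single feature would still occupy a full 32-bit word if it were given a word of its own, which is the waste that scattering avoids.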
+
+c: Flags can be created synthetically under certain conditions for hardware features.
+-------------------------------------------------------------------------------------
+Examples of conditions include whether certain features are present in
+MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
+conditions are met, the features are enabled by the set_cpu_cap or
+setup_force_cpu_cap macros. For example, if bit 5 is set in MSR_IA32_CORE_CAPS,
+the feature X86_FEATURE_SPLIT_LOCK_DETECT will be enabled and
+"split_lock_detect" will be displayed. The flag "ring3mwait" will be
+displayed only when running on INTEL_FAM6_XEON_PHI_[KNL|KNM] processors.
+
+d: Flags can represent purely software features.
+------------------------------------------------
+These flags do not represent hardware features. Instead, they represent a
+software feature implemented in the kernel. For example, Kernel Page Table
+Isolation is a purely software feature and its feature flag X86_FEATURE_PTI is
+also defined in cpufeatures.h.
+
+Naming of Flags
+===============
+
+The script arch/x86/kernel/cpu/mkcapflags.sh processes the
+#define X86_FEATURE_<name> from cpufeatures.h and generates the
+x86_cap/bug_flags[] arrays in kernel/cpu/capflags.c. The names in the
+resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
+of flags in x86_cap/bug_flags[] is as follows:
+
+a: The name of the flag is from the string in X86_FEATURE_<name> by default.
+----------------------------------------------------------------------------
+By default, the flag <name> in /proc/cpuinfo is extracted from the respective
+X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
+X86_FEATURE_AVX2.
+
+b: The naming can be overridden.
+--------------------------------
+If the comment on the line for the #define X86_FEATURE_* starts with a
+double-quote character (""), the string inside the double-quote characters
+will be the name of the flags. For example, the flag "sse4_1" comes from
+the comment "sse4_1" following the X86_FEATURE_XMM4_1 definition.
+
+There are situations in which overriding the displayed name of the flag is
+needed. For instance, /proc/cpuinfo is a userspace interface and must remain
+constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
+shall override the new naming with the name already used in /proc/cpuinfo.
+
+c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
+----------------------------------------------------------------------------------
+The feature shall be omitted from /proc/cpuinfo if it does not make sense for
+the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
+defined in cpufeatures.h but that flag is an internal kernel feature used
+in the alternative runtime patching functionality. So, its name is overridden
+with "". Its flag will not appear in /proc/cpuinfo.
+
+Flags are missing when one or more of these happen
+==================================================
+
+a: The hardware does not enumerate support for it.
+--------------------------------------------------
+For example, when a new kernel is running on old hardware or the feature is
+not enabled by boot firmware. Even if the hardware is new, the flag will not
+be displayed if there is a problem enabling the feature at run time.
+
+b: The kernel does not know about the flag.
+-------------------------------------------
+For example, when an old kernel is running on new hardware.
+
+c: The kernel disabled support for it at compile-time.
+------------------------------------------------------
+For example, if 5-level-paging is not enabled when building (i.e.,
+CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
+Even though the feature will still be detected via CPUID, the kernel disables
+it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).
+
+d: The feature is disabled at boot-time.
+----------------------------------------
+A feature can be disabled either using a command-line parameter or because
+it failed to be enabled. The command-line parameter clearcpuid= can be used
+to disable features using the feature number as defined in
+arch/x86/include/asm/cpufeatures.h. For instance, User Mode Instruction
+Protection can be disabled using clearcpuid=514. The number 514 is calculated
+from #define X86_FEATURE_UMIP (16*32 + 2).
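
The arithmetic behind the number passed to clearcpuid= can be written out as a short C sketch. The helper names below are hypothetical; only the word*32 + bit encoding and the UMIP values (16*32 + 2) come from the text.

```c
#include <assert.h>

#define FEATURE_BITS_PER_WORD	32u

/* Encode a (word, bit) pair into a single feature number. */
static unsigned int feature_number(unsigned int word, unsigned int bit)
{
	assert(bit < FEATURE_BITS_PER_WORD);
	return word * FEATURE_BITS_PER_WORD + bit;
}

/* Recover the word index from a feature number. */
static unsigned int feature_word(unsigned int number)
{
	return number / FEATURE_BITS_PER_WORD;
}

/* Recover the bit position within the word from a feature number. */
static unsigned int feature_bit(unsigned int number)
{
	return number % FEATURE_BITS_PER_WORD;
}
```

With these helpers, ``feature_number(16, 2)`` evaluates to 514, the value used in the clearcpuid=514 example, and ``feature_word(514)`` and ``feature_bit(514)`` give back 16 and 2.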
+
+In addition, there exists a variety of custom command-line parameters that
+disable specific features. The list of parameters includes, but is not limited
+to, nofsgsbase, nosmap, and nosmep. 5-level paging can also be disabled using
+"no5lvl". SMAP and SMEP are disabled with the aforementioned parameters,
+respectively.
+
+e: The feature was known to be non-functional.
+----------------------------------------------
+The feature was known to be non-functional because a dependency was
+missing at runtime. For example, AVX flags will not show up if the XSAVE
+feature is disabled since they depend on the XSAVE feature. Another example
+would be broken CPUs that are missing microcode patches; in that case, the
+kernel decides not to enable a feature.
+
+.. [#f1] 5-level paging uses linear address of 57 bits.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..740ee7f87898 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -9,6 +9,7 @@ x86-specific Documentation
:numbered:
boot
+ cpuinfo
topology
exception-tables
kernel-stacks
@@ -30,3 +31,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+ sva
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 5368cedfb530..e59b7b93a9b4 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -138,6 +138,18 @@ with respect to allocation:
non-linear. This field is purely informational
only.
+"thread_throttle_mode":
+ Indicator on Intel systems of how tasks running on threads
+ of a physical core are throttled in cases where they
+ request different memory bandwidth percentages:
+
+ "max":
+ the smallest percentage is applied
+ to all threads
+ "per-thread":
+ bandwidth percentages are directly applied to
+ the threads running on the core
+
If RDT monitoring is available there will be an "L3_MON" directory
with the following files:
@@ -364,8 +376,10 @@ to the next control step available on the hardware.
The bandwidth throttling is a core specific mechanism on some of Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
-sharing a core will result in both threads being throttled to use the
-low bandwidth. The fact that Memory bandwidth allocation(MBA) is a core
+sharing a core may result in both threads being throttled to use the
+low bandwidth (see "thread_throttle_mode").
+
+The fact that Memory bandwidth allocation(MBA) may be a core
specific mechanism where as memory bandwidth monitoring(MBM) is done at
the package level may lead to confusion when users try to apply control
via the MBA and then monitor the bandwidth to see if the controls are
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 000000000000..076efd51ef1f
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,257 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to the PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU is also
+required to support the PCIe features ATS and PRI. ATS allows devices
+to cache translations for virtual addresses. The IOMMU driver uses the
+mmu_notifier() support to keep the device TLB cache and the CPU cache in
+sync. When an ATS lookup fails for a virtual address, the device should
+use the PRI in order to request the virtual address to be paged into the
+CPU page tables. The device must use ATS again in order to fetch the
+translation before use.
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VMs). This allows better hardware utilization vs. hard
+partitioning resources that could result in underutilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress should be provided.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permits hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA-capable device, this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU-specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following:
+
+- Allocate the PASID, and program the process page-table (%cr3 register) in the
+ PASID context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+ the device TLB in sync. For example, when a page-table entry is invalidated,
+ the IOMMU propagates the invalidation to the device TLB. This will force any
+ future access by the device to this virtual address to participate in
+ ATS. If the IOMMU responds with a proper response that a page is not
+ present, the device would request the page to be paged in via the PCIe PRI
+ protocol before performing I/O.
+
+This MSR is managed with the XSAVE feature set as "supervisor state" to
+ensure the MSR is updated during context switch.
+
+PASID Management
+================
+
+The kernel must allocate a PASID on behalf of each process which will use
+ENQCMD and program it into the new MSR to communicate the process identity to
+platform hardware. ENQCMD uses the PASID stored in this MSR to tag requests
+from this process. When a user submits a work descriptor to a device using the
+ENQCMD instruction, the PASID field in the descriptor is auto-filled with the
+value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
+with the same PASID. The platform IOMMU uses the PASID in the transaction to
+perform address translation. The IOMMU APIs set up the corresponding PASID
+entry in IOMMU with the process address used by the CPU (e.g. %cr3 register in
+x86).
+
+The MSR must be configured on each logical CPU before any application
+thread can interact with a device. Threads that belong to the same
+process share the same page tables, thus the same MSR value.
+
+PASID is cleared when a process is created. The PASID allocation and MSR
+programming may occur long after a process and its threads have been created.
+One thread must call iommu_sva_bind_device() to allocate the PASID for the
+process. If a thread uses ENQCMD without the MSR first being populated, a #GP
+will be raised. The kernel will update the PASID MSR with the PASID for all
+threads in the process. A single process PASID can be used simultaneously
+with multiple devices since they all share the same address space.
+
+One thread can call iommu_sva_unbind_device() to free the allocated PASID.
+The kernel will clear the PASID MSR for all threads belonging to the process.
+
+New threads inherit the MSR value from the parent.
+
+Relationships
+=============
+
+ * Each process has many threads, but only one PASID.
+ * Devices have a limited number (~10s to 1000s) of hardware workqueues.
+ The device driver manages allocating hardware workqueues.
+ * A single mmap() maps a single hardware workqueue as a "portal" and
+ each portal maps down to a single workqueue.
+ * For each device with which a process interacts, there must be
+ one or more mmap()'d portals.
+ * Many threads within a process can share a single portal to access
+ a single device.
+ * Multiple processes can separately mmap() the same portal, in
+ which case they still share one device hardware workqueue.
+ * The single process-wide PASID is used by all threads to interact
+ with all devices. There is not, for instance, a PASID for each
+ thread or each thread<->device pair.
+
+FAQ
+===
+
+* What is SVA/SVM?
+
+Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
+work in the same address space, i.e., to share it. Some call it Shared
+Virtual Memory (SVM), but the Linux community wanted to avoid confusing it with
+POSIX Shared Memory and Secure Virtual Machines which were terms already in
+circulation.
+
+* What is a PASID?
+
+A Process Address Space ID (PASID) is a PCIe-defined Transaction Layer Packet
+(TLP) prefix. A PASID is a 20-bit number allocated and managed by the OS.
+PASID is included in all transactions between the platform and the device.
+
+* How are shared workqueues different?
+
+Traditionally, in order for userspace applications to interact with hardware,
+there is a separate hardware instance required per process. For example,
+consider doorbells as a mechanism of informing hardware about work to process.
+Each doorbell is required to be spaced 4k (or page-size) apart for process
+isolation. This requires hardware to provision that space and reserve it in
+MMIO. This doesn't scale as the number of threads becomes quite large. The
+hardware also manages the queue depth for Shared Work Queues (SWQ), and
+consumers don't need to track queue depth. If there is no space to accept
+a command, the device will return an error indicating retry.
+
+A user should check Deferrable Memory Write (DMWr) capability on the device
+and only submit ENQCMD when the device supports it. In the new DMWr PCIe
+terminology, devices need to support DMWr completer capability. In addition,
+it requires all switch ports to support DMWr routing and must be enabled by
+the PCIe subsystem, much like how PCIe atomic operations are managed for
+instance.
+
+SWQ allows hardware to provision just a single address in the device. When
+used with ENQCMD to submit work, the device can distinguish the process
+submitting the work since it will include the PASID assigned to that
+process. This helps the device scale to a large number of processes.
+
+* Is this the same as a user space device driver?
+
+Communicating with the device via the shared workqueue is much simpler
+than a full blown user space driver. The kernel driver does all the
+initialization of the hardware. User space only needs to worry about
+submitting work and processing completions.
+
+* Is this the same as SR-IOV?
+
+Single Root I/O Virtualization (SR-IOV) focuses on providing independent
+hardware interfaces for virtualizing hardware. Hence, it is required to be an
+almost fully functional interface to software, supporting the traditional
+BARs, space for interrupts via MSI-X, and its own register layout.
+Virtual Functions (VFs) are assisted by the Physical Function (PF)
+driver.
+
+Scalable I/O Virtualization builds on the PASID concept to create device
+instances for virtualization. SIOV requires host software to assist in
+creating virtual devices; each virtual device is represented by a PASID
+along with the bus/device/function of the device. This allows device
+hardware to optimize device resource creation and can grow dynamically on
+demand. SR-IOV creation and management is very static in nature. Consult
+references below for more details.
+
+* Why not just create a virtual function for each app?
+
+Creating PCIe SR-IOV type Virtual Functions (VF) is expensive. VFs require
+duplicated hardware for PCI config space and interrupts such as MSI-X.
+Resources such as interrupts have to be hard partitioned between VFs at
+creation time, and cannot scale dynamically on demand. The VFs are not
+completely independent from the Physical Function (PF). Most VFs require
+some communication and assistance from the PF driver. SIOV, in contrast,
+creates a software-defined device where all the configuration and control
+aspects are mediated via the slow path. The work submission and completion
+happen without any mediation.
+
+* Does this support virtualization?
+
+ENQCMD can be used from within a guest VM. In these cases, the VMM helps
+with setting up a translation table to translate from Guest PASID to Host
+PASID. Please consult the ENQCMD instruction set reference for more
+details.
+
+* Does memory need to be pinned?
+
+When devices support SVA along with platform hardware such as IOMMU
+supporting such devices, there is no need to pin memory for DMA purposes.
+Devices that support SVA also support other PCIe features that remove the
+pinning requirement for memory.
+
+Device TLB support - Device requests the IOMMU to look up an address before
+use via Address Translation Service (ATS) requests. If the mapping exists
+but there is no page allocated by the OS, IOMMU hardware returns that no
+mapping exists.
+
+Device requests the virtual address to be mapped via Page Request
+Interface (PRI). Once the OS has successfully completed the mapping, it
+returns the response back to the device. The device requests again for
+a translation and continues.
+
+IOMMU works with the OS in managing consistency of page-tables with the
+device. When removing pages, it interacts with the device to remove any
+device TLB entry that might have been cached before removing the mappings from
+the OS.
+
+References
+==========
+
+VT-D:
+https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
+
+SIOV:
+https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
+
+ENQCMD in ISE:
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+DSA spec:
+https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf