diff options
Diffstat (limited to 'Documentation')
265 files changed, 7838 insertions, 6077 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX deleted file mode 100644 index 2754fe83f0d4..000000000000 --- a/Documentation/00-INDEX +++ /dev/null @@ -1,428 +0,0 @@ - -This is a brief list of all the files in ./linux/Documentation and what -they contain. If you add a documentation file, please list it here in -alphabetical order as well, or risk being hunted down like a rabid dog. -Please keep the descriptions small enough to fit on one line. - Thanks -- Paul G. - -Following translations are available on the WWW: - - - Japanese, maintained by the JF Project (jf@listserv.linux.or.jp), at - http://linuxjf.sourceforge.jp/ - -00-INDEX - - this file. -ABI/ - - info on kernel <-> userspace ABI and relative interface stability. -CodingStyle - - nothing here, just a pointer to process/coding-style.rst. -DMA-API.txt - - DMA API, pci_ API & extensions for non-consistent memory machines. -DMA-API-HOWTO.txt - - Dynamic DMA mapping Guide -DMA-ISA-LPC.txt - - How to do DMA with ISA (and LPC) devices. -DMA-attributes.txt - - listing of the various possible attributes a DMA region can have -EDID/ - - directory with info on customizing EDID for broken gfx/displays. -IPMI.txt - - info on Linux Intelligent Platform Management Interface (IPMI) Driver. -IRQ-affinity.txt - - how to select which CPU(s) handle which interrupt events on SMP. -IRQ-domain.txt - - info on interrupt numbering and setting up IRQ domains. -IRQ.txt - - description of what an IRQ is. -Intel-IOMMU.txt - - basic info on the Intel IOMMU virtualization support. -Makefile - - It's not of interest for those who aren't touching the build system. -PCI/ - - info related to PCI drivers. -RCU/ - - directory with info on RCU (read-copy update). -SAK.txt - - info on Secure Attention Keys. -SM501.txt - - Silicon Motion SM501 multimedia companion chip -SubmittingPatches - - nothing here, just a pointer to process/coding-style.rst. -accounting/ - - documentation on accounting and taskstats. -acpi/ - - info on ACPI-specific hooks in the kernel. -admin-guide/ - - info related to Linux users and system admins. -aoe/ - - description of AoE (ATA over Ethernet) along with config examples. -arm/ - - directory with info about Linux on the ARM architecture. -arm64/ - - directory with info about Linux on the 64 bit ARM architecture. -auxdisplay/ - - misc. LCD driver documentation (cfag12864b, ks0108). -backlight/ - - directory with info on controlling backlights in flat panel displays -block/ - - info on the Block I/O (BIO) layer. -blockdev/ - - info on block devices & drivers -bt8xxgpio.txt - - info on how to modify a bt8xx video card for GPIO usage. -btmrvl.txt - - info on Marvell Bluetooth driver usage. -bus-devices/ - - directory with info on TI GPMC (General Purpose Memory Controller) -bus-virt-phys-mapping.txt - - how to access I/O mapped memory from within device drivers. -cdrom/ - - directory with information on the CD-ROM drivers that Linux has. -cgroup-v1/ - - cgroups v1 features, including cpusets and memory controller. -cma/ - - Continuous Memory Area (CMA) debugfs interface. -conf.py - - It's not of interest for those who aren't touching the build system. -connector/ - - docs on the netlink based userspace<->kernel space communication mod. -console/ - - documentation on Linux console drivers. -core-api/ - - documentation on kernel core components. -cpu-freq/ - - info on CPU frequency and voltage scaling. -cpu-hotplug.txt - - document describing CPU hotplug support in the Linux kernel. -cpu-load.txt - - document describing how CPU load statistics are collected. -cpuidle/ - - info on CPU_IDLE, CPU idle state management subsystem. -cputopology.txt - - documentation on how CPU topology info is exported via sysfs. -crc32.txt - - brief tutorial on CRC computation -crypto/ - - directory with info on the Crypto API. -dcdbas.txt - - information on the Dell Systems Management Base Driver. -debugging-modules.txt - - some notes on debugging modules after Linux 2.6.3. -debugging-via-ohci1394.txt - - how to use firewire like a hardware debugger memory reader. -dell_rbu.txt - - document demonstrating the use of the Dell Remote BIOS Update driver. -dev-tools/ - - directory with info on development tools for the kernel. -device-mapper/ - - directory with info on Device Mapper. -dmaengine/ - - the DMA engine and controller API guides. -devicetree/ - - directory with info on device tree files used by OF/PowerPC/ARM -digsig.txt - -info on the Digital Signature Verification API -dma-buf-sharing.txt - - the DMA Buffer Sharing API Guide -docutils.conf - - nothing here. Just a configuration file for docutils. -dontdiff - - file containing a list of files that should never be diff'ed. -driver-api/ - - the Linux driver implementer's API guide. -driver-model/ - - directory with info about Linux driver model. -early-userspace/ - - info about initramfs, klibc, and userspace early during boot. -efi-stub.txt - - How to use the EFI boot stub to bypass GRUB or elilo on EFI systems. -eisa.txt - - info on EISA bus support. -extcon/ - - directory with porting guide for Android kernel switch driver. -isa.txt - - info on EISA bus support. -fault-injection/ - - dir with docs about the fault injection capabilities infrastructure. -fb/ - - directory with info on the frame buffer graphics abstraction layer. -features/ - - status of feature implementation on different architectures. -filesystems/ - - info on the vfs and the various filesystems that Linux supports. -firmware_class/ - - request_firmware() hotplug interface info. -flexible-arrays.txt - - how to make use of flexible sized arrays in linux -fmc/ - - information about the FMC bus abstraction -fpga/ - - FPGA Manager Core. -futex-requeue-pi.txt - - info on requeueing of tasks from a non-PI futex to a PI futex -gcc-plugins.txt - - GCC plugin infrastructure. -gpio/ - - gpio related documentation -gpu/ - - directory with information on GPU driver developer's guide. -hid/ - - directory with information on human interface devices -highuid.txt - - notes on the change from 16 bit to 32 bit user/group IDs. -hwspinlock.txt - - hardware spinlock provides hardware assistance for synchronization -timers/ - - info on the timer related topics -hw_random.txt - - info on Linux support for random number generator in i8xx chipsets. -hwmon/ - - directory with docs on various hardware monitoring drivers. -i2c/ - - directory with info about the I2C bus/protocol (2 wire, kHz speed). -x86/i386/ - - directory with info about Linux on Intel 32 bit architecture. -ia64/ - - directory with info about Linux on Intel 64 bit architecture. -ide/ - - Information regarding the Enhanced IDE drive. -iio/ - - info on industrial IIO configfs support. -index.rst - - main index for the documentation at ReST format. -infiniband/ - - directory with documents concerning Linux InfiniBand support. -input/ - - info on Linux input device support. -intel_txt.txt - - info on intel Trusted Execution Technology (intel TXT). -io-mapping.txt - - description of io_mapping functions in linux/io-mapping.h -io_ordering.txt - - info on ordering I/O writes to memory-mapped addresses. -ioctl/ - - directory with documents describing various IOCTL calls. -iostats.txt - - info on I/O statistics Linux kernel provides. -irqflags-tracing.txt - - how to use the irq-flags tracing feature. -isapnp.txt - - info on Linux ISA Plug & Play support. -isdn/ - - directory with info on the Linux ISDN support, and supported cards. -kbuild/ - - directory with info about the kernel build process. -kdump/ - - directory with mini HowTo on getting the crash dump code to work. -doc-guide/ - - how to write and format reStructuredText kernel documentation -kernel-per-CPU-kthreads.txt - - List of all per-CPU kthreads and how they introduce jitter. -kobject.txt - - info of the kobject infrastructure of the Linux kernel. -kprobes.txt - - documents the kernel probes debugging feature. -kref.txt - - docs on adding reference counters (krefs) to kernel objects. -laptops/ - - directory with laptop related info and laptop driver documentation. -ldm.txt - - a brief description of LDM (Windows Dynamic Disks). -leds/ - - directory with info about LED handling under Linux. -livepatch/ - - info on kernel live patching. -locking/ - - directory with info about kernel locking primitives -lockup-watchdogs.txt - - info on soft and hard lockup detectors (aka nmi_watchdog). -logo.gif - - full colour GIF image of Linux logo (penguin - Tux). -logo.txt - - info on creator of above logo & site to get additional images from. -lsm.txt - - Linux Security Modules: General Security Hooks for Linux -lzo.txt - - kernel LZO decompressor input formats -m68k/ - - directory with info about Linux on Motorola 68k architecture. -mailbox.txt - - How to write drivers for the common mailbox framework (IPC). -md/ - - directory with info about Linux Software RAID -media/ - - info on media drivers: uAPI, kAPI and driver documentation. -memory-barriers.txt - - info on Linux kernel memory barriers. -memory-devices/ - - directory with info on parts like the Texas Instruments EMIF driver -memory-hotplug.txt - - Hotpluggable memory support, how to use and current status. -men-chameleon-bus.txt - - info on MEN chameleon bus. -mic/ - - Intel Many Integrated Core (MIC) architecture device driver. -mips/ - - directory with info about Linux on MIPS architecture. -misc-devices/ - - directory with info about devices using the misc dev subsystem -mmc/ - - directory with info about the MMC subsystem -mtd/ - - directory with info about memory technology devices (flash) -namespaces/ - - directory with various information about namespaces -netlabel/ - - directory with information on the NetLabel subsystem. -networking/ - - directory with info on various aspects of networking with Linux. -nfc/ - - directory relating info about Near Field Communications support. -nios2/ - - Linux on the Nios II architecture. -nommu-mmap.txt - - documentation about no-mmu memory mapping support. -numastat.txt - - info on how to read Numa policy hit/miss statistics in sysfs. -ntb.txt - - info on Non-Transparent Bridge (NTB) drivers. -nvdimm/ - - info on non-volatile devices. -nvmem/ - - info on non volatile memory framework. -output/ - - default directory where html/LaTeX/pdf files will be written. -padata.txt - - An introduction to the "padata" parallel execution API -parisc/ - - directory with info on using Linux on PA-RISC architecture. -parport-lowlevel.txt - - description and usage of the low level parallel port functions. -pcmcia/ - - info on the Linux PCMCIA driver. -percpu-rw-semaphore.txt - - RCU based read-write semaphore optimized for locking for reading -perf/ - - info about the APM X-Gene SoC Performance Monitoring Unit (PMU). -phy/ - - ino on Samsung USB 2.0 PHY adaptation layer. -phy.txt - - Description of the generic PHY framework. -pi-futex.txt - - documentation on lightweight priority inheritance futexes. -pinctrl.txt - - info on pinctrl subsystem and the PINMUX/PINCONF and drivers -platform/ - - List of supported hardware by compal and Dell laptop. -pnp.txt - - Linux Plug and Play documentation. -power/ - - directory with info on Linux PCI power management. -powerpc/ - - directory with info on using Linux with the PowerPC. -prctl/ - - directory with info on the priveledge control subsystem -preempt-locking.txt - - info on locking under a preemptive kernel. -process/ - - how to work with the mainline kernel development process. -pps/ - - directory with information on the pulse-per-second support -pti/ - - directory with info on Intel MID PTI. -ptp/ - - directory with info on support for IEEE 1588 PTP clocks in Linux. -pwm.txt - - info on the pulse width modulation driver subsystem -rapidio/ - - directory with info on RapidIO packet-based fabric interconnect -rbtree.txt - - info on what red-black trees are and what they are for. -remoteproc.txt - - info on how to handle remote processor (e.g. AMP) offloads/usage. -rfkill.txt - - info on the radio frequency kill switch subsystem/support. -robust-futex-ABI.txt - - documentation of the robust futex ABI. -robust-futexes.txt - - a description of what robust futexes are. -rpmsg.txt - - info on the Remote Processor Messaging (rpmsg) Framework -rtc.txt - - notes on how to use the Real Time Clock (aka CMOS clock) driver. -s390/ - - directory with info on using Linux on the IBM S390. -scheduler/ - - directory with info on the scheduler. -scsi/ - - directory with info on Linux scsi support. -security/ - - directory that contains security-related info -serial/ - - directory with info on the low level serial API. -sgi-ioc4.txt - - description of the SGI IOC4 PCI (multi function) device. -sh/ - - directory with info on porting Linux to a new architecture. -smsc_ece1099.txt - -info on the smsc Keyboard Scan Expansion/GPIO Expansion device. -sound/ - - directory with info on sound card support. -spi/ - - overview of Linux kernel Serial Peripheral Interface (SPI) support. -sphinx/ - - no documentation here, just files required by Sphinx toolchain. -sphinx-static/ - - no documentation here, just files required by Sphinx toolchain. -static-keys.txt - - info on how static keys allow debug code in hotpaths via patching -svga.txt - - short guide on selecting video modes at boot via VGA BIOS. -sync_file.txt - - Sync file API guide. -sysctl/ - - directory with info on the /proc/sys/* files. -target/ - - directory with info on generating TCM v4 fabric .ko modules -tee.txt - - info on the TEE subsystem and drivers -this_cpu_ops.txt - - List rationale behind and the way to use this_cpu operations. -thermal/ - - directory with information on managing thermal issues (CPU/temp) -trace/ - - directory with info on tracing technologies within linux -translations/ - - translations of this document from English to another language -unaligned-memory-access.txt - - info on how to avoid arch breaking unaligned memory access in code. -unshare.txt - - description of the Linux unshare system call. -usb/ - - directory with info regarding the Universal Serial Bus. -vfio.txt - - info on Virtual Function I/O used in guest/hypervisor instances. -video-output.txt - - sysfs class driver interface to enable/disable a video output device. -virtual/ - - directory with information on the various linux virtualizations. -vm/ - - directory with info on the Linux vm code. -w1/ - - directory with documents regarding the 1-wire (w1) subsystem. -watchdog/ - - how to auto-reboot Linux if it has "fallen and can't get up". ;-) -wimax/ - - directory with info about Intel Wireless Wimax Connections -core-api/workqueue.rst - - information on the Concurrency Managed Workqueue implementation -x86/x86_64/ - - directory with info on Linux support for AMD x86-64 (Hammer) machines. -xillybus.txt - - Overview and basic ui of xillybus driver -xtensa/ - - directory with documents relating to arch/xtensa port/implementation -xz.txt - - how to make use of the XZ data compression within linux kernel -zorro.txt - - info on writing drivers for Zorro bus devices found on Amigas. diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 44d4b2be92fd..8bfee557e50e 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -323,3 +323,27 @@ Description: This is similar to /sys/bus/pci/drivers_autoprobe, but affects only the VFs associated with a specific PF. + +What: /sys/bus/pci/devices/.../p2pmem/size +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the total amount of memory that the device + provides (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/available +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the amount of memory that has not been + allocated (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/published +Date: November 2017 +Contact: Logan Gunthorpe <logang@deltatee.com> +Description: + If the device has any Peer-to-Peer memory registered, this + file contains a '1' if the memory has been published for + use outside the driver that owns the device. diff --git a/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 b/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 deleted file mode 100644 index ae0a2d3dcc07..000000000000 --- a/Documentation/ABI/testing/sysfs-class-lcd-s6e63m0 +++ /dev/null @@ -1,27 +0,0 @@ -sysfs interface for the S6E63M0 AMOLED LCD panel driver -------------------------------------------------------- - -What: /sys/class/lcd/<lcd>/gamma_mode -Date: May, 2010 -KernelVersion: v2.6.35 -Contact: dri-devel@lists.freedesktop.org -Description: - (RW) Read or write the gamma mode. Following three modes are - supported: - 0 - gamma value 2.2, - 1 - gamma value 1.9 and - 2 - gamma value 1.7. - - -What: /sys/class/lcd/<lcd>/gamma_table -Date: May, 2010 -KernelVersion: v2.6.35 -Contact: dri-devel@lists.freedesktop.org -Description: - (RO) Displays the size of the gamma table i.e. the number of - gamma modes available. - -This is a backlight lcd driver. These interfaces are an extension to the API -documented in Documentation/ABI/testing/sysfs-class-lcd and in -Documentation/ABI/stable/sysfs-class-backlight (under -/sys/class/backlight/<backlight>/). diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx new file mode 100644 index 000000000000..45b1e605d355 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx @@ -0,0 +1,22 @@ +What: /sys/class/leds/<led>/hw_pattern +Date: September 2018 +KernelVersion: 4.20 +Description: + Specify a hardware pattern for the SC27XX LED. For the SC27XX + LED controller, it only supports 4 stages to make a single + hardware pattern, which is used to configure the rise time, + high time, fall time and low time for the breathing mode. + + For the breathing mode, the SC27XX LED only expects one brightness + for the high stage. To be compatible with the hardware pattern + format, we should set brightness as 0 for rise stage, fall + stage and low stage. + + Min stage duration: 125 ms + Max stage duration: 31875 ms + + Since the stage duration step is 125 ms, the duration should be + a multiplier of 125, like 125ms, 250ms, 375ms, 500ms ... 31875ms. + + Thus the format of the hardware pattern values should be: + "0 rise_duration brightness high_duration 0 fall_duration 0 low_duration". diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern new file mode 100644 index 000000000000..fb3d1e03b881 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern @@ -0,0 +1,82 @@ +What: /sys/class/leds/<led>/pattern +Date: September 2018 +KernelVersion: 4.20 +Description: + Specify a software pattern for the LED, that supports altering + the brightness for the specified duration with one software + timer. It can do gradual dimming and step change of brightness. + + The pattern is given by a series of tuples, of brightness and + duration (ms). The LED is expected to traverse the series and + each brightness value for the specified duration. Duration of + 0 means brightness should immediately change to new value, and + writing malformed pattern deactivates any active one. + + 1. For gradual dimming, the dimming interval now is set as 50 + milliseconds. So the tuple with duration less than dimming + interval (50ms) is treated as a step change of brightness, + i.e. the subsequent brightness will be applied without adding + intervening dimming intervals. + + The gradual dimming format of the software pattern values should be: + "brightness_1 duration_1 brightness_2 duration_2 brightness_3 + duration_3 ...". For example: + + echo 0 1000 255 2000 > pattern + + It will make the LED go gradually from zero-intensity to max (255) + intensity in 1000 milliseconds, then back to zero intensity in 2000 + milliseconds: + + LED brightness + ^ + 255-| / \ / \ / + | / \ / \ / + | / \ / \ / + | / \ / \ / + 0-| / \/ \/ + +---0----1----2----3----4----5----6------------> time (s) + + 2. To make the LED go instantly from one brigntess value to another, + we should use use zero-time lengths (the brightness must be same as + the previous tuple's). So the format should be: + "brightness_1 duration_1 brightness_1 0 brightness_2 duration_2 + brightness_2 0 ...". For example: + + echo 0 1000 0 0 255 2000 255 0 > pattern + + It will make the LED stay off for one second, then stay at max brightness + for two seconds: + + LED brightness + ^ + 255-| +---------+ +---------+ + | | | | | + | | | | | + | | | | | + 0-| -----+ +----+ +---- + +---0----1----2----3----4----5----6------------> time (s) + +What: /sys/class/leds/<led>/hw_pattern +Date: September 2018 +KernelVersion: 4.20 +Description: + Specify a hardware pattern for the LED, for LED hardware that + supports autonomously controlling brightness over time, according + to some preprogrammed hardware patterns. It deactivates any active + software pattern. + + Since different LED hardware can have different semantics of + hardware patterns, each driver is expected to provide its own + description for the hardware patterns in their ABI documentation + file. + +What: /sys/class/leds/<led>/repeat +Date: September 2018 +KernelVersion: 4.20 +Description: + Specify a pattern repeat number. -1 means repeat indefinitely, + other negative numbers and number 0 are invalid. + + This file will always return the originally written repeat + number. diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net index 2f1788111cd9..e2e0fe553ad8 100644 --- a/Documentation/ABI/testing/sysfs-class-net +++ b/Documentation/ABI/testing/sysfs-class-net @@ -117,7 +117,7 @@ Description: full: full duplex Note: This attribute is only valid for interfaces that implement - the ethtool get_settings method (mostly Ethernet). + the ethtool get_link_ksettings method (mostly Ethernet). What: /sys/class/net/<iface>/flags Date: April 2005 @@ -224,7 +224,7 @@ Description: an integer representing the link speed in Mbits/sec. Note: this attribute is only valid for interfaces that implement - the ethtool get_settings method (mostly Ethernet ). + the ethtool get_link_ksettings method (mostly Ethernet). What: /sys/class/net/<iface>/tx_queue_len Date: April 2005 diff --git a/Documentation/ABI/testing/sysfs-class-net-dsa b/Documentation/ABI/testing/sysfs-class-net-dsa new file mode 100644 index 000000000000..f240221e071e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-net-dsa @@ -0,0 +1,7 @@ +What: /sys/class/net/<iface>/tagging +Date: August 2018 +KernelVersion: 4.20 +Contact: netdev@vger.kernel.org +Description: + String indicating the type of tagging protocol used by the + DSA slave network device. diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 94a24aedcdb2..3ac41774ad3c 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -121,7 +121,22 @@ What: /sys/fs/f2fs/<disk>/idle_interval Date: January 2016 Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> Description: - Controls the idle timing. + Controls the idle timing for all paths other than + discard and gc path. + +What: /sys/fs/f2fs/<disk>/discard_idle_interval +Date: September 2018 +Contact: "Chao Yu" <yuchao0@huawei.com> +Contact: "Sahitya Tummala" <stummala@codeaurora.org> +Description: + Controls the idle timing for discard path. + +What: /sys/fs/f2fs/<disk>/gc_idle_interval +Date: September 2018 +Contact: "Chao Yu" <yuchao0@huawei.com> +Contact: "Sahitya Tummala" <stummala@codeaurora.org> +Description: + Controls the idle timing for gc path. What: /sys/fs/f2fs/<disk>/iostat_enable Date: August 2017 diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index 2f813d644c69..18b7dc929234 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -99,7 +99,7 @@ Description: this file, the suspend image will be as small as possible. Reading from this file will display the current image size - limit, which is set to 500 MB by default. + limit, which is set to around 2/5 of available RAM by default. What: /sys/power/pm_trace Date: August 2006 diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX deleted file mode 100644 index 206b1d5c1e71..000000000000 --- a/Documentation/PCI/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -acpi-info.txt - - info on how PCI host bridges are represented in ACPI -MSI-HOWTO.txt - - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ. -PCIEBUS-HOWTO.txt - - a guide describing the PCI Express Port Bus driver -pci-error-recovery.txt - - info on PCI error recovery -pci-iov-howto.txt - - the PCI Express I/O Virtualization HOWTO -pci.txt - - info on the PCI subsystem for device driver authors -pcieaer-howto.txt - - the PCI Express Advanced Error Reporting Driver Guide HOWTO -endpoint/pci-endpoint.txt - - guide to add endpoint controller driver and endpoint function driver. -endpoint/pci-endpoint-cfs.txt - - guide to use configfs to configure the PCI endpoint function. -endpoint/pci-test-function.txt - - specification of *PCI test* function device. -endpoint/pci-test-howto.txt - - userguide for PCI endpoint test function. -endpoint/function/binding/ - - binding documentation for PCI endpoint function diff --git a/Documentation/PCI/endpoint/pci-test-howto.txt b/Documentation/PCI/endpoint/pci-test-howto.txt index e40cf0fb58d7..040479f437a5 100644 --- a/Documentation/PCI/endpoint/pci-test-howto.txt +++ b/Documentation/PCI/endpoint/pci-test-howto.txt @@ -99,17 +99,20 @@ Note that the devices listed here correspond to the value populated in 1.4 above 2.2 Using Endpoint Test function Device pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint -tests. Before pcitest.sh can be used pcitest.c should be compiled using the -following commands. +tests. To compile this tool the following commands should be used: - cd <kernel-dir> - make headers_install ARCH=arm - arm-linux-gnueabihf-gcc -Iusr/include tools/pci/pcitest.c -o pcitest - cp pcitest <rootfs>/usr/sbin/ - cp tools/pci/pcitest.sh <rootfs> + # cd <kernel-dir> + # make -C tools/pci + +or if you desire to compile and install in your system: + + # cd <kernel-dir> + # make -C tools/pci install + +The tool and script will be located in <rootfs>/usr/bin/ 2.2.1 pcitest.sh Output - # ./pcitest.sh + # pcitest.sh BAR tests BAR0: OKAY diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt index 688b69121e82..0b6bb3ef449e 100644 --- a/Documentation/PCI/pci-error-recovery.txt +++ b/Documentation/PCI/pci-error-recovery.txt @@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error event will be platform-dependent, but will follow the general sequence described below. -STEP 0: Error Event: ERR_NONFATAL +STEP 0: Error Event ------------------- A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, @@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform proceeds to STEP 4 (Slot Reset) -STEP 3: Slot Reset +STEP 3: Link Reset +------------------ +The platform resets the link. This is a PCI-Express specific step +and is done whenever a fatal error has been detected that can be +"solved" by resetting the link. + +STEP 4: Slot Reset ------------------ In response to a return value of PCI_ERS_RESULT_NEED_RESET, the @@ -314,7 +320,7 @@ Failure). >>> However, it probably should. -STEP 4: Resume Operations +STEP 5: Resume Operations ------------------------- The platform will call the resume() callback on all affected device drivers if all drivers on the segment have returned @@ -326,7 +332,7 @@ a result code. At this point, if a new error happens, the platform will restart a new error recovery sequence. -STEP 5: Permanent Failure +STEP 6: Permanent Failure ------------------------- A "permanent failure" has occurred, and the platform cannot recover the device. The platform will call error_detected() with a @@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt for additional detail on real-life experience of the causes of software errors. -STEP 0: Error Event: ERR_FATAL -------------------- -PCI bus error is detected by the PCI hardware. On powerpc, the slot is -isolated, in that all I/O is blocked: all reads return 0xffffffff, all -writes are ignored. - -STEP 1: Remove devices --------------------- -Platform removes the devices depending on the error agent, it could be -this port for all subordinates or upstream component (likely downstream -port) - -STEP 2: Reset link --------------------- -The platform resets the link. This is a PCI-Express specific step and is -done whenever a fatal error has been detected that can be "solved" by -resetting the link. - -STEP 3: Re-enumerate the devices --------------------- -Initiates the re-enumeration. Conclusion; General Remarks --------------------------- diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX deleted file mode 100644 index f46980c060aa..000000000000 --- a/Documentation/RCU/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -arrayRCU.txt - - Using RCU to Protect Read-Mostly Arrays -checklist.txt - - Review Checklist for RCU Patches -listRCU.txt - - Using RCU to Protect Read-Mostly Linked Lists -lockdep.txt - - RCU and lockdep checking -lockdep-splat.txt - - RCU Lockdep splats explained. -NMI-RCU.txt - - Using RCU to Protect Dynamic NMI Handlers -rcu_dereference.txt - - Proper care and feeding of return values from rcu_dereference() -rcubarrier.txt - - RCU and Unloadable Modules -rculist_nulls.txt - - RCU list primitives for use with SLAB_TYPESAFE_BY_RCU -rcuref.txt - - Reference-count design for elements of lists/arrays protected by RCU -rcu.txt - - RCU Concepts -RTFP.txt - - List of RCU papers (bibliography) going back to 1980. -stallwarn.txt - - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress) -torture.txt - - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) -UP.txt - - RCU on Uniprocessor Systems -whatisRCU.txt - - What is RCU? diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index f5120a00f511..1d2051c0c3fc 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -1227,9 +1227,11 @@ to overflow the counter, this approach corrects the CPU enters the idle loop from process context. </p><p>The <tt>->dynticks</tt> field counts the corresponding -CPU's transitions to and from dyntick-idle mode, so that this counter -has an even value when the CPU is in dyntick-idle mode and an odd -value otherwise. +CPU's transitions to and from either dyntick-idle or user mode, so +that this counter has an even value when the CPU is in dyntick-idle +mode or user mode and an odd value otherwise. The transitions to/from +user mode need to be counted for user mode adaptive-ticks support +(see timers/NO_HZ.txt). </p><p>The <tt>->rcu_need_heavy_qs</tt> field is used to record the fact that the RCU core code would really like to @@ -1372,8 +1374,7 @@ that is, if the CPU is currently idle. Accessor Functions</a></h3> <p>The following listing shows the -<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt>, -<tt>rcu_for_each_nonleaf_node_breadth_first()</tt>, and +<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt> and <tt>rcu_for_each_leaf_node()</tt> function and macros: <pre> @@ -1386,13 +1387,9 @@ Accessor Functions</a></h3> 7 for ((rnp) = &(rsp)->node[0]; \ 8 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++) 9 - 10 #define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \ - 11 for ((rnp) = &(rsp)->node[0]; \ - 12 (rnp) < (rsp)->level[NUM_RCU_LVLS - 1]; (rnp)++) - 13 - 14 #define rcu_for_each_leaf_node(rsp, rnp) \ - 15 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \ - 16 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++) + 10 #define rcu_for_each_leaf_node(rsp, rnp) \ + 11 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \ + 12 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++) </pre> <p>The <tt>rcu_get_root()</tt> simply returns a pointer to the @@ -1405,10 +1402,7 @@ macro takes advantage of the layout of the <tt>rcu_node</tt> structures in the <tt>rcu_state</tt> structure's <tt>->node[]</tt> array, performing a breadth-first traversal by simply traversing the array in order. -The <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> macro operates -similarly, but traverses only the first part of the array, thus excluding -the leaf <tt>rcu_node</tt> structures. -Finally, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only +Similarly, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only the last part of the array, thus traversing only the leaf <tt>rcu_node</tt> structures. @@ -1416,15 +1410,14 @@ the last part of the array, thus traversing only the leaf <tr><th> </th></tr> <tr><th align="left">Quick Quiz:</th></tr> <tr><td> - What do <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> and + What does <tt>rcu_for_each_leaf_node()</tt> do if the <tt>rcu_node</tt> tree contains only a single node? </td></tr> <tr><th align="left">Answer:</th></tr> <tr><td bgcolor="#ffffff"><font color="ffffff"> In the single-node case, - <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> is a no-op - and <tt>rcu_for_each_leaf_node()</tt> traverses the single node. + <tt>rcu_for_each_leaf_node()</tt> traverses the single node. </font></td></tr> <tr><td> </td></tr> </table> diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html index 7394f034be65..e62c7c34a369 100644 --- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html +++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html @@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept lower efficiency and significant disturbance to attain shorter latencies. <p> -There are three flavors of RCU (RCU-bh, RCU-preempt, and RCU-sched), -but only two flavors of expedited grace periods because the RCU-bh -expedited grace period maps onto the RCU-sched expedited grace period. -Each of the remaining two implementations is covered in its own section. +There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier +third RCU-bh flavor having been implemented in terms of the other two. +Each of the two implementations is covered in its own section. <ol> <li> <a href="#Expedited Grace Period Design"> @@ -158,7 +157,7 @@ whether or not the current CPU is in an RCU read-side critical section. The best that <tt>sync_sched_exp_handler()</tt> can do is to check for idle, on the off-chance that the CPU went idle while the IPI was in flight. -If the CPU is idle, then tt>sync_sched_exp_handler()</tt> reports +If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports the quiescent state. <p> diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 49690228b1c6..43c4e2f05f40 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -1306,8 +1306,6 @@ doing so would degrade real-time response. <p> This non-requirement appeared with preemptible RCU. -If you need a grace period that waits on non-preemptible code regions, use -<a href="#Sched Flavor">RCU-sched</a>. <h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2> @@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe restrictions on what operations those callbacks could invoke. <p> -Perhaps surprisingly, <tt>synchronize_rcu()</tt>, -<a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a> -(<a href="#Bottom-Half Flavor">discussed below</a>), -<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>, +Perhaps surprisingly, <tt>synchronize_rcu()</tt> and <tt>synchronize_rcu_expedited()</tt>, -<tt>synchronize_rcu_bh_expedited()</tt>, and -<tt>synchronize_sched_expedited()</tt> -will all operate normally +will operate normally during very early boot, the reason being that there is only one CPU and preemption is disabled. This means that the call <tt>synchronize_rcu()</tt> (or friends) @@ -2269,12 +2262,23 @@ Thankfully, RCU update-side primitives, including The name notwithstanding, some Linux-kernel architectures can have nested NMIs, which RCU must handle correctly. Andy Lutomirski -<a href="https://lkml.kernel.org/g/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a> +<a href="https://lkml.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a> with this requirement; he also kindly surprised me with -<a href="https://lkml.kernel.org/g/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a> +<a href="https://lkml.kernel.org/r/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a> that meets this requirement. +<p> +Furthermore, NMI handlers can be interrupted by what appear to RCU +to be normal interrupts. +One way that this can happen is for code that directly invokes +<tt>rcu_irq_enter()</tt> and </tt>rcu_irq_exit()</tt> to be called +from an NMI handler. +This astonishing fact of life prompted the current code structure, +which has <tt>rcu_irq_enter()</tt> invoking <tt>rcu_nmi_enter()</tt> +and <tt>rcu_irq_exit()</tt> invoking <tt>rcu_nmi_exit()</tt>. +And yes, I also learned of this requirement the hard way. + <h3><a name="Loadable Modules">Loadable Modules</a></h3> <p> @@ -2394,30 +2398,9 @@ when invoked from a CPU-hotplug notifier. <p> RCU depends on the scheduler, and the scheduler uses RCU to protect some of its data structures. -This means the scheduler is forbidden from acquiring -the runqueue locks and the priority-inheritance locks -in the middle of an outermost RCU read-side critical section unless either -(1) it releases them before exiting that same -RCU read-side critical section, or -(2) interrupts are disabled across -that entire RCU read-side critical section. -This same prohibition also applies (recursively!) to any lock that is acquired -while holding any lock to which this prohibition applies. -Adhering to this rule prevents preemptible RCU from invoking -<tt>rcu_read_unlock_special()</tt> while either runqueue or -priority-inheritance locks are held, thus avoiding deadlock. - -<p> -Prior to v4.4, it was only necessary to disable preemption across -RCU read-side critical sections that acquired scheduler locks. -In v4.4, expedited grace periods started using IPIs, and these -IPIs could force a <tt>rcu_read_unlock()</tt> to take the slowpath. -Therefore, this expedited-grace-period change required disabling of -interrupts, not just preemption. - -<p> -For RCU's part, the preemptible-RCU <tt>rcu_read_unlock()</tt> -implementation must be written carefully to avoid similar deadlocks. +The preemptible-RCU <tt>rcu_read_unlock()</tt> +implementation must therefore be written carefully to avoid deadlocks +involving the scheduler's runqueue and priority-inheritance locks. In particular, <tt>rcu_read_unlock()</tt> must tolerate an interrupt where the interrupt handler invokes both <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>. @@ -2426,7 +2409,7 @@ negative nesting levels to avoid destructive recursion via interrupt handler's use of RCU. <p> -This pair of mutual scheduler-RCU requirements came as a +This scheduler-RCU requirement came as a <a href="https://lwn.net/Articles/453002/">complete surprise</a>. <p> @@ -2437,9 +2420,28 @@ when running context-switch-heavy workloads when built with <tt>CONFIG_NO_HZ_FULL=y</tt> <a href="http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf">did come as a surprise [PDF]</a>. RCU has made good progress towards meeting this requirement, even -for context-switch-have <tt>CONFIG_NO_HZ_FULL=y</tt> workloads, +for context-switch-heavy <tt>CONFIG_NO_HZ_FULL=y</tt> workloads, but there is room for further improvement. +<p> +In the past, it was forbidden to disable interrupts across an +<tt>rcu_read_unlock()</tt> unless that interrupt-disabled region +of code also included the matching <tt>rcu_read_lock()</tt>. +Violating this restriction could result in deadlocks involving the +scheduler's runqueue and priority-inheritance spinlocks. +This restriction was lifted when interrupt-disabled calls to +<tt>rcu_read_unlock()</tt> started deferring the reporting of +the resulting RCU-preempt quiescent state until the end of that +interrupts-disabled region. +This deferred reporting means that the scheduler's runqueue and +priority-inheritance locks cannot be held while reporting an RCU-preempt +quiescent state, which lifts the earlier restriction, at least from +a deadlock perspective. +Unfortunately, real-time systems using RCU priority boosting may +need this restriction to remain in effect because deferred +quiescent-state reporting also defers deboosting, which in turn +degrades real-time latencies. + <h3><a name="Tracing and RCU">Tracing and RCU</a></h3> <p> @@ -2850,15 +2852,22 @@ The other four flavors are listed below, with requirements for each described in a separate section. <ol> -<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor</a> -<li> <a href="#Sched Flavor">Sched Flavor</a> +<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a> +<li> <a href="#Sched Flavor">Sched Flavor (Historical)</a> <li> <a href="#Sleepable RCU">Sleepable RCU</a> <li> <a href="#Tasks RCU">Tasks RCU</a> -<li> <a href="#Waiting for Multiple Grace Periods"> - Waiting for Multiple Grace Periods</a> </ol> -<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor</a></h3> +<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a></h3> + +<p> +The RCU-bh flavor of RCU has since been expressed in terms of +the other RCU flavors as part of a consolidation of the three +flavors into a single flavor. +The read-side API remains, and continues to disable softirq and to +be accounted for by lockdep. +Much of the material in this section is therefore strictly historical +in nature. <p> The softirq-disable (AKA “bottom-half”, @@ -2918,8 +2927,20 @@ includes <tt>call_rcu_bh()</tt>, <tt>rcu_barrier_bh()</tt>, and <tt>rcu_read_lock_bh_held()</tt>. +However, the update-side APIs are now simple wrappers for other RCU +flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt +otherwise. + +<h3><a name="Sched Flavor">Sched Flavor (Historical)</a></h3> -<h3><a name="Sched Flavor">Sched Flavor</a></h3> +<p> +The RCU-sched flavor of RCU has since been expressed in terms of +the other RCU flavors as part of a consolidation of the three +flavors into a single flavor. +The read-side API remains, and continues to disable preemption and to +be accounted for by lockdep. +Much of the material in this section is therefore strictly historical +in nature. <p> Before preemptible RCU, waiting for an RCU grace period had the @@ -3139,94 +3160,14 @@ The tasks-RCU API is quite compact, consisting only of <tt>call_rcu_tasks()</tt>, <tt>synchronize_rcu_tasks()</tt>, and <tt>rcu_barrier_tasks()</tt>. - -<h3><a name="Waiting for Multiple Grace Periods"> -Waiting for Multiple Grace Periods</a></h3> - -<p> -Perhaps you have an RCU protected data structure that is accessed from -RCU read-side critical sections, from softirq handlers, and from -hardware interrupt handlers. -That is three flavors of RCU, the normal flavor, the bottom-half flavor, -and the sched flavor. -How to wait for a compound grace period? - -<p> -The best approach is usually to “just say no!” and -insert <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt> -around each RCU read-side critical section, regardless of what -environment it happens to be in. -But suppose that some of the RCU read-side critical sections are -on extremely hot code paths, and that use of <tt>CONFIG_PREEMPT=n</tt> -is not a viable option, so that <tt>rcu_read_lock()</tt> and -<tt>rcu_read_unlock()</tt> are not free. -What then? - -<p> -You <i>could</i> wait on all three grace periods in succession, as follows: - -<blockquote> -<pre> - 1 synchronize_rcu(); - 2 synchronize_rcu_bh(); - 3 synchronize_sched(); -</pre> -</blockquote> - -<p> -This works, but triples the update-side latency penalty. -In cases where this is not acceptable, <tt>synchronize_rcu_mult()</tt> -may be used to wait on all three flavors of grace period concurrently: - -<blockquote> -<pre> - 1 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched); -</pre> -</blockquote> - -<p> -But what if it is necessary to also wait on SRCU? -This can be done as follows: - -<blockquote> -<pre> - 1 static void call_my_srcu(struct rcu_head *head, - 2 void (*func)(struct rcu_head *head)) - 3 { - 4 call_srcu(&my_srcu, head, func); - 5 } - 6 - 7 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched, call_my_srcu); -</pre> -</blockquote> - -<p> -If you needed to wait on multiple different flavors of SRCU -(but why???), you would need to create a wrapper function resembling -<tt>call_my_srcu()</tt> for each SRCU flavor. - -<table> -<tr><th> </th></tr> -<tr><th align="left">Quick Quiz:</th></tr> -<tr><td> - But what if I need to wait for multiple RCU flavors, but I also need - the grace periods to be expedited? -</td></tr> -<tr><th align="left">Answer:</th></tr> -<tr><td bgcolor="#ffffff"><font color="ffffff"> - If you are using expedited grace periods, there should be less penalty - for waiting on them in succession. - But if that is nevertheless a problem, you can use workqueues - or multiple kthreads to wait on the various expedited grace - periods concurrently. -</font></td></tr> -<tr><td> </td></tr> -</table> - -<p> -Again, it is usually better to adjust the RCU read-side critical sections -to use a single flavor of RCU, but when this is not feasible, you can use -<tt>synchronize_rcu_mult()</tt>. +In <tt>CONFIG_PREEMPT=n</tt> kernels, trampolines cannot be preempted, +so these APIs map to +<tt>call_rcu()</tt>, +<tt>synchronize_rcu()</tt>, and +<tt>rcu_barrier()</tt>, respectively. +In <tt>CONFIG_PREEMPT=y</tt> kernels, trampolines can be preempted, +and these three APIs are therefore implemented by separate functions +that check for voluntary context switches. <h2><a name="Possible Future Changes">Possible Future Changes</a></h2> @@ -3238,12 +3179,6 @@ grace-period state machine so as to avoid the need for the additional latency. <p> -Expedited grace periods scan the CPUs, so their latency and overhead -increases with increasing numbers of CPUs. -If this becomes a serious problem on large systems, it will be necessary -to do some redesign to avoid this scalability problem. - -<p> RCU disables CPU hotplug in a few places, perhaps most notably in the <tt>rcu_barrier()</tt> operations. If there is a strong reason to use <tt>rcu_barrier()</tt> in CPU-hotplug @@ -3288,11 +3223,6 @@ require extremely good demonstration of need and full exploration of alternatives. <p> -There is an embarrassingly large number of flavors of RCU, and this -number has been increasing over time. -Perhaps it will be possible to combine some at some future date. - -<p> RCU's various kthreads are reasonably recent additions. It is quite likely that adjustments will be required to more gracefully handle extreme loads. diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 7d4ae110c2c9..721b3e426515 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -87,7 +87,3 @@ o Where can I find more information on RCU? See the RTFP.txt file in this directory. Or point your browser at http://www.rdrop.com/users/paulmck/RCU/. - -o What are all these files in this directory? - - See 00-INDEX for the list. diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index f99cf11b314b..491043fd976f 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -16,12 +16,9 @@ o A CPU looping in an RCU read-side critical section. o A CPU looping with interrupts disabled. -o A CPU looping with preemption disabled. This condition can - result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh - stalls. +o A CPU looping with preemption disabled. -o A CPU looping with bottom halves disabled. This condition can - result in RCU-sched and RCU-bh stalls. +o A CPU looping with bottom halves disabled. o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel without invoking schedule(). If the looping in the kernel is @@ -87,9 +84,9 @@ o A hardware failure. This is quite unlikely, but has occurred This resulted in a series of RCU CPU stall warnings, eventually leading the realization that the CPU had failed. -The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall -warning. Note that SRCU does -not- have CPU stall warnings. Please note -that RCU only detects CPU stalls when there is a grace period in progress. +The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning. +Note that SRCU does -not- have CPU stall warnings. Please note that +RCU only detects CPU stalls when there is a grace period in progress. No grace period, no CPU stall warnings. To diagnose the cause of the stall, inspect the stack traces. diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index c2a7facf7ff9..86d82f7f3500 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -934,7 +934,8 @@ c. Do you need to treat NMI handlers, hardirq handlers, d. Do you need RCU grace periods to complete even in the face of softirq monopolization of one or more of the CPUs? For example, is your code subject to network-based denial-of-service - attacks? If so, you need RCU-bh. + attacks? If so, you should disable softirq across your readers, + for example, by using rcu_read_lock_bh(). e. Is your workload too update-intensive for normal use of RCU, but inappropriate for other synchronization mechanisms? diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst index 13468ea696b7..d0a060de3973 100644 --- a/Documentation/admin-guide/LSM/Yama.rst +++ b/Documentation/admin-guide/LSM/Yama.rst @@ -64,8 +64,8 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are: Using ``PTRACE_TRACEME`` is unchanged. 2 - admin-only attach: - only processes with ``CAP_SYS_PTRACE`` may use ptrace - with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``. + only processes with ``CAP_SYS_PTRACE`` may use ptrace, either with + ``PTRACE_ATTACH`` or through children calling ``PTRACE_TRACEME``. 3 - no attach: no processes may use ptrace with ``PTRACE_ATTACH`` nor via diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 15ea785b2dfa..0797eec76be1 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -51,8 +51,7 @@ Documentation - There are various README files in the Documentation/ subdirectory: these typically contain kernel-specific installation notes for some - drivers for example. See Documentation/00-INDEX for a list of what - is contained in each file. Please read the + drivers for example. Please read the :ref:`Documentation/process/changes.rst <changes>` file, as it contains information about the problems, which may result by upgrading your kernel. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 184193bcb262..caf36105a1c7 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1857,8 +1857,10 @@ following two functions. wbc_init_bio(@wbc, @bio) Should be called for each bio carrying writeback data and - associates the bio with the inode's owner cgroup. Can be - called anytime between bio allocation and submission. + associates the bio with the inode's owner cgroup and the + corresponding request queue. This must be called after + a queue (device) has been associated with the bio and + before submission. wbc_account_io(@wbc, @page, @bytes) Should be called for each data segment being written out. @@ -1877,7 +1879,7 @@ the configuration, the bio may be executed at a lower priority and if the writeback session is holding shared resources, e.g. a journal entry, may lead to priority inversion. There is no one easy solution for the problem. Filesystems can try to work around specific problem -cases by skipping wbc_init_bio() or using bio_associate_blkcg() +cases by skipping wbc_init_bio() or using bio_associate_create_blkg() directly. diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst new file mode 100644 index 000000000000..e506d3dae510 --- /dev/null +++ b/Documentation/admin-guide/ext4.rst @@ -0,0 +1,574 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +ext4 General Information +======================== + +Ext4 is an advanced level of the ext3 filesystem which incorporates +scalability and reliability enhancements for supporting large filesystems +(64 bit) in keeping with increasing disk capacities and state-of-the-art +feature requirements. + +Mailing list: linux-ext4@vger.kernel.org +Web site: http://ext4.wiki.kernel.org + + +Quick usage instructions +======================== + +Note: More extensive information for getting started with ext4 can be +found at the ext4 wiki site at the URL: +http://ext4.wiki.kernel.org/index.php/Ext4_Howto + + - The latest version of e2fsprogs can be found at: + + https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ + + or + + http://sourceforge.net/project/showfiles.php?group_id=2406 + + or grab the latest git repository from: + + https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git + + - Create a new filesystem using the ext4 filesystem type: + + # mke2fs -t ext4 /dev/hda1 + + Or to configure an existing ext3 filesystem to support extents: + + # tune2fs -O extents /dev/hda1 + + If the filesystem was created with 128 byte inodes, it can be + converted to use 256 byte for greater efficiency via: + + # tune2fs -I 256 /dev/hda1 + + - Mounting: + + # mount -t ext4 /dev/hda1 /wherever + + - When comparing performance with other filesystems, it's always + important to try multiple workloads; very often a subtle change in a + workload parameter can completely change the ranking of which + filesystems do well compared to others. When comparing versus ext3, + note that ext4 enables write barriers by default, while ext3 does + not enable write barriers by default. So it is useful to use + explicitly specify whether barriers are enabled or not when via the + '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems + for a fair comparison. When tuning ext3 for best benchmark numbers, + it is often worthwhile to try changing the data journaling mode; '-o + data=writeback' can be faster for some workloads. (Note however that + running mounted with data=writeback can potentially leave stale data + exposed in recently written files in case of an unclean shutdown, + which could be a security exposure in some situations.) Configuring + the filesystem with a large journal can also be helpful for + metadata-intensive workloads. + +Features +======== + +Currently Available +------------------- + +* ability to use filesystems > 16TB (e2fsprogs support not available yet) +* extent format reduces metadata overhead (RAM, IO for access, transactions) +* extent format more robust in face of on-disk corruption due to magics, +* internal redundancy in tree +* improved file allocation (multi-block alloc) +* lift 32000 subdirectory limit imposed by i_links_count[1] +* nsec timestamps for mtime, atime, ctime, create time +* inode version field on disk (NFSv4, Lustre) +* reduced e2fsck time via uninit_bg feature +* journal checksumming for robustness, performance +* persistent file preallocation (e.g for streaming media, databases) +* ability to pack bitmaps and inode tables into larger virtual groups via the + flex_bg feature +* large file support +* inode allocation using large virtual block groups via flex_bg +* delayed allocation +* large block (up to pagesize) support +* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force + the ordering) + +[1] Filesystems with a block size of 1k may see a limit imposed by the +directory hash tree having a maximum depth of two. + +Options +======= + +When mounting an ext4 filesystem, the following option are accepted: +(*) == default + + ro + Mount filesystem read only. Note that ext4 will replay the journal (and + thus write to the partition) even when mounted "read only". The mount + options "ro,noload" can be used to prevent writes to the filesystem. + + journal_checksum + Enable checksumming of the journal transactions. This will allow the + recovery code in e2fsck and the kernel to detect corruption in the + kernel. It is a compatible change and will be ignored by older + kernels. + + journal_async_commit + Commit block can be written to disk without waiting for descriptor + blocks. If enabled older kernels cannot mount the device. This will + enable 'journal_checksum' internally. + + journal_path=path, journal_dev=devnum + When the external journal device's major/minor numbers have changed, + these options allow the user to specify the new journal location. The + journal device is identified through either its new major/minor numbers + encoded in devnum, or via a path to the device. + + norecovery, noload + Don't load the journal on mounting. Note that if the filesystem was + not unmounted cleanly, skipping the journal replay will lead to the + filesystem containing inconsistencies that can lead to any number of + problems. + + data=journal + All data are committed into the journal prior to being written into the + main file system. Enabling this mode will disable delayed allocation + and O_DIRECT support. + + data=ordered (*) + All data are forced directly out to the main file system prior to its + metadata being committed to the journal. + + data=writeback + Data ordering is not preserved, data may be written into the main file + system after its metadata has been committed to the journal. + + commit=nrsec (*) + Ext4 can be told to sync all its data and metadata every 'nrsec' + seconds. The default value is 5 seconds. This means that if you lose + your power, you will lose as much as the latest 5 seconds of work (your + filesystem will not be damaged though, thanks to the journaling). This + default value (or any low value) will hurt performance, but it's good + for data-safety. Setting it to 0 will have the same effect as leaving + it at the default (5 seconds). Setting it to very large values will + improve performance. + + barrier=<0|1(*)>, barrier(*), nobarrier + This enables/disables the use of write barriers in the jbd code. + barrier=0 disables, barrier=1 enables. This also requires an IO stack + which can support barriers, and if jbd gets an error on a barrier + write, it will disable again with a warning. Write barriers enforce + proper on-disk ordering of journal commits, making volatile disk write + caches safe to use, at some performance penalty. If your disks are + battery-backed in one way or another, disabling barriers may safely + improve performance. The mount options "barrier" and "nobarrier" can + also be used to enable or disable barriers, for consistency with other + ext4 mount options. + + inode_readahead_blks=n + This tuning parameter controls the maximum number of inode table blocks + that ext4's inode table readahead algorithm will pre-read into the + buffer cache. The default value is 32 blocks. + + nouser_xattr + Disables Extended User Attributes. See the attr(5) manual page for + more information about extended attributes. + + noacl + This option disables POSIX Access Control List support. If ACL support + is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL + is enabled by default on mount. See the acl(5) manual page for more + information about acl. + + bsddf (*) + Make 'df' act like BSD. + + minixdf + Make 'df' act like Minix. + + debug + Extra debugging information is sent to syslog. + + abort + Simulate the effects of calling ext4_abort() for debugging purposes. + This is normally used while remounting a filesystem which is already + mounted. + + errors=remount-ro + Remount the filesystem read-only on an error. + + errors=continue + Keep going on a filesystem error. + + errors=panic + Panic and halt the machine if an error occurs. (These mount options + override the errors behavior specified in the superblock, which can be + configured using tune2fs) + + data_err=ignore(*) + Just print an error message if an error occurs in a file data buffer in + ordered mode. + data_err=abort + Abort the journal if an error occurs in a file data buffer in ordered + mode. + + grpid | bsdgroups + New objects have the group ID of their parent. + + nogrpid (*) | sysvgroups + New objects have the group ID of their creator. + + resgid=n + The group ID which may use the reserved blocks. + + resuid=n + The user ID which may use the reserved blocks. + + sb= + Use alternate superblock at this location. + + quota, noquota, grpquota, usrquota + These options are ignored by the filesystem. They are used only by + quota tools to recognize volumes where quota should be turned on. See + documentation in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + + jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file> + These options tell filesystem details about quota so that quota + information can be properly updated during journal replay. They replace + the above quota options. See documentation in the quota-tools package + for more details (http://sourceforge.net/projects/linuxquota). + + stripe=n + Number of filesystem blocks that mballoc will try to use for allocation + size and alignment. For RAID5/6 systems this should be the number of + data disks * RAID chunk size in file system blocks. + + delalloc (*) + Defer block allocation until just before ext4 writes out the block(s) + in question. This allows ext4 to better allocation decisions more + efficiently. + + nodelalloc + Disable delayed allocation. Blocks are allocated when the data is + copied from userspace to the page cache, either via the write(2) system + call or when an mmap'ed page which was previously unallocated is + written for the first time. + + max_batch_time=usec + Maximum amount of time ext4 should wait for additional filesystem + operations to be batch together with a synchronous write operation. + Since a synchronous write operation is going to force a commit and then + a wait for the I/O complete, it doesn't cost much, and can be a huge + throughput win, we wait for a small amount of time to see if any other + transactions can piggyback on the synchronous write. The algorithm + used is designed to automatically tune for the speed of the disk, by + measuring the amount of time (on average) that it takes to finish + committing a transaction. Call this time the "commit time". If the + time that the transaction has been running is less than the commit + time, ext4 will try sleeping for the commit time to see if other + operations will join the transaction. The commit time is capped by + the max_batch_time, which defaults to 15000us (15ms). This + optimization can be turned off entirely by setting max_batch_time to 0. + + min_batch_time=usec + This parameter sets the commit time (as described above) to be at least + min_batch_time. It defaults to zero microseconds. Increasing this + parameter may improve the throughput of multi-threaded, synchronous + workloads on very fast disks, at the cost of increasing latency. + + journal_ioprio=prio + The I/O priority (from 0 to 7, where 0 is the highest priority) which + should be used for I/O operations submitted by kjournald2 during a + commit operation. This defaults to 3, which is a slightly higher + priority than the default I/O priority. + + auto_da_alloc(*), noauto_da_alloc + Many broken applications don't use fsync() when replacing existing + files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/ + rename("foo.new", "foo"), or worse yet, fd = open("foo", + O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 + will detect the replace-via-rename and replace-via-truncate patterns + and force that any delayed allocation blocks are allocated such that at + the next journal commit, in the default data=ordered mode, the data + blocks of the new file are forced to disk before the rename() operation + is committed. This provides roughly the same level of guarantees as + ext3, and avoids the "zero-length" problem that can happen when a + system crashes before the delayed allocation blocks are forced to disk. + + noinit_itable + Do not initialize any uninitialized inode table blocks in the + background. This feature may be used by installation CD's so that the + install process can complete as quickly as possible; the inode table + initialization process would then be deferred until the next time the + file system is unmounted. + + init_itable=n + The lazy itable init code will wait n times the number of milliseconds + it took to zero out the previous block group's inode table. This + minimizes the impact on the system performance while file system's + inode table is being initialized. + + discard, nodiscard(*) + Controls whether ext4 should issue discard/TRIM commands to the + underlying block device when blocks are freed. This is useful for SSD + devices and sparse/thinly-provisioned LUNs, but it is off by default + until sufficient testing has been done. + + nouid32 + Disables 32-bit UIDs and GIDs. This is for interoperability with + older kernels which only store and expect 16-bit values. + + block_validity(*), noblock_validity + These options enable or disable the in-kernel facility for tracking + filesystem metadata blocks within internal data structures. This + allows multi- block allocator and other routines to notice bugs or + corrupted allocation bitmaps which cause blocks to be allocated which + overlap with filesystem metadata blocks. + + dioread_lock, dioread_nolock + Controls whether or not ext4 should use the DIO read locking. If the + dioread_nolock option is specified ext4 will allocate uninitialized + extent before buffer write and convert the extent to initialized after + IO completes. This approach allows ext4 code to avoid using inode + mutex, which improves scalability on high speed storages. However this + does not work with data journaling and dioread_nolock option will be + ignored with kernel warning. Note that dioread_nolock code path is only + used for extent-based files. Because of the restrictions this options + comprises it is off by default (e.g. dioread_lock). + + max_dir_size_kb=n + This limits the size of directories so that any attempt to expand them + beyond the specified limit in kilobytes will cause an ENOSPC error. + This is useful in memory constrained environments, where a very large + directory can cause severe performance problems or even provoke the Out + Of Memory killer. (For example, if there is only 512mb memory + available, a 176mb directory may seriously cramp the system's style.) + + i_version + Enable 64-bit inode version support. This option is off by default. + + dax + Use direct access (no page cache). See + Documentation/filesystems/dax.txt. Note that this option is + incompatible with data=journal. + +Data Mode +========= +There are 3 different data modes: + +* writeback mode + + In data=writeback mode, ext4 does not journal data at all. This mode provides + a similar level of journaling as that of XFS, JFS, and ReiserFS in its default + mode - metadata journaling. A crash+recovery can cause incorrect data to + appear in files which were written shortly before the crash. This mode will + typically provide the best ext4 performance. + +* ordered mode + + In data=ordered mode, ext4 only officially journals metadata, but it logically + groups metadata information related to data changes with the data blocks into + a single unit called a transaction. When it's time to write the new metadata + out to disk, the associated data blocks are written first. In general, this + mode performs slightly slower than writeback but significantly faster than + journal mode. + +* journal mode + + data=journal mode provides full data and metadata journaling. All new data is + written to the journal first, and then to its final location. In the event of + a crash, the journal can be replayed, bringing both data and metadata into a + consistent state. This mode is the slowest except when data needs to be read + from and written to disk at the same time where it outperforms all others + modes. Enabling this mode will disable delayed allocation and O_DIRECT + support. + +/proc entries +============= + +Information about mounted ext4 file systems can be found in +/proc/fs/ext4. Each mounted filesystem will have a directory in +/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or +/proc/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /proc/fs/ext4/<devname> + + mb_groups + details of multiblock allocator buddy cache of free blocks + +/sys entries +============ + +Information about mounted ext4 file systems can be found in +/sys/fs/ext4. Each mounted filesystem will have a directory in +/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or +/sys/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /sys/fs/ext4/<devname>: + +(see also Documentation/ABI/testing/sysfs-fs-ext4) + + delayed_allocation_blocks + This file is read-only and shows the number of blocks that are dirty in + the page cache, but which do not have their location in the filesystem + allocated yet. + + inode_goal + Tuning parameter which (if non-zero) controls the goal inode used by + the inode allocator in preference to all other allocation heuristics. + This is intended for debugging use only, and should be 0 on production + systems. + + inode_readahead_blks + Tuning parameter which controls the maximum number of inode table + blocks that ext4's inode table readahead algorithm will pre-read into + the buffer cache. + + lifetime_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was created. + + max_writeback_mb_bump + The maximum number of megabytes the writeback code will try to write + out before move on to another inode. + + mb_group_prealloc + The multiblock allocator will round up allocation requests to a + multiple of this tuning parameter if the stripe size is not set in the + ext4 superblock + + mb_max_to_scan + The maximum number of extents the multiblock allocator will search to + find the best extent. + + mb_min_to_scan + The minimum number of extents the multiblock allocator will search to + find the best extent. + + mb_order2_req + Tuning parameter which controls the minimum size for requests (as a + power of 2) where the buddy cache is used. + + mb_stats + Controls whether the multiblock allocator should collect statistics, + which are shown during the unmount. 1 means to collect statistics, 0 + means not to collect statistics. + + mb_stream_req + Files which have fewer blocks than this tunable parameter will have + their blocks allocated out of a block group specific preallocation + pool, so that small files are packed closely together. Each large file + will have its blocks allocated out of its own unique preallocation + pool. + + session_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was mounted. + + reserved_clusters + This is RW file and contains number of reserved clusters in the file + system which will be used in the specific situations to avoid costly + zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or + 4096 clusters, whichever is smaller and this can be changed however it + can never exceed number of clusters in the file system. If there is not + enough space for the reserved space when mounting the file mount will + _not_ fail. + +Ioctls +====== + +There is some Ext4 specific functionality which can be accessed by applications +through the system call interfaces. The list of all Ext4 specific ioctls are +shown in the table below. + +Table of Ext4 specific ioctls + + EXT4_IOC_GETFLAGS + Get additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_GETFLAGS. + + EXT4_IOC_SETFLAGS + Set additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_SETFLAGS. + + EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD + Get the inode i_generation number stored for each inode. The + i_generation number is normally changed only when new inode is created + and it is particularly useful for network filesystems. The '_OLD' + version of this ioctl is an alias for FS_IOC_GETVERSION. + + EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD + Set the inode i_generation number stored for each inode. The '_OLD' + version of this ioctl is an alias for FS_IOC_SETVERSION. + + EXT4_IOC_GROUP_EXTEND + This ioctl has the same purpose as the resize mount option. It allows + to resize filesystem to the end of the last existing block group, + further resize has to be done with resize2fs, either online, or + offline. The argument points to the unsigned logn number representing + the filesystem new block count. + + EXT4_IOC_MOVE_EXT + Move the block extents from orig_fd (the one this ioctl is pointing to) + to the donor_fd (the one specified in move_extent structure passed as + an argument to this ioctl). Then, exchange inode metadata between + orig_fd and donor_fd. This is especially useful for online + defragmentation, because the allocator has the opportunity to allocate + moved blocks better, ideally into one contiguous extent. + + EXT4_IOC_GROUP_ADD + Add a new group descriptor to an existing or new group descriptor + block. The new group descriptor is described by ext4_new_group_input + structure, which is passed as an argument to this ioctl. This is + especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which + allows online resize of the filesystem to the end of the last existing + block group. Those two ioctls combined is used in userspace online + resize tool (e.g. resize2fs). + + EXT4_IOC_MIGRATE + This ioctl operates on the filesystem itself. It converts (migrates) + ext3 indirect block mapped inode to ext4 extent mapped inode by walking + through indirect block mapping of the original inode and converting + contiguous block ranges into ext4 extents of the temporary inode. Then, + inodes are swapped. This ioctl might help, when migrating from ext3 to + ext4 filesystem, however suggestion is to create fresh ext4 filesystem + and copy data from the backup. Note, that filesystem has to support + extents for this ioctl to work. + + EXT4_IOC_ALLOC_DA_BLKS + Force all of the delay allocated blocks to be allocated to preserve + application-expected ext3 behaviour. Note that this will also start + triggering a write of the data blocks, but this behaviour may change in + the future as it is not necessary and has been done this way only for + sake of simplicity. + + EXT4_IOC_RESIZE_FS + Resize the filesystem to a new size. The number of blocks of resized + filesystem is passed in via 64 bit integer argument. The kernel + allocates bitmaps and inode table, the userspace tool thus just passes + the new number of blocks. + + EXT4_IOC_SWAP_BOOT + Swap i_blocks and associated attributes (like i_blocks, i_size, + i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO + (#5). This is typically used to store a boot loader in a secure part of + the filesystem, where it can't be changed by a normal user by accident. + The data blocks of the previous boot loader will be associated with the + given inode. + +References +========== + +kernel source: <file:fs/ext4/> + <file:fs/jbd2/> + +programs: http://e2fsprogs.sourceforge.net/ + +useful links: http://fedoraproject.org/wiki/ext3-devel + http://www.bullopensource.org/ext4/ + http://ext4.wiki.kernel.org/index.php/Main_Page + http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 0873685bab0f..965745d5fb9a 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -71,6 +71,7 @@ configure specific aspects of kernel behavior to your liking. java ras bcache + ext4 pm/index thunderbolt LSM/index diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 92eb1f42240d..e129cd8a6dcc 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -856,6 +856,11 @@ causing system reset or hang due to sending INIT from AP to BSP. + disable_counter_freezing [HW] + Disable Intel PMU counter freezing feature. + The feature only exists starting from + Arch Perfmon v4 (Skylake and newer). + disable_ddw [PPC/PSERIES] Disable Dynamic DMA Window support. Use this if to workaround buggy firmware. @@ -1385,6 +1390,11 @@ hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs. If specified, z/VM IUCV HVC accepts connections from listed z/VM user IDs only. + + hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations + which allow the hypervisor to 'idle' the + guest on lock contention. + keep_bootcon [KNL] Do not unregister boot console at start. This is only useful for debugging when something happens in the window @@ -1754,7 +1764,7 @@ Format: { "0" | "1" } 0 - Use IOMMU translation for DMA. 1 - Bypass the IOMMU for DMA. - unset - Use IOMMU translation for DMA. + unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH. io7= [HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in @@ -2274,6 +2284,8 @@ ltpc= [NET] Format: <io>,<irq>,<dma> + lsm.debug [SECURITY] Enable LSM initialization debugging output. + machvec= [IA-64] Force the use of a particular machine-vector (machvec) in a generic kernel. Example: machvec=hpzx1_swiotlb @@ -3540,14 +3552,14 @@ In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs. - Invocation of these CPUs' RCU callbacks will - be offloaded to "rcuox/N" kthreads created for - that purpose, where "x" is "b" for RCU-bh, "p" - for RCU-preempt, and "s" for RCU-sched, and "N" - is the CPU number. This reduces OS jitter on the - offloaded CPUs, which can be useful for HPC and - real-time workloads. It can also improve energy - efficiency for asymmetric multiprocessors. + Invocation of these CPUs' RCU callbacks will be + offloaded to "rcuox/N" kthreads created for that + purpose, where "x" is "p" for RCU-preempt, and + "s" for RCU-sched, and "N" is the CPU number. + This reduces OS jitter on the offloaded CPUs, + which can be useful for HPC and real-time + workloads. It can also improve energy efficiency + for asymmetric multiprocessors. rcu_nocb_poll [KNL] Rather than requiring that offloaded CPUs @@ -3601,7 +3613,14 @@ Set required age in jiffies for a given grace period before RCU starts soliciting quiescent-state help from - rcu_note_context_switch(). + rcu_note_context_switch(). If not specified, the + kernel will calculate a value based on the most + recent settings of rcutree.jiffies_till_first_fqs + and rcutree.jiffies_till_next_fqs. + This calculated value may be viewed in + rcutree.jiffies_to_sched_qs. Any attempt to + set rcutree.jiffies_to_sched_qs will be + cheerfully overwritten. rcutree.jiffies_till_first_fqs= [KNL] Set delay from grace-period initialization to @@ -3869,12 +3888,6 @@ rcupdate.rcu_self_test= [KNL] Run the RCU early boot self tests - rcupdate.rcu_self_test_bh= [KNL] - Run the RCU bh early boot self tests - - rcupdate.rcu_self_test_sched= [KNL] - Run the RCU sched early boot self tests - rdinit= [KNL] Format: <full_path> Run specified binary instead of /init from the ramdisk, diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/l1tf.rst index bae52b845de0..b85dd80510b0 100644 --- a/Documentation/admin-guide/l1tf.rst +++ b/Documentation/admin-guide/l1tf.rst @@ -553,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved: the bare metal hypervisor, the nested hypervisor and the nested virtual machine. VMENTER operations from the nested hypervisor into the nested guest will always be processed by the bare metal hypervisor. If KVM is the -bare metal hypervisor it wiil: +bare metal hypervisor it will: - Flush the L1D cache on every switch from the nested hypervisor to the nested virtual machine, so that the nested hypervisor's secrets are not diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst index ceead68c2df7..8edb35f11317 100644 --- a/Documentation/admin-guide/mm/index.rst +++ b/Documentation/admin-guide/mm/index.rst @@ -29,6 +29,7 @@ the Linux memory management. hugetlbpage idle_page_tracking ksm + memory-hotplug numa_memory_policy pagemap soft-dirty diff --git a/Documentation/memory-hotplug.txt b/Documentation/admin-guide/mm/memory-hotplug.rst index 7f49ebf3ddb2..25157aec5b31 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -1,3 +1,5 @@ +.. _admin_guide_memory_hotplug: + ============== Memory Hotplug ============== @@ -9,39 +11,19 @@ This document is about memory hotplug including how-to-use and current status. Because Memory Hotplug is still under development, contents of this text will be changed often. -.. CONTENTS - - 1. Introduction - 1.1 purpose of memory hotplug - 1.2. Phases of memory hotplug - 1.3. Unit of Memory online/offline operation - 2. Kernel Configuration - 3. sysfs files for memory hotplug - 4. Physical memory hot-add phase - 4.1 Hardware(Firmware) Support - 4.2 Notify memory hot-add event by hand - 5. Logical Memory hot-add phase - 5.1. State of memory - 5.2. How to online memory - 6. Logical memory remove - 6.1 Memory offline and ZONE_MOVABLE - 6.2. How to offline memory - 7. Physical memory remove - 8. Memory hotplug event notifier - 9. Future Work List - +.. contents:: :local: .. note:: (1) x86_64's has special implementation for memory hotplug. This text does not describe it. - (2) This text assumes that sysfs is mounted at /sys. + (2) This text assumes that sysfs is mounted at ``/sys``. Introduction ============ -purpose of memory hotplug +Purpose of memory hotplug ------------------------- Memory Hotplug allows users to increase/decrease the amount of memory. @@ -57,7 +39,6 @@ hardware which supports memory power management. Linux memory hotplug is designed for both purpose. - Phases of memory hotplug ------------------------ @@ -92,7 +73,6 @@ phase by hand. (However, if you writes udev's hotplug scripts for memory hotplug, these phases can be execute in seamless way.) - Unit of Memory online/offline operation --------------------------------------- @@ -107,10 +87,9 @@ unit upon which memory online/offline operations are to be performed. The default size of a memory block is the same as memory section size unless an architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) -To determine the size (in bytes) of a memory block please read this file: - -/sys/devices/system/memory/block_size_bytes +To determine the size (in bytes) of a memory block please read this file:: + /sys/devices/system/memory/block_size_bytes Kernel Configuration ==================== @@ -119,22 +98,22 @@ To use memory hotplug feature, kernel must be compiled with following config options. - For all memory hotplug: - - Memory model -> Sparse Memory (CONFIG_SPARSEMEM) - - Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG) + - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``) + - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``) - To enable memory removal, the following are also necessary: - - Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE) - - Page Migration (CONFIG_MIGRATION) + - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``) + - Page Migration (``CONFIG_MIGRATION``) - For ACPI memory hotplug, the following are also necessary: - - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY) + - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``) - This option can be kernel module. - As a related configuration, if your box has a feature of NUMA-node hotplug via ACPI, then this option is necessary too. - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) - (CONFIG_ACPI_CONTAINER). + (``CONFIG_ACPI_CONTAINER``). This option can be kernel module too. @@ -145,10 +124,11 @@ sysfs files for memory hotplug ============================== All memory blocks have their device information in sysfs. Each memory block -is described under /sys/devices/system/memory as: +is described under ``/sys/devices/system/memory`` as:: /sys/devices/system/memory/memoryXXX - (XXX is the memory block id.) + +where XXX is the memory block id. For the memory block covered by the sysfs directory. It is expected that all memory sections in this range are present and no memory holes exist in the @@ -157,7 +137,7 @@ the existence of one should not affect the hotplug capabilities of the memory block. For example, assume 1GiB memory block size. A device for a memory starting at -0x100000000 is /sys/device/system/memory/memory4:: +0x100000000 is ``/sys/device/system/memory/memory4``:: (0x100000000 / 1Gib = 4) @@ -165,11 +145,11 @@ This device covers address range [0x100000000 ... 0x140000000) Under each memory block, you can see 5 files: -- /sys/devices/system/memory/memoryXXX/phys_index -- /sys/devices/system/memory/memoryXXX/phys_device -- /sys/devices/system/memory/memoryXXX/state -- /sys/devices/system/memory/memoryXXX/removable -- /sys/devices/system/memory/memoryXXX/valid_zones +- ``/sys/devices/system/memory/memoryXXX/phys_index`` +- ``/sys/devices/system/memory/memoryXXX/phys_device`` +- ``/sys/devices/system/memory/memoryXXX/state`` +- ``/sys/devices/system/memory/memoryXXX/removable`` +- ``/sys/devices/system/memory/memoryXXX/valid_zones`` =================== ============================================================ ``phys_index`` read-only and contains memory block id, same as XXX. @@ -207,13 +187,15 @@ Under each memory block, you can see 5 files: These directories/files appear after physical memory hotplug phase. If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed -via symbolic links located in the /sys/devices/system/node/node* directories. +via symbolic links located in the ``/sys/devices/system/node/node*`` directories. + +For example:: -For example: -/sys/devices/system/node/node0/memory9 -> ../../memory/memory9 + /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 -A backlink will also be created: -/sys/devices/system/memory/memory9/node0 -> ../../node/node0 +A backlink will also be created:: + + /sys/devices/system/memory/memory9/node0 -> ../../node/node0 .. _memory_hotplug_physical_mem: @@ -240,7 +222,6 @@ If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", calls hotplug code for all of objects which are defined in it. If memory device is found, memory hotplug code will be called. - Notify memory hot-add event by hand ----------------------------------- @@ -251,8 +232,9 @@ CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 if hotplug is supported, although for x86 this should be handled by ACPI notification. -Probe interface is located at -/sys/devices/system/memory/probe +Probe interface is located at:: + + /sys/devices/system/memory/probe You can tell the physical address of new memory to the kernel by:: @@ -263,7 +245,6 @@ memory_block_size] memory range is hot-added. In this case, hotplug script is not called (in current implementation). You'll have to online memory by yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. - Logical Memory hot-add phase ============================ @@ -301,7 +282,7 @@ This sets a global policy and impacts all memory blocks that will subsequently be hotplugged. Currently offline blocks keep their state. It is possible, under certain circumstances, that some memory blocks will be added but will fail to online. User space tools can check their "state" files -(/sys/devices/system/memory/memoryXXX/state) and try to online them manually. +(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually. If the automatic onlining wasn't requested, failed, or some memory block was offlined it is possible to change the individual block's state by writing to the @@ -334,8 +315,6 @@ available memory will be increased. This may be changed in future. - - Logical memory remove ===================== @@ -413,88 +392,6 @@ Need more implementation yet.... - Notification completion of remove works by OS to firmware. - Guard from remove if not yet. -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in include/linux/memory.h: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. - -MEM_CANCEL_ONLINE - Generated if MEMORY_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in include/linux/notifier.h - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. - -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. - Future Work =========== diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index 8f1d3de449b5..ac6f5c597a56 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -465,6 +465,13 @@ Next, the following policy attributes have special meaning if policy for the time interval between the last two invocations of the driver's utilization update callback by the CPU scheduler for that CPU. +One more policy attribute is present if the `HWP feature is enabled in the +processor <Active Mode With HWP_>`_: + +``base_frequency`` + Shows the base frequency of the CPU. Any frequency above this will be + in the turbo frequency range. + The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the same as for other scaling drivers. diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX deleted file mode 100644 index b6e69fd371c4..000000000000 --- a/Documentation/arm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file -Booting - - requirements for booting -CCN.txt - - Cache Coherent Network ring-bus and perf PMU driver. -Interrupts - - ARM Interrupt subsystem documentation -IXP4xx - - Intel IXP4xx Network processor. -Netwinder - - Netwinder specific documentation -Porting - - Symbol definitions for porting Linux to a new ARM machine. -Setup - - Kernel initialization parameters on ARM Linux -README - - General ARM documentation -SA1100/ - - SA1100 documentation -Samsung-S3C24XX/ - - S3C24XX ARM Linux Overview -SPEAr/ - - ST SPEAr platform Linux Overview -VFP/ - - Release notes for Linux Kernel Vector Floating Point support code -cluster-pm-race-avoidance.txt - - Algorithm for CPU and Cluster setup/teardown -empeg/ - - Ltd's Empeg MP3 Car Audio Player -firmware.txt - - Secure firmware registration and calling. -kernel_mode_neon.txt - - How to use NEON instructions in kernel mode -kernel_user_helpers.txt - - Helper functions in kernel space made available for userspace. -mem_alignment - - alignment abort handler documentation -memory.txt - - description of the virtual memory layout -nwfpe/ - - NWFPE floating point emulator documentation -swp_emulation - - SWP/SWPB emulation handler/logging description -tcm.txt - - ARM Tightly Coupled Memory -uefi.txt - - [U]EFI configuration and runtime services documentation -vlocks.txt - - Voting locks, low-level mechanism relying on memory system atomic writes. diff --git a/Documentation/arm64/elf_hwcaps.txt b/Documentation/arm64/elf_hwcaps.txt index d6aff2c5e9e2..ea819ae024dd 100644 --- a/Documentation/arm64/elf_hwcaps.txt +++ b/Documentation/arm64/elf_hwcaps.txt @@ -78,11 +78,11 @@ HWCAP_EVTSTRM HWCAP_AES - Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0001. + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. HWCAP_PMULL - Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0010. + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010. HWCAP_SHA1 @@ -153,7 +153,7 @@ HWCAP_ASIMDDP HWCAP_SHA512 - Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0002. + Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010. HWCAP_SVE @@ -173,8 +173,12 @@ HWCAP_USCAT HWCAP_ILRCPC - Functionality implied by ID_AA64ISR1_EL1.LRCPC == 0b0002. + Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010. HWCAP_FLAGM Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001. + +HWCAP_SSBS + + Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010. diff --git a/Documentation/arm64/hugetlbpage.txt b/Documentation/arm64/hugetlbpage.txt new file mode 100644 index 000000000000..cfae87dc653b --- /dev/null +++ b/Documentation/arm64/hugetlbpage.txt @@ -0,0 +1,38 @@ +HugeTLBpage on ARM64 +==================== + +Hugepage relies on making efficient use of TLBs to improve performance of +address translations. The benefit depends on both - + + - the size of hugepages + - size of entries supported by the TLBs + +The ARM64 port supports two flavours of hugepages. + +1) Block mappings at the pud/pmd level +-------------------------------------- + +These are regular hugepages where a pmd or a pud page table entry points to a +block of memory. Regardless of the supported size of entries in TLB, block +mappings reduce the depth of page table walk needed to translate hugepage +addresses. + +2) Using the Contiguous bit +--------------------------- + +The architecture provides a contiguous bit in the translation table entries +(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a +contiguous set of entries that can be cached in a single TLB entry. + +The contiguous bit is used in Linux to increase the mapping size at the pmd and +pte (last) level. The number of supported contiguous entries varies by page size +and level of the page table. + + +The following hugepage sizes are supported - + + CONT PTE PMD CONT PMD PUD + -------- --- -------- --- + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index 3b2f2dd82225..76ccded8b74c 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -56,6 +56,7 @@ stable kernels. | ARM | Cortex-A72 | #853709 | N/A | | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | | ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | +| ARM | Cortex-A76 | #1188873 | ARM64_ERRATUM_1188873 | | ARM | MMU-500 | #841119,#826419 | N/A | | | | | | | Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX deleted file mode 100644 index 8d55b4bbb5e2..000000000000 --- a/Documentation/block/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -bfq-iosched.txt - - BFQ IO scheduler and its tunables -biodoc.txt - - Notes on the Generic Block Layer Rewrite in Linux 2.5 -biovecs.txt - - Immutable biovecs and biovec iterators -capability.txt - - Generic Block Device Capability (/sys/block/<device>/capability) -cfq-iosched.txt - - CFQ IO scheduler tunables -cmdline-partition.txt - - how to specify block device partitions on kernel command line -data-integrity.txt - - Block data integrity -deadline-iosched.txt - - Deadline IO scheduler tunables -ioprio.txt - - Block io priorities (in CFQ scheduler) -pr.txt - - Block layer support for Persistent Reservations -null_blk.txt - - Null block for block-layer benchmarking. -queue-sysfs.txt - - Queue's sysfs entries -request.txt - - The members of struct request (in include/linux/blkdev.h) -stat.txt - - Block layer statistics in /sys/block/<device>/stat -switching-sched.txt - - Switching I/O schedulers at runtime -writeback_cache_control.txt - - Control of volatile write back caches diff --git a/Documentation/blockdev/00-INDEX b/Documentation/blockdev/00-INDEX deleted file mode 100644 index c08df56dd91b..000000000000 --- a/Documentation/blockdev/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file -README.DAC960 - - info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux. -cciss.txt - - info, major/minor #'s for Compaq's SMART Array Controllers. -cpqarray.txt - - info on using Compaq's SMART2 Intelligent Disk Array Controllers. -floppy.txt - - notes and driver options for the floppy disk driver. -mflash.txt - - info on mGine m(g)flash driver for linux. -nbd.txt - - info on a TCP implementation of a network block device. -paride.txt - - information about the parallel port IDE subsystem. -ramdisk.txt - - short guide on how to set up and use the RAM disk. diff --git a/Documentation/blockdev/README.DAC960 b/Documentation/blockdev/README.DAC960 deleted file mode 100644 index bd85fb9dc6e5..000000000000 --- a/Documentation/blockdev/README.DAC960 +++ /dev/null @@ -1,756 +0,0 @@ - Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers - - Version 2.2.11 for Linux 2.2.19 - Version 2.4.11 for Linux 2.4.12 - - PRODUCTION RELEASE - - 11 October 2001 - - Leonard N. Zubkoff - Dandelion Digital - lnz@dandelion.com - - Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com> - - - INTRODUCTION - -Mylex, Inc. designs and manufactures a variety of high performance PCI RAID -controllers. Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont, -California 94555, USA and can be reached at 510.796.6100 or on the World Wide -Web at http://www.mylex.com. Mylex Technical Support can be reached by -electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at -510.745.7715. Contact information for offices in Europe and Japan is available -on their Web site. - -The latest information on Linux support for DAC960 PCI RAID Controllers, as -well as the most recent release of this driver, will always be available from -my Linux Home Page at URL "http://www.dandelion.com/Linux/". The Linux DAC960 -driver supports all current Mylex PCI RAID controllers including the new -eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models which have an entirely -new firmware interface from the older eXtremeRAID 1100, AcceleRAID 150/200/250, -and DAC960PJ/PG/PU/PD/PL. See below for a complete controller list as well as -minimum firmware version requirements. For simplicity, in most places this -documentation refers to DAC960 generically rather than explicitly listing all -the supported models. - -Driver bug reports should be sent via electronic mail to "lnz@dandelion.com". -Please include with the bug report the complete configuration messages reported -by the driver at startup, along with any subsequent system messages relevant to -the controller's operation, and a detailed description of your system's -hardware configuration. Driver bugs are actually quite rare; if you encounter -problems with disks being marked offline, for example, please contact Mylex -Technical Support as the problem is related to the hardware configuration -rather than the Linux driver. - -Please consult the RAID controller documentation for detailed information -regarding installation and configuration of the controllers. This document -primarily provides information specific to the Linux support. - - - DRIVER FEATURES - -The DAC960 RAID controllers are supported solely as high performance RAID -controllers, not as interfaces to arbitrary SCSI devices. The Linux DAC960 -driver operates at the block device level, the same level as the SCSI and IDE -drivers. Unlike other RAID controllers currently supported on Linux, the -DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the -complexity and unnecessary code that would be associated with an implementation -as a SCSI driver. The DAC960 driver is designed for as high a performance as -possible with no compromises or extra code for compatibility with lower -performance devices. The DAC960 driver includes extensive error logging and -online configuration management capabilities. Except for initial configuration -of the controller and adding new disk drives, most everything can be handled -from Linux while the system is operational. - -The DAC960 driver is architected to support up to 8 controllers per system. -Each DAC960 parallel SCSI controller can support up to 15 disk drives per -channel, for a maximum of 60 drives on a four channel controller; the fibre -channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for -a total of 250 drives. The drives installed on a controller are divided into -one or more "Drive Groups", and then each Drive Group is subdivided further -into 1 to 32 "Logical Drives". Each Logical Drive has a specific RAID Level -and caching policy associated with it, and it appears to Linux as a single -block device. Logical Drives are further subdivided into up to 7 partitions -through the normal Linux and PC disk partitioning schemes. Logical Drives are -also known as "System Drives", and Drive Groups are also called "Packs". Both -terms are in use in the Mylex documentation; I have chosen to standardize on -the more generic "Logical Drive" and "Drive Group". - -DAC960 RAID disk devices are named in the style of the obsolete Device File -System (DEVFS). The device corresponding to Logical Drive D on Controller C -is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1 -through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on -Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI -disks the device names will not change in the event of a disk drive failure. -The DAC960 driver is assigned major numbers 48 - 55 with one major number per -controller. The 8 bits of minor number are divided into 5 bits for the Logical -Drive and 3 bits for the partition. - - - SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS - -The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID -PCI RAID Controllers as of the date of this document. It is recommended that -anyone purchasing a Mylex PCI RAID Controller not in the following table -contact the author beforehand to verify that it is or will be supported. - -eXtremeRAID 3000 - 1 Wide Ultra-2/LVD SCSI channel - 2 External Fibre FC-AL channels - 233MHz StrongARM SA 110 Processor - 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) - 32MB/64MB ECC SDRAM Memory - -eXtremeRAID 2000 - 4 Wide Ultra-160 LVD SCSI channels - 233MHz StrongARM SA 110 Processor - 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) - 32MB/64MB ECC SDRAM Memory - -AcceleRAID 352 - 2 Wide Ultra-160 LVD SCSI channels - 100MHz Intel i960RN RISC Processor - 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) - 32MB/64MB ECC SDRAM Memory - -AcceleRAID 170 - 1 Wide Ultra-160 LVD SCSI channel - 100MHz Intel i960RM RISC Processor - 16MB/32MB/64MB ECC SDRAM Memory - -AcceleRAID 160 (AcceleRAID 170LP) - 1 Wide Ultra-160 LVD SCSI channel - 100MHz Intel i960RS RISC Processor - Built in 16M ECC SDRAM Memory - PCI Low Profile Form Factor - fit for 2U height - -eXtremeRAID 1100 (DAC1164P) - 3 Wide Ultra-2/LVD SCSI channels - 233MHz StrongARM SA 110 Processor - 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) - 16MB/32MB/64MB Parity SDRAM Memory with Battery Backup - -AcceleRAID 250 (DAC960PTL1) - Uses onboard Symbios SCSI chips on certain motherboards - Also includes one onboard Wide Ultra-2/LVD SCSI Channel - 66MHz Intel i960RD RISC Processor - 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory - -AcceleRAID 200 (DAC960PTL0) - Uses onboard Symbios SCSI chips on certain motherboards - Includes no onboard SCSI Channels - 66MHz Intel i960RD RISC Processor - 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory - -AcceleRAID 150 (DAC960PRL) - Uses onboard Symbios SCSI chips on certain motherboards - Also includes one onboard Wide Ultra-2/LVD SCSI Channel - 33MHz Intel i960RP RISC Processor - 4MB Parity EDO Memory - -DAC960PJ 1/2/3 Wide Ultra SCSI-3 Channels - 66MHz Intel i960RD RISC Processor - 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory - -DAC960PG 1/2/3 Wide Ultra SCSI-3 Channels - 33MHz Intel i960RP RISC Processor - 4MB/8MB ECC EDO Memory - -DAC960PU 1/2/3 Wide Ultra SCSI-3 Channels - Intel i960CF RISC Processor - 4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory - -DAC960PD 1/2/3 Wide Fast SCSI-2 Channels - Intel i960CF RISC Processor - 4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory - -DAC960PL 1/2/3 Wide Fast SCSI-2 Channels - Intel i960 RISC Processor - 2MB/4MB/8MB/16MB/32MB DRAM Memory - -DAC960P 1/2/3 Wide Fast SCSI-2 Channels - Intel i960 RISC Processor - 2MB/4MB/8MB/16MB/32MB DRAM Memory - -For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version -6.00-01 or above is required. - -For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required. - -For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is -required. - -For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required. - -For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version -3.51-0-04 or above is required (for dual Flash ROM controllers), or firmware -version 2.73-0-00 or above is required (for single Flash ROM controllers) - -Please note that not all SCSI disk drives are suitable for use with DAC960 -controllers, and only particular firmware versions of any given model may -actually function correctly. Similarly, not all motherboards have a BIOS that -properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150, -DAC960PJ, and DAC960PG because the Intel i960RD/RP is a multi-function device. -If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to -verify compatibility. Mylex makes available a hard disk compatibility list at -http://www.mylex.com/support/hdcomp/hd-lists.html. - - - DRIVER INSTALLATION - -This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12. - -To install the DAC960 RAID driver, you may use the following commands, -replacing "/usr/src" with wherever you keep your Linux kernel source tree: - - cd /usr/src - tar -xvzf DAC960-2.2.11.tar.gz (or DAC960-2.4.11.tar.gz) - mv README.DAC960 linux/Documentation - mv DAC960.[ch] linux/drivers/block - patch -p0 < DAC960.patch (if DAC960.patch is included) - cd linux - make config - make bzImage (or zImage) - -Then install "arch/x86/boot/bzImage" or "arch/x86/boot/zImage" as your -standard kernel, run lilo if appropriate, and reboot. - -To create the necessary devices in /dev, the "make_rd" script included in -"DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used. -LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive -are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with -statically linked executables of LILO and FDISK. This modified version of LILO -will allow booting from a DAC960 controller and/or mounting the root file -system from a DAC960. - -Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID -controllers. Installing directly onto a DAC960 may be problematic from other -Linux distributions until their installation utilities are updated. - - - INSTALLATION NOTES - -Before installing Linux or adding DAC960 logical drives to an existing Linux -system, the controller must first be configured to provide one or more logical -drives using the BIOS Configuration Utility or DACCF. Please note that since -there are only at most 6 usable partitions on each logical drive, systems -requiring more partitions should subdivide a drive group into multiple logical -drives, each of which can have up to 6 usable partitions. Also, note that with -large disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63) -rather than accepting the default 2GB BIOS Geometry (128/32); failing to so do -will cause the logical drive geometry to have more than 65535 cylinders which -will make it impossible for FDISK to be used properly. The 8GB BIOS Geometry -can be enabled by configuring the DAC960 BIOS, which is accessible via Alt-M -during the BIOS initialization sequence. - -For maximum performance and the most efficient E2FSCK performance, it is -recommended that EXT2 file systems be built with a 4KB block size and 16 block -stride to match the DAC960 controller's 64KB default stripe size. The command -"mke2fs -b 4096 -R stride=16 <device>" is appropriate. Unless there will be a -large number of small files on the file systems, it is also beneficial to add -the "-i 16384" option to increase the bytes per inode parameter thereby -reducing the file system metadata. Finally, on systems that will only be run -with Linux 2.2 or later kernels it is beneficial to enable sparse superblocks -with the "-s 1" option. - - - DAC960 ANNOUNCEMENTS MAILING LIST - -The DAC960 Announcements Mailing List provides a forum for informing Linux -users of new driver releases and other announcements regarding Linux support -for DAC960 PCI RAID Controllers. To join the mailing list, send a message to -"dac960-announce-request@dandelion.com" with the line "subscribe" in the -message body. - - - CONTROLLER CONFIGURATION AND STATUS MONITORING - -The DAC960 RAID controllers running firmware 4.06 or above include a Background -Initialization facility so that system downtime is minimized both for initial -installation and subsequent configuration of additional storage. The BIOS -Configuration Utility (accessible via Alt-R during the BIOS initialization -sequence) is used to quickly configure the controller, and then the logical -drives that have been created are available for immediate use even while they -are still being initialized by the controller. The primary need for online -configuration and status monitoring is then to avoid system downtime when disk -drives fail and must be replaced. Mylex's online monitoring and configuration -utilities are being ported to Linux and will become available at some point in -the future. Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure) -enclosure, the controller is able to rebuild failed drives automatically as -soon as a drive replacement is made available. - -The primary interfaces for controller configuration and status monitoring are -special files created in the /proc/rd/... hierarchy along with the normal -system console logging mechanism. Whenever the system is operating, the DAC960 -driver queries each controller for status information every 10 seconds, and -checks for additional conditions every 60 seconds. The initial status of each -controller is always available for controller N in /proc/rd/cN/initial_status, -and the current status as of the last status monitoring query is available in -/proc/rd/cN/current_status. In addition, status changes are also logged by the -driver to the system console and will appear in the log files maintained by -syslog. The progress of asynchronous rebuild or consistency check operations -is also available in /proc/rd/cN/current_status, and progress messages are -logged to the system console at most every 60 seconds. - -Starting with the 2.2.3/2.0.3 versions of the driver, the status information -available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been -augmented to include the vendor, model, revision, and serial number (if -available) for each physical device found connected to the controller: - -***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 ***** -Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> -Configuring Mylex DAC960PRL PCI RAID Controller - Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB - PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned - PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21 - Controller Queue Depth: 128, Maximum Blocks per Command: 128 - Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 - Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 - SAF-TE Enclosure Management Enabled - Physical Devices: - 0:0 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 68016775HA - Disk Status: Online, 17928192 blocks - 0:1 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 68004E53HA - Disk Status: Online, 17928192 blocks - 0:2 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 13013935HA - Disk Status: Online, 17928192 blocks - 0:3 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 13016897HA - Disk Status: Online, 17928192 blocks - 0:4 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 68019905HA - Disk Status: Online, 17928192 blocks - 0:5 Vendor: IBM Model: DRVS09D Revision: 0270 - Serial Number: 68012753HA - Disk Status: Online, 17928192 blocks - 0:6 Vendor: ESG-SHV Model: SCA HSBP M6 Revision: 0.61 - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru - No Rebuild or Consistency Check in Progress - -To simplify the monitoring process for custom software, the special file -/proc/rd/status returns "OK" when all DAC960 controllers in the system are -operating normally and no failures have occurred, or "ALERT" if any logical -drives are offline or critical or any non-standby physical drives are dead. - -Configuration commands for controller N are available via the special file -/proc/rd/cN/user_command. A human readable command can be written to this -special file to initiate a configuration operation, and the results of the -operation can then be read back from the special file in addition to being -logged to the system console. The shell command sequence - - echo "<configuration-command>" > /proc/rd/c0/user_command - cat /proc/rd/c0/user_command - -is typically used to execute configuration commands. The configuration -commands are: - - flush-cache - - The "flush-cache" command flushes the controller's cache. The system - automatically flushes the cache at shutdown or if the driver module is - unloaded, so this command is only needed to be certain a write back cache - is flushed to disk before the system is powered off by a command to a UPS. - Note that the flush-cache command also stops an asynchronous rebuild or - consistency check, so it should not be used except when the system is being - halted. - - kill <channel>:<target-id> - - The "kill" command marks the physical drive <channel>:<target-id> as DEAD. - This command is provided primarily for testing, and should not be used - during normal system operation. - - make-online <channel>:<target-id> - - The "make-online" command changes the physical drive <channel>:<target-id> - from status DEAD to status ONLINE. In cases where multiple physical drives - have been killed simultaneously, this command may be used to bring all but - one of them back online, after which a rebuild to the final drive is - necessary. - - Warning: make-online should only be used on a dead physical drive that is - an active part of a drive group, never on a standby drive. The command - should never be used on a dead drive that is part of a critical logical - drive; rebuild should be used if only a single drive is dead. - - make-standby <channel>:<target-id> - - The "make-standby" command changes physical drive <channel>:<target-id> - from status DEAD to status STANDBY. It should only be used in cases where - a dead drive was replaced after an automatic rebuild was performed onto a - standby drive. It cannot be used to add a standby drive to the controller - configuration if one was not created initially; the BIOS Configuration - Utility must be used for that currently. - - rebuild <channel>:<target-id> - - The "rebuild" command initiates an asynchronous rebuild onto physical drive - <channel>:<target-id>. It should only be used when a dead drive has been - replaced. - - check-consistency <logical-drive-number> - - The "check-consistency" command initiates an asynchronous consistency check - of <logical-drive-number> with automatic restoration. It can be used - whenever it is desired to verify the consistency of the redundancy - information. - - cancel-rebuild - cancel-consistency-check - - The "cancel-rebuild" and "cancel-consistency-check" commands cancel any - rebuild or consistency check operations previously initiated. - - - EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE - -The following annotated logs demonstrate the controller configuration and and -online status monitoring capabilities of the Linux DAC960 Driver. The test -configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a -DAC960PJ controller. The physical drives are configured into a single drive -group without a standby drive, and the drive group has been configured into two -logical drives, one RAID-5 and one RAID-6. Note that these logs are from an -earlier version of the driver and the messages have changed somewhat with newer -releases, but the functionality remains similar. First, here is the current -status of the RAID configuration: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status -***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 ***** -Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> -Configuring Mylex DAC960PJ PCI RAID Controller - Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB - PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned - PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9 - Controller Queue Depth: 128, Maximum Blocks per Command: 128 - Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 - Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru - No Rebuild or Consistency Check in Progress - -gwynedd:/u/lnz# cat /proc/rd/status -OK - -The above messages indicate that everything is healthy, and /proc/rd/status -returns "OK" indicating that there are no problems with any DAC960 controller -in the system. For demonstration purposes, while I/O is active Physical Drive -1:1 is now disconnected, simulating a drive failure. The failure is noted by -the driver within 10 seconds of the controller's having detected it, and the -driver logs the following console status messages indicating that Logical -Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD: - -DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 -DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 -DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command -DAC960#0: Physical Drive 1:1 is now DEAD -DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL -DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL - -The Sense Keys logged here are just Check Condition / Unit Attention conditions -arising from a SCSI bus reset that is forced by the controller during its error -recovery procedures. Concurrently with the above, the driver status available -from /proc/rd also reflects the drive failure. The status message in -/proc/rd/status has changed from "OK" to "ALERT": - -gwynedd:/u/lnz# cat /proc/rd/status -ALERT - -and /proc/rd/c0/current_status has been updated: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Dead, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru - No Rebuild or Consistency Check in Progress - -Since there are no standby drives configured, the system can continue to access -the logical drives in a performance degraded mode until the failed drive is -replaced and a rebuild operation completed to restore the redundancy of the -logical drives. Once Physical Drive 1:1 is replaced with a properly -functioning drive, or if the physical drive was killed without having failed -(e.g., due to electrical problems on the SCSI bus), the user can instruct the -controller to initiate a rebuild operation onto the newly replaced drive: - -gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command -gwynedd:/u/lnz# cat /proc/rd/c0/user_command -Rebuild of Physical Drive 1:1 Initiated - -The echo command instructs the controller to initiate an asynchronous rebuild -operation onto Physical Drive 1:1, and the status message that results from the -operation is then available for reading from /proc/rd/c0/user_command, as well -as being logged to the console by the driver. - -Within 10 seconds of this command the driver logs the initiation of the -asynchronous rebuild operation: - -DAC960#0: Rebuild of Physical Drive 1:1 Initiated -DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01 -DAC960#0: Physical Drive 1:1 is now WRITE-ONLY -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed - -and /proc/rd/c0/current_status is updated: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Write-Only, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru - Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed - -As the rebuild progresses, the current status in /proc/rd/c0/current_status is -updated every 10 seconds: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Write-Only, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru - Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed - -and every minute a progress message is logged to the console by the driver: - -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed -DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed - -Finally, the rebuild completes successfully. The driver logs the status of the -logical and physical drives and the rebuild completion: - -DAC960#0: Rebuild Completed Successfully -DAC960#0: Physical Drive 1:1 is now ONLINE -DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE -DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE - -/proc/rd/c0/current_status is updated: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru - Rebuild Completed Successfully - -and /proc/rd/status indicates that everything is healthy once again: - -gwynedd:/u/lnz# cat /proc/rd/status -OK - - - EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE - -The following annotated logs demonstrate the controller configuration and and -online status monitoring capabilities of the Linux DAC960 Driver. The test -configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a -DAC960PJ controller. The physical drives are configured into a single drive -group with a standby drive, and the drive group has been configured into two -logical drives, one RAID-5 and one RAID-6. Note that these logs are from an -earlier version of the driver and the messages have changed somewhat with newer -releases, but the functionality remains similar. First, here is the current -status of the RAID configuration: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status -***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 ***** -Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> -Configuring Mylex DAC960PJ PCI RAID Controller - Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB - PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned - PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9 - Controller Queue Depth: 128, Maximum Blocks per Command: 128 - Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 - Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Online, 2201600 blocks - 1:3 - Disk: Standby, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru - No Rebuild or Consistency Check in Progress - -gwynedd:/u/lnz# cat /proc/rd/status -OK - -The above messages indicate that everything is healthy, and /proc/rd/status -returns "OK" indicating that there are no problems with any DAC960 controller -in the system. For demonstration purposes, while I/O is active Physical Drive -1:2 is now disconnected, simulating a drive failure. The failure is noted by -the driver within 10 seconds of the controller's having detected it, and the -driver logs the following console status messages: - -DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 -DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 -DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command -DAC960#0: Physical Drive 1:2 is now DEAD -DAC960#0: Physical Drive 1:2 killed because it was removed -DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL -DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL - -Since a standby drive is configured, the controller automatically begins -rebuilding onto the standby drive: - -DAC960#0: Physical Drive 1:3 is now WRITE-ONLY -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed - -Concurrently with the above, the driver status available from /proc/rd also -reflects the drive failure and automatic rebuild. The status message in -/proc/rd/status has changed from "OK" to "ALERT": - -gwynedd:/u/lnz# cat /proc/rd/status -ALERT - -and /proc/rd/c0/current_status has been updated: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Dead, 2201600 blocks - 1:3 - Disk: Write-Only, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru - Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed - -As the rebuild progresses, the current status in /proc/rd/c0/current_status is -updated every 10 seconds: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Dead, 2201600 blocks - 1:3 - Disk: Write-Only, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru - Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed - -and every minute a progress message is logged on the console by the driver: - -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed -DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed -DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed -DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed - -Finally, the rebuild completes successfully. The driver logs the status of the -logical and physical drives and the rebuild completion: - -DAC960#0: Rebuild Completed Successfully -DAC960#0: Physical Drive 1:3 is now ONLINE -DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE -DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE - -/proc/rd/c0/current_status is updated: - -***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 ***** -Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> -Configuring Mylex DAC960PJ PCI RAID Controller - Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB - PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned - PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9 - Controller Queue Depth: 128, Maximum Blocks per Command: 128 - Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 - Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Dead, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru - Rebuild Completed Successfully - -and /proc/rd/status indicates that everything is healthy once again: - -gwynedd:/u/lnz# cat /proc/rd/status -OK - -Note that the absence of a viable standby drive does not create an "ALERT" -status. Once dead Physical Drive 1:2 has been replaced, the controller must be -told that this has occurred and that the newly replaced drive should become the -new standby drive: - -gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command -gwynedd:/u/lnz# cat /proc/rd/c0/user_command -Make Standby of Physical Drive 1:2 Succeeded - -The echo command instructs the controller to make Physical Drive 1:2 into a -standby drive, and the status message that results from the operation is then -available for reading from /proc/rd/c0/user_command, as well as being logged to -the console by the driver. Within 60 seconds of this command the driver logs: - -DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01 -DAC960#0: Physical Drive 1:2 is now STANDBY -DAC960#0: Make Standby of Physical Drive 1:2 Succeeded - -and /proc/rd/c0/current_status is updated: - -gwynedd:/u/lnz# cat /proc/rd/c0/current_status - ... - Physical Devices: - 0:1 - Disk: Online, 2201600 blocks - 0:2 - Disk: Online, 2201600 blocks - 0:3 - Disk: Online, 2201600 blocks - 1:1 - Disk: Online, 2201600 blocks - 1:2 - Disk: Standby, 2201600 blocks - 1:3 - Disk: Online, 2201600 blocks - Logical Drives: - /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru - /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru - Rebuild Completed Successfully diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 875b2b56b87f..3c1b5ab54bc0 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -190,7 +190,7 @@ whitespace: notify_free Depending on device usage scenario it may account a) the number of pages freed because of swap slot free notifications or b) the number of pages freed because of - REQ_DISCARD requests sent by bio. The former ones are + REQ_OP_DISCARD requests sent by bio. The former ones are sent to a swap block device when a swap slot is freed, which implies that this disk is being used as a swap disk. The latter ones are sent by filesystem mounted with diff --git a/Documentation/cdrom/00-INDEX b/Documentation/cdrom/00-INDEX deleted file mode 100644 index 433edf23dc49..000000000000 --- a/Documentation/cdrom/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -00-INDEX - - this file (info on CD-ROMs and Linux) -Makefile - - only used to generate TeX output from the documentation. -cdrom-standard.tex - - LaTeX document on standardizing the CD-ROM programming interface. -ide-cd - - info on setting up and using ATAPI (aka IDE) CD-ROMs. -packet-writing.txt - - Info on the CDRW packet writing module - diff --git a/Documentation/cgroup-v1/00-INDEX b/Documentation/cgroup-v1/00-INDEX deleted file mode 100644 index 13e0c85e7b35..000000000000 --- a/Documentation/cgroup-v1/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -blkio-controller.txt - - Description for Block IO Controller, implementation and usage details. -cgroups.txt - - Control Groups definition, implementation details, examples and API. -cpuacct.txt - - CPU Accounting Controller; account CPU usage for groups of tasks. -cpusets.txt - - documents the cpusets feature; assign CPUs and Mem to a set of tasks. -admin-guide/devices.rst - - Device Whitelist Controller; description, interface and security. -freezer-subsystem.txt - - checkpointing; rationale to not use signals, interface. -hugetlb.txt - - HugeTLB Controller implementation and usage details. -memcg_test.txt - - Memory Resource Controller; implementation details. -memory.txt - - Memory Resource Controller; design, accounting, interface, testing. -net_cls.txt - - Network classifier cgroups details and usages. -net_prio.txt - - Network priority cgroups details and usages. -pids.txt - - Process number cgroups details and usages. diff --git a/Documentation/conf.py b/Documentation/conf.py index b691af4831fa..72647a38b5c2 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -259,7 +259,7 @@ latex_elements = { 'papersize': 'a4paper', # The font size ('10pt', '11pt' or '12pt'). -'pointsize': '8pt', +'pointsize': '11pt', # Latex figure (float) alignment #'figure_align': 'htbp', @@ -272,8 +272,8 @@ latex_elements = { 'preamble': ''' % Use some font with UTF-8 support with XeLaTeX \\usepackage{fontspec} - \\setsansfont{DejaVu Serif} - \\setromanfont{DejaVu Sans} + \\setsansfont{DejaVu Sans} + \\setromanfont{DejaVu Serif} \\setmonofont{DejaVu Sans Mono} ''' @@ -383,6 +383,10 @@ latex_documents = [ 'The kernel development community', 'manual'), ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API', 'The kernel development community', 'manual'), + ('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide', + 'ext4 Community', 'manual'), + ('filesystems/ext4/index', 'ext4-data-structures.tex', + 'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'), ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', 'The kernel development community', 'manual'), ('input/index', 'linux-input.tex', 'The Linux input driver subsystem', diff --git a/Documentation/core-api/boot-time-mm.rst b/Documentation/core-api/boot-time-mm.rst index 03cb1643f46f..6e12e89a03e0 100644 --- a/Documentation/core-api/boot-time-mm.rst +++ b/Documentation/core-api/boot-time-mm.rst @@ -76,7 +76,7 @@ These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n`` .. kernel-doc:: include/linux/bootmem.h .. kernel-doc:: mm/bootmem.c - :nodocs: + :functions: Memblock specific API --------------------- @@ -89,4 +89,4 @@ really happens under the hood. .. kernel-doc:: include/linux/memblock.h .. kernel-doc:: mm/memblock.c - :nodocs: + :functions: diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst index e0df8f416582..e7c32a8de126 100644 --- a/Documentation/core-api/gfp_mask-from-fs-io.rst +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst @@ -1,3 +1,5 @@ +.. _gfp_mask_from_fs_io: + ================================= GFP masks used from FS/IO context ================================= diff --git a/Documentation/core-api/idr.rst b/Documentation/core-api/idr.rst index d351e880a2f6..a2738050c4f0 100644 --- a/Documentation/core-api/idr.rst +++ b/Documentation/core-api/idr.rst @@ -1,4 +1,4 @@ -.. SPDX-License-Identifier: CC-BY-SA-4.0 +.. SPDX-License-Identifier: GPL-2.0+ ============= ID Allocation diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 26b735cefb93..29c790f571a5 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -27,10 +27,13 @@ Core utilities errseq printk-formats circular-buffers + memory-allocation mm-api gfp_mask-from-fs-io timekeeping boot-time-mm + memory-hotplug + Interfaces for kernel debugging =============================== diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst new file mode 100644 index 000000000000..f8bb9aa120c4 --- /dev/null +++ b/Documentation/core-api/memory-allocation.rst @@ -0,0 +1,122 @@ +======================= +Memory Allocation Guide +======================= + +Linux provides a variety of APIs for memory allocation. You can +allocate small chunks using `kmalloc` or `kmem_cache_alloc` families, +large virtually contiguous areas using `vmalloc` and its derivatives, +or you can directly request pages from the page allocator with +`alloc_pages`. It is also possible to use more specialized allocators, +for instance `cma_alloc` or `zs_malloc`. + +Most of the memory allocation APIs use GFP flags to express how that +memory should be allocated. The GFP acronym stands for "get free +pages", the underlying memory allocation function. + +Diversity of the allocation APIs combined with the numerous GFP flags +makes the question "How should I allocate memory?" not that easy to +answer, although very likely you should use + +:: + + kzalloc(<size>, GFP_KERNEL); + +Of course there are cases when other allocation APIs and different GFP +flags must be used. + +Get Free Page flags +=================== + +The GFP flags control the allocators behavior. They tell what memory +zones can be used, how hard the allocator should try to find free +memory, whether the memory can be accessed by the userspace etc. The +:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides +reference documentation for the GFP flags and their combinations and +here we briefly outline their recommended usage: + + * Most of the time ``GFP_KERNEL`` is what you need. Memory for the + kernel data structures, DMAable memory, inode cache, all these and + many other allocations types can use ``GFP_KERNEL``. Note, that + using ``GFP_KERNEL`` implies ``GFP_RECLAIM``, which means that + direct reclaim may be triggered under memory pressure; the calling + context must be allowed to sleep. + * If the allocation is performed from an atomic context, e.g interrupt + handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and + IO or filesystem operations. Consequently, under memory pressure + ``GFP_NOWAIT`` allocation is likely to fail. Allocations which + have a reasonable fallback should be using ``GFP_NOWARN``. + * If you think that accessing memory reserves is justified and the kernel + will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``. + * Untrusted allocations triggered from userspace should be a subject + of kmem accounting and must have ``__GFP_ACCOUNT`` bit set. There + is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL`` + allocations that should be accounted. + * Userspace allocations should use either of the ``GFP_USER``, + ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer + the flag name the less restrictive it is. + + ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory + will be directly accessible by the kernel and implies that the + data is movable. + + ``GFP_HIGHUSER`` means that the allocated memory is not movable, + but it is not required to be directly accessible by the kernel. An + example may be a hardware allocation that maps data directly into + userspace but has no addressing limitations. + + ``GFP_USER`` means that the allocated memory is not movable and it + must be directly accessible by the kernel. + +You may notice that quite a few allocations in the existing code +specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to +prevent recursion deadlocks caused by direct memory reclaim calling +back into the FS or IO paths and blocking on already held +resources. Since 4.12 the preferred way to address this issue is to +use new scope APIs described in +:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`. + +Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are +used to ensure that the allocated memory is accessible by hardware +with limited addressing capabilities. So unless you are writing a +driver for a device with such restrictions, avoid using these flags. +And even with hardware with restrictions it is preferable to use +`dma_alloc*` APIs. + +Selecting memory allocator +========================== + +The most straightforward way to allocate memory is to use a function +from the :c:func:`kmalloc` family. And, to be on the safe size it's +best to use routines that set memory to zero, like +:c:func:`kzalloc`. If you need to allocate memory for an array, there +are :c:func:`kmalloc_array` and :c:func:`kcalloc` helpers. + +The maximal size of a chunk that can be allocated with `kmalloc` is +limited. The actual limit depends on the hardware and the kernel +configuration, but it is a good practice to use `kmalloc` for objects +smaller than page size. + +For large allocations you can use :c:func:`vmalloc` and +:c:func:`vzalloc`, or directly request pages from the page +allocator. The memory allocated by `vmalloc` and related functions is +not physically contiguous. + +If you are not sure whether the allocation size is too large for +`kmalloc`, it is possible to use :c:func:`kvmalloc` and its +derivatives. It will try to allocate memory with `kmalloc` and if the +allocation fails it will be retried with `vmalloc`. There are +restrictions on which GFP flags can be used with `kvmalloc`; please +see :c:func:`kvmalloc_node` reference documentation. Note that +`kvmalloc` may return memory that is not physically contiguous. + +If you need to allocate many identical objects you can use the slab +cache allocator. The cache should be set up with +:c:func:`kmem_cache_create` before it can be used. Afterwards +:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate +memory from that cache. + +When the allocated memory is no longer needed it must be freed. You +can use :c:func:`kvfree` for the memory allocated with `kmalloc`, +`vmalloc` and `kvmalloc`. The slab caches should be freed with +:c:func:`kmem_cache_free`. And don't forget to destroy the cache with +:c:func:`kmem_cache_destroy`. diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst new file mode 100644 index 000000000000..de7467e48067 --- /dev/null +++ b/Documentation/core-api/memory-hotplug.rst @@ -0,0 +1,125 @@ +.. _memory_hotplug: + +============== +Memory hotplug +============== + +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEM_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEM_GOING_OFFLINE fails. Memory is available again from + the memory block that we attempted to offline. + +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. + +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. + +Locking Internals +================= + +When adding/removing memory that uses memory block devices (i.e. ordinary RAM), +the device_hotplug_lock should be held to: + +- synchronize against online/offline requests (e.g. via sysfs). This way, memory + block devices can only be accessed (.online/.state attributes) by user + space once memory has been fully added. And when removing memory, we + know nobody is in critical sections. +- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) + +Especially, there is a possible lock inversion that is avoided using +device_hotplug_lock when adding memory and user space tries to online that +memory faster than expected: + +- device_online() will first take the device_lock(), followed by + mem_hotplug_lock +- add_memory_resource() will first take the mem_hotplug_lock, followed by + the device_lock() (while creating the devices, during bus_add_device()). + +As the device is visible to user space before taking the device_lock(), this +can result in a lock inversion. + +onlining/offlining of memory should be done via device_online()/ +device_offline() - to make sure it is properly synchronized to actions +via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type) + +When adding/removing/onlining/offlining memory or adding/removing +heterogeneous/device memory, we should always hold the mem_hotplug_lock in +write mode to serialise memory hotplug (e.g. access to global/zone +variables). + +In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read +mode allows for a quite efficient get_online_mems/put_online_mems +implementation, so code accessing memory can protect from that memory +vanishing. diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst index 46ae3537fb12..5ce1ec1dd066 100644 --- a/Documentation/core-api/mm-api.rst +++ b/Documentation/core-api/mm-api.rst @@ -14,6 +14,8 @@ User Space Memory Access .. kernel-doc:: mm/util.c :functions: get_user_pages_fast +.. _mm-api-gfp-flags: + Memory Allocation Controls ========================== diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 25dc591cb110..86023c33906f 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -376,15 +376,15 @@ correctness of the format string and va_list arguments. Passed by reference. -kobjects --------- +Device tree nodes +----------------- :: %pOF[fnpPcCF] -For printing kobject based structs (device nodes). Default behaviour is +For printing device tree node structures. Default behaviour is equivalent to %pOFf. - f - device node full_name diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst index 94f41c290bfc..aa14f05cabb1 100644 --- a/Documentation/dev-tools/coccinelle.rst +++ b/Documentation/dev-tools/coccinelle.rst @@ -30,18 +30,29 @@ of many distributions, e.g. : - NetBSD - FreeBSD -You can get the latest version released from the Coccinelle homepage at +Some distribution packages are obsolete and it is recommended +to use the latest version released from the Coccinelle homepage at http://coccinelle.lip6.fr/ -Once you have it, run the following command:: +Or from Github at: - ./configure +https://github.com/coccinelle/coccinelle + +Once you have it, run the following commands:: + + ./autogen + ./configure make as a regular user, and install it with:: sudo make install +More detailed installation instructions to build from source can be +found at: + +https://github.com/coccinelle/coccinelle/blob/master/install.txt + Supplemental documentation --------------------------- @@ -51,6 +62,10 @@ https://bottest.wiki.kernel.org/coccicheck The wiki documentation always refers to the linux-next version of the script. +For Semantic Patch Language(SmPL) grammar documentation refer to: + +http://coccinelle.lip6.fr/documentation.php + Using Coccinelle on the Linux kernel ------------------------------------ @@ -223,7 +238,7 @@ Since coccicheck runs through make, it naturally runs from the kernel proper dir, as such the second rule above would be implied for picking up a .cocciconfig when using ``make coccicheck``. -``make coccicheck`` also supports using M= targets.If you do not supply +``make coccicheck`` also supports using M= targets. If you do not supply any M= target, it is assumed you want to target the entire kernel. The kernel coccicheck script has:: diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 6f653acea248..dad1bb8711e2 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -159,7 +159,7 @@ Contributing new tests (details) * If a test needs specific kernel config options enabled, add a config file in the test directory to enable them. - e.g: tools/testing/selftests/android/ion/config + e.g: tools/testing/selftests/android/config Test Harness ============ diff --git a/Documentation/device-mapper/dm-flakey.txt b/Documentation/device-mapper/dm-flakey.txt index c43030718cef..9f0e247d0877 100644 --- a/Documentation/device-mapper/dm-flakey.txt +++ b/Documentation/device-mapper/dm-flakey.txt @@ -33,6 +33,10 @@ Optional feature parameters: All write I/O is silently ignored. Read I/O is handled correctly. + error_writes: + All write I/O is failed with an error signalled. + Read I/O is handled correctly. + corrupt_bio_byte <Nth_byte> <direction> <value> <flags>: During <down interval>, replace <Nth_byte> of the data of each matching bio with <value>. diff --git a/Documentation/device-mapper/log-writes.txt b/Documentation/device-mapper/log-writes.txt index f4ebcbaf50f3..b638d124be6a 100644 --- a/Documentation/device-mapper/log-writes.txt +++ b/Documentation/device-mapper/log-writes.txt @@ -38,7 +38,7 @@ inconsistent file system. Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as they complete as those requests will obviously bypass the device cache. -Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would +Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would have all the DISCARD requests, and then the WRITE requests and then the FLUSH request. Consider the following example: diff --git a/Documentation/devicetree/00-INDEX b/Documentation/devicetree/00-INDEX deleted file mode 100644 index 8c4102c6a5e7..000000000000 --- a/Documentation/devicetree/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -Documentation for device trees, a data structure by which bootloaders pass -hardware layout to Linux in a device-independent manner, simplifying hardware -probing. This subsystem is maintained by Grant Likely -<grant.likely@secretlab.ca> and has a mailing list at -https://lists.ozlabs.org/listinfo/devicetree-discuss - -00-INDEX - - this file -booting-without-of.txt - - Booting Linux without Open Firmware, describes history and format of device trees. -usage-model.txt - - How Linux uses DT and what DT aims to solve.
\ No newline at end of file diff --git a/Documentation/devicetree/bindings/ata/ahci-platform.txt b/Documentation/devicetree/bindings/ata/ahci-platform.txt index 5d5bd456d9d9..e30fd106df4f 100644 --- a/Documentation/devicetree/bindings/ata/ahci-platform.txt +++ b/Documentation/devicetree/bindings/ata/ahci-platform.txt @@ -10,6 +10,7 @@ PHYs. Required properties: - compatible : compatible string, one of: - "allwinner,sun4i-a10-ahci" + - "allwinner,sun8i-r40-ahci" - "brcm,iproc-ahci" - "hisilicon,hisi-ahci" - "cavium,octeon-7130-ahci" @@ -31,8 +32,10 @@ Optional properties: - clocks : a list of phandle + clock specifier pairs - resets : a list of phandle + reset specifier pairs - target-supply : regulator for SATA target power +- phy-supply : regulator for PHY power - phys : reference to the SATA PHY node - phy-names : must be "sata-phy" +- ahci-supply : regulator for AHCI controller - ports-implemented : Mask that indicates which ports that the HBA supports are available for software to use. Useful if PORTS_IMPL is not programmed by the BIOS, which is true with @@ -42,12 +45,13 @@ Required properties when using sub-nodes: - #address-cells : number of cells to encode an address - #size-cells : number of cells representing the size of an address +For allwinner,sun8i-r40-ahci, the reset propertie must be present. Sub-nodes required properties: - reg : the port number And at least one of the following properties: - phys : reference to the SATA PHY node -- target-supply : regulator for SATA target power +- target-supply : regulator for SATA target power Examples: sata@ffe08000 { diff --git a/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt b/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt index 0a5b3b47f217..7713a413c6a7 100644 --- a/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt +++ b/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt @@ -9,6 +9,7 @@ Required properties: "brcm,bcm7445-ahci" "brcm,bcm-nsp-ahci" "brcm,sata3-ahci" + "brcm,bcm63138-ahci" - reg : register mappings for AHCI and SATA_TOP_CTRL - reg-names : "ahci" and "top-ctrl" - interrupts : interrupt mapping for SATA IRQ diff --git a/Documentation/devicetree/bindings/dma/jz4780-dma.txt b/Documentation/devicetree/bindings/dma/jz4780-dma.txt index 03e9cf7b42e0..636fcb26b164 100644 --- a/Documentation/devicetree/bindings/dma/jz4780-dma.txt +++ b/Documentation/devicetree/bindings/dma/jz4780-dma.txt @@ -2,8 +2,13 @@ Required properties: -- compatible: Should be "ingenic,jz4780-dma" -- reg: Should contain the DMA controller registers location and length. +- compatible: Should be one of: + * ingenic,jz4740-dma + * ingenic,jz4725b-dma + * ingenic,jz4770-dma + * ingenic,jz4780-dma +- reg: Should contain the DMA channel registers location and length, followed + by the DMA controller registers location and length. - interrupts: Should contain the interrupt specifier of the DMA controller. - clocks: Should contain a clock specifier for the JZ4780 PDMA clock. - #dma-cells: Must be <2>. Number of integer cells in the dmas property of @@ -19,9 +24,10 @@ Optional properties: Example: -dma: dma@13420000 { +dma: dma-controller@13420000 { compatible = "ingenic,jz4780-dma"; - reg = <0x13420000 0x10000>; + reg = <0x13420000 0x400 + 0x13421000 0x40>; interrupt-parent = <&intc>; interrupts = <10>; diff --git a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt index 946229c48657..a5a7c3f5a1e3 100644 --- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt @@ -17,6 +17,7 @@ Required Properties: - compatible: "renesas,dmac-<soctype>", "renesas,rcar-dmac" as fallback. Examples with soctypes are: - "renesas,dmac-r8a7743" (RZ/G1M) + - "renesas,dmac-r8a7744" (RZ/G1N) - "renesas,dmac-r8a7745" (RZ/G1E) - "renesas,dmac-r8a77470" (RZ/G1C) - "renesas,dmac-r8a7790" (R-Car H2) diff --git a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt index 482e54362d3e..1743017bd948 100644 --- a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt @@ -4,6 +4,7 @@ Required Properties: -compatible: "renesas,<soctype>-usb-dmac", "renesas,usb-dmac" as fallback. Examples with soctypes are: - "renesas,r8a7743-usb-dmac" (RZ/G1M) + - "renesas,r8a7744-usb-dmac" (RZ/G1N) - "renesas,r8a7745-usb-dmac" (RZ/G1E) - "renesas,r8a7790-usb-dmac" (R-Car H2) - "renesas,r8a7791-usb-dmac" (R-Car M2-W) diff --git a/Documentation/devicetree/bindings/gpio/gpio.txt b/Documentation/devicetree/bindings/gpio/gpio.txt index a7c31de29362..f0ba154b5723 100644 --- a/Documentation/devicetree/bindings/gpio/gpio.txt +++ b/Documentation/devicetree/bindings/gpio/gpio.txt @@ -1,18 +1,9 @@ Specifying GPIO information for devices -============================================ +======================================= 1) gpios property ----------------- -Nodes that makes use of GPIOs should specify them using one or more -properties, each containing a 'gpio-list': - - gpio-list ::= <single-gpio> [gpio-list] - single-gpio ::= <gpio-phandle> <gpio-specifier> - gpio-phandle : phandle to gpio controller node - gpio-specifier : Array of #gpio-cells specifying specific gpio - (controller specific) - GPIO properties should be named "[<name>-]gpios", with <name> being the purpose of this GPIO for the device. While a non-existent <name> is considered valid for compatibility reasons (resolving to the "gpios" property), it is not allowed @@ -33,33 +24,27 @@ The following example could be used to describe GPIO pins used as device enable and bit-banged data signals: gpio1: gpio1 { - gpio-controller - #gpio-cells = <2>; - }; - gpio2: gpio2 { - gpio-controller - #gpio-cells = <1>; + gpio-controller; + #gpio-cells = <2>; }; [...] - enable-gpios = <&gpio2 2>; data-gpios = <&gpio1 12 0>, <&gpio1 13 0>, <&gpio1 14 0>, <&gpio1 15 0>; -Note that gpio-specifier length is controller dependent. In the -above example, &gpio1 uses 2 cells to specify a gpio, while &gpio2 -only uses one. +In the above example, &gpio1 uses 2 cells to specify a gpio. The first cell is +a local offset to the GPIO line and the second cell represent consumer flags, +such as if the consumer desire the line to be active low (inverted) or open +drain. This is the recommended practice. -gpio-specifier may encode: bank, pin position inside the bank, -whether pin is open-drain and whether pin is logically inverted. +The exact meaning of each specifier cell is controller specific, and must be +documented in the device tree binding for the device, but it is strongly +recommended to use the two-cell approach. -Exact meaning of each specifier cell is controller specific, and must -be documented in the device tree binding for the device. - -Most controllers are however specifying a generic flag bitfield -in the last cell, so for these, use the macros defined in +Most controllers are specifying a generic flag bitfield in the last cell, so +for these, use the macros defined in include/dt-bindings/gpio/gpio.h whenever possible: Example of a node using GPIOs: @@ -236,46 +221,40 @@ Example of two SOC GPIO banks defined as gpio-controller nodes: Some or all of the GPIOs provided by a GPIO controller may be routed to pins on the package via a pin controller. This allows muxing those pins between -GPIO and other functions. +GPIO and other functions. It is a fairly common practice among silicon +engineers. + +2.2) Ordinary (numerical) GPIO ranges +------------------------------------- It is useful to represent which GPIOs correspond to which pins on which pin -controllers. The gpio-ranges property described below represents this, and -contains information structures as follows: - - gpio-range-list ::= <single-gpio-range> [gpio-range-list] - single-gpio-range ::= <numeric-gpio-range> | <named-gpio-range> - numeric-gpio-range ::= - <pinctrl-phandle> <gpio-base> <pinctrl-base> <count> - named-gpio-range ::= <pinctrl-phandle> <gpio-base> '<0 0>' - pinctrl-phandle : phandle to pin controller node - gpio-base : Base GPIO ID in the GPIO controller - pinctrl-base : Base pinctrl pin ID in the pin controller - count : The number of GPIOs/pins in this range - -The "pin controller node" mentioned above must conform to the bindings -described in ../pinctrl/pinctrl-bindings.txt. - -In case named gpio ranges are used (ranges with both <pinctrl-base> and -<count> set to 0), the property gpio-ranges-group-names contains one string -for every single-gpio-range in gpio-ranges: - gpiorange-names-list ::= <gpiorange-name> [gpiorange-names-list] - gpiorange-name : Name of the pingroup associated to the GPIO range in - the respective pin controller. - -Elements of gpiorange-names-list corresponding to numeric ranges contain -the empty string. Elements of gpiorange-names-list corresponding to named -ranges contain the name of a pin group defined in the respective pin -controller. The number of pins/GPIOs in the range is the number of pins in -that pin group. +controllers. The gpio-ranges property described below represents this with +a discrete set of ranges mapping pins from the pin controller local number space +to pins in the GPIO controller local number space. -Previous versions of this binding required all pin controller nodes that -were referenced by any gpio-ranges property to contain a property named -#gpio-range-cells with value <3>. This requirement is now deprecated. -However, that property may still exist in older device trees for -compatibility reasons, and would still be required even in new device -trees that need to be compatible with older software. +The format is: <[pin controller phandle], [GPIO controller offset], + [pin controller offset], [number of pins]>; + +The GPIO controller offset pertains to the GPIO controller node containing the +range definition. + +The pin controller node referenced by the phandle must conform to the bindings +described in pinctrl/pinctrl-bindings.txt. + +Each offset runs from 0 to N. It is perfectly fine to pile any number of +ranges with just one pin-to-GPIO line mapping if the ranges are concocted, but +in practice these ranges are often lumped in discrete sets. + +Example: + + gpio-ranges = <&foo 0 20 10>, <&bar 10 50 20>; -Example 1: +This means: +- pins 20..29 on pin controller "foo" is mapped to GPIO line 0..9 and +- pins 50..69 on pin controller "bar" is mapped to GPIO line 10..29 + + +Verbose example: qe_pio_e: gpio-controller@1460 { #gpio-cells = <2>; @@ -289,7 +268,28 @@ Here, a single GPIO controller has GPIOs 0..9 routed to pin controller pinctrl1's pins 20..29, and GPIOs 10..29 routed to pin controller pinctrl2's pins 50..69. -Example 2: + +2.3) GPIO ranges from named pin groups +-------------------------------------- + +It is also possible to use pin groups for gpio ranges when pin groups are the +easiest and most convenient mapping. + +Both both <pinctrl-base> and <count> must set to 0 when using named pin groups +names. + +The property gpio-ranges-group-names must contain exactly one string for each +range. + +Elements of gpio-ranges-group-names must contain the name of a pin group +defined in the respective pin controller. The number of pins/GPIO lines in the +range is the number of pins in that pin group. The number of pins of that +group is defined int the implementation and not in the device tree. + +If numerical and named pin groups are mixed, the string corresponding to a +numerical pin range in gpio-ranges-group-names must be empty. + +Example: gpio_pio_i: gpio-controller@14b0 { #gpio-cells = <2>; @@ -306,6 +306,14 @@ Example 2: "bar"; }; -Here, three GPIO ranges are defined wrt. two pin controllers. pinctrl1 GPIO -ranges are defined using pin numbers whereas the GPIO ranges wrt. pinctrl2 -are named "foo" and "bar". +Here, three GPIO ranges are defined referring to two pin controllers. + +pinctrl1 GPIO ranges are defined using pin numbers whereas the GPIO ranges +in pinctrl2 are defined using the pin groups named "foo" and "bar". + +Previous versions of this binding required all pin controller nodes that +were referenced by any gpio-ranges property to contain a property named +#gpio-range-cells with value <3>. This requirement is now deprecated. +However, that property may still exist in older device trees for +compatibility reasons, and would still be required even in new device +trees that need to be compatible with older software. diff --git a/Documentation/devicetree/bindings/gpio/ingenic,gpio.txt b/Documentation/devicetree/bindings/gpio/ingenic,gpio.txt deleted file mode 100644 index 7988aeb725f4..000000000000 --- a/Documentation/devicetree/bindings/gpio/ingenic,gpio.txt +++ /dev/null @@ -1,46 +0,0 @@ -Ingenic jz47xx GPIO controller - -That the Ingenic GPIO driver node must be a sub-node of the Ingenic pinctrl -driver node. - -Required properties: --------------------- - - - compatible: Must contain one of: - - "ingenic,jz4740-gpio" - - "ingenic,jz4770-gpio" - - "ingenic,jz4780-gpio" - - reg: The GPIO bank number. - - interrupt-controller: Marks the device node as an interrupt controller. - - interrupts: Interrupt specifier for the controllers interrupt. - - #interrupt-cells: Should be 2. Refer to - ../interrupt-controller/interrupts.txt for more details. - - gpio-controller: Marks the device node as a GPIO controller. - - #gpio-cells: Should be 2. The first cell is the GPIO number and the second - cell specifies GPIO flags, as defined in <dt-bindings/gpio/gpio.h>. Only the - GPIO_ACTIVE_HIGH and GPIO_ACTIVE_LOW flags are supported. - - gpio-ranges: Range of pins managed by the GPIO controller. Refer to - 'gpio.txt' in this directory for more details. - -Example: --------- - -&pinctrl { - #address-cells = <1>; - #size-cells = <0>; - - gpa: gpio@0 { - compatible = "ingenic,jz4740-gpio"; - reg = <0>; - - gpio-controller; - gpio-ranges = <&pinctrl 0 0 32>; - #gpio-cells = <2>; - - interrupt-controller; - #interrupt-cells = <2>; - - interrupt-parent = <&intc>; - interrupts = <28>; - }; -}; diff --git a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt index 4018ee57a6af..2889bbcd7416 100644 --- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt +++ b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt @@ -4,8 +4,10 @@ Required Properties: - compatible: should contain one or more of the following: - "renesas,gpio-r8a7743": for R8A7743 (RZ/G1M) compatible GPIO controller. + - "renesas,gpio-r8a7744": for R8A7744 (RZ/G1N) compatible GPIO controller. - "renesas,gpio-r8a7745": for R8A7745 (RZ/G1E) compatible GPIO controller. - "renesas,gpio-r8a77470": for R8A77470 (RZ/G1C) compatible GPIO controller. + - "renesas,gpio-r8a774a1": for R8A774A1 (RZ/G2M) compatible GPIO controller. - "renesas,gpio-r8a7778": for R8A7778 (R-Car M1) compatible GPIO controller. - "renesas,gpio-r8a7779": for R8A7779 (R-Car H1) compatible GPIO controller. - "renesas,gpio-r8a7790": for R8A7790 (R-Car H2) compatible GPIO controller. @@ -22,7 +24,7 @@ Required Properties: - "renesas,gpio-r8a77995": for R8A77995 (R-Car D3) compatible GPIO controller. - "renesas,rcar-gen1-gpio": for a generic R-Car Gen1 GPIO controller. - "renesas,rcar-gen2-gpio": for a generic R-Car Gen2 or RZ/G1 GPIO controller. - - "renesas,rcar-gen3-gpio": for a generic R-Car Gen3 GPIO controller. + - "renesas,rcar-gen3-gpio": for a generic R-Car Gen3 or RZ/G2 GPIO controller. - "renesas,gpio-rcar": deprecated. When compatible with the generic version nodes must list the @@ -38,7 +40,7 @@ Required Properties: - #gpio-cells: Should be 2. The first cell is the GPIO number and the second cell specifies GPIO flags, as defined in <dt-bindings/gpio/gpio.h>. Only the GPIO_ACTIVE_HIGH and GPIO_ACTIVE_LOW flags are supported. - - gpio-ranges: Range of pins managed by the GPIO controller. + - gpio-ranges: See gpio.txt. Optional properties: @@ -46,35 +48,44 @@ Optional properties: mandatory if the hardware implements a controllable functional clock for the GPIO instance. -Please refer to gpio.txt in this directory for details of gpio-ranges property -and the common GPIO bindings used by client devices. + - gpio-reserved-ranges: See gpio.txt. + +Please refer to gpio.txt in this directory for the common GPIO bindings used by +client devices. The GPIO controller also acts as an interrupt controller. It uses the default two cells specifier as described in Documentation/devicetree/bindings/ interrupt-controller/interrupts.txt. -Example: R8A7779 (R-Car H1) GPIO controller nodes +Example: R8A77470 (RZ/G1C) GPIO controller nodes - gpio0: gpio@ffc40000 { - compatible = "renesas,gpio-r8a7779", "renesas,rcar-gen1-gpio"; - reg = <0xffc40000 0x2c>; - interrupt-parent = <&gic>; - interrupts = <0 141 0x4>; - #gpio-cells = <2>; - gpio-controller; - gpio-ranges = <&pfc 0 0 32>; - interrupt-controller; - #interrupt-cells = <2>; - }; + gpio0: gpio@e6050000 { + compatible = "renesas,gpio-r8a77470", + "renesas,rcar-gen2-gpio"; + reg = <0 0xe6050000 0 0x50>; + interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>; + #gpio-cells = <2>; + gpio-controller; + gpio-ranges = <&pfc 0 0 23>; + #interrupt-cells = <2>; + interrupt-controller; + clocks = <&cpg CPG_MOD 912>; + power-domains = <&sysc R8A77470_PD_ALWAYS_ON>; + resets = <&cpg 912>; + }; ... - gpio6: gpio@ffc46000 { - compatible = "renesas,gpio-r8a7779", "renesas,rcar-gen1-gpio"; - reg = <0xffc46000 0x2c>; - interrupt-parent = <&gic>; - interrupts = <0 147 0x4>; - #gpio-cells = <2>; - gpio-controller; - gpio-ranges = <&pfc 0 192 9>; - interrupt-controller; - #interrupt-cells = <2>; - }; + gpio3: gpio@e6053000 { + compatible = "renesas,gpio-r8a77470", + "renesas,rcar-gen2-gpio"; + reg = <0 0xe6053000 0 0x50>; + interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>; + #gpio-cells = <2>; + gpio-controller; + gpio-ranges = <&pfc 0 96 30>; + gpio-reserved-ranges = <17 10>; + #interrupt-cells = <2>; + interrupt-controller; + clocks = <&cpg CPG_MOD 909>; + power-domains = <&sysc R8A77470_PD_ALWAYS_ON>; + resets = <&cpg 909>; + }; diff --git a/Documentation/devicetree/bindings/gpio/snps,creg-gpio.txt b/Documentation/devicetree/bindings/gpio/snps,creg-gpio.txt new file mode 100644 index 000000000000..1b30812b015b --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/snps,creg-gpio.txt @@ -0,0 +1,21 @@ +Synopsys GPIO via CREG (Control REGisters) driver + +Required properties: +- compatible : "snps,creg-gpio-hsdk" or "snps,creg-gpio-axs10x". +- reg : Exactly one register range with length 0x4. +- #gpio-cells : Since the generic GPIO binding is used, the + amount of cells must be specified as 2. The first cell is the + pin number, the second cell is used to specify optional parameters: + See "gpio-specifier" in .../devicetree/bindings/gpio/gpio.txt. +- gpio-controller : Marks the device node as a GPIO controller. +- ngpios: Number of GPIO pins. + +Example: + +gpio: gpio@f00014b0 { + compatible = "snps,creg-gpio-hsdk"; + reg = <0xf00014b0 0x4>; + gpio-controller; + #gpio-cells = <2>; + ngpios = <2>; +}; diff --git a/Documentation/devicetree/bindings/hwmon/ina3221.txt b/Documentation/devicetree/bindings/hwmon/ina3221.txt new file mode 100644 index 000000000000..a7b25caa2b8e --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/ina3221.txt @@ -0,0 +1,44 @@ +Texas Instruments INA3221 Device Tree Bindings + +1) ina3221 node + Required properties: + - compatible: Must be "ti,ina3221" + - reg: I2C address + + Optional properties: + = The node contains optional child nodes for three channels = + = Each child node describes the information of input source = + + - #address-cells: Required only if a child node is present. Must be 1. + - #size-cells: Required only if a child node is present. Must be 0. + +2) child nodes + Required properties: + - reg: Must be 0, 1 or 2, corresponding to IN1, IN2 or IN3 port of INA3221 + + Optional properties: + - label: Name of the input source + - shunt-resistor-micro-ohms: Shunt resistor value in micro-Ohm + +Example: + +ina3221@40 { + compatible = "ti,ina3221"; + reg = <0x40>; + #address-cells = <1>; + #size-cells = <0>; + + input@0 { + reg = <0x0>; + status = "disabled"; + }; + input@1 { + reg = <0x1>; + shunt-resistor-micro-ohms = <5000>; + }; + input@2 { + reg = <0x2>; + label = "VDD_5V"; + shunt-resistor-micro-ohms = <5000>; + }; +}; diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt index bf2a47bbdc58..b428a70a7cc0 100644 --- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt +++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt @@ -15,6 +15,7 @@ Required properties: * "lltc,ltm2987" * "lltc,ltm4675" * "lltc,ltm4676" + * "lltc,ltm4686" - reg: I2C slave address Optional properties: @@ -30,6 +31,7 @@ Valid names of regulators depend on number of supplies supported per device: * ltc3880, ltc3882, ltc3886 : vout0 - vout1 * ltc3883 : vout0 * ltm4676 : vout0 - vout1 + * ltm4686 : vout0 - vout1 Example: ltc2978@5e { diff --git a/Documentation/devicetree/bindings/leds/leds-an30259a.txt b/Documentation/devicetree/bindings/leds/leds-an30259a.txt new file mode 100644 index 000000000000..6ffb861083c0 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-an30259a.txt @@ -0,0 +1,43 @@ +* Panasonic AN30259A 3-channel LED driver + +The AN30259A is a LED controller capable of driving three LEDs independently. It supports +constant current output and sloping current output modes. The chip is connected over I2C. + +Required properties: + - compatible: Must be "panasonic,an30259a". + - reg: I2C slave address. + - #address-cells: Must be 1. + - #size-cells: Must be 0. + +Each LED is represented as a sub-node of the panasonic,an30259a node. + +Required sub-node properties: + - reg: Pin that the LED is connected to. Must be 1, 2, or 3. + +Optional sub-node properties: + - label: see Documentation/devicetree/bindings/leds/common.txt + - linux,default-trigger: see Documentation/devicetree/bindings/leds/common.txt + +Example: +led-controller@30 { + compatible = "panasonic,an30259a"; + reg = <0x30>; + #address-cells = <1>; + #size-cells = <0>; + + led@1 { + reg = <1>; + linux,default-trigger = "heartbeat"; + label = "red:indicator"; + }; + + led@2 { + reg = <2>; + label = "green:indicator"; + }; + + led@3 { + reg = <3>; + label = "blue:indicator"; + }; +}; diff --git a/Documentation/devicetree/bindings/serial/atmel-usart.txt b/Documentation/devicetree/bindings/mfd/atmel-usart.txt index 7c0d6b2f53e4..7f0cd72f47d2 100644 --- a/Documentation/devicetree/bindings/serial/atmel-usart.txt +++ b/Documentation/devicetree/bindings/mfd/atmel-usart.txt @@ -1,6 +1,6 @@ * Atmel Universal Synchronous Asynchronous Receiver/Transmitter (USART) -Required properties: +Required properties for USART: - compatible: Should be "atmel,<chip>-usart" or "atmel,<chip>-dbgu" The compatible <chip> indicated will be the first SoC to support an additional mode or an USART new feature. @@ -11,7 +11,13 @@ Required properties: Required elements: "usart" - clocks: phandles to input clocks. -Optional properties: +Required properties for USART in SPI mode: +- #size-cells : Must be <0> +- #address-cells : Must be <1> +- cs-gpios: chipselects (internal cs not supported) +- atmel,usart-mode : Must be <AT91_USART_MODE_SPI> (found in dt-bindings/mfd/at91-usart.h) + +Optional properties in serial mode: - atmel,use-dma-rx: use of PDC or DMA for receiving data - atmel,use-dma-tx: use of PDC or DMA for transmitting data - {rts,cts,dtr,dsr,rng,dcd}-gpios: specify a GPIO for RTS/CTS/DTR/DSR/RI/DCD line respectively. @@ -62,3 +68,18 @@ Example: dma-names = "tx", "rx"; atmel,fifo-size = <32>; }; + +- SPI mode: + #include <dt-bindings/mfd/at91-usart.h> + + spi0: spi@f001c000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "atmel,at91rm9200-usart", "atmel,at91sam9260-usart"; + atmel,usart-mode = <AT91_USART_MODE_SPI>; + reg = <0xf001c000 0x100>; + interrupts = <12 IRQ_TYPE_LEVEL_HIGH 5>; + clocks = <&usart0_clk>; + clock-names = "usart"; + cs-gpios = <&pioB 3 0>; + }; diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt index 3ca56fdb5ffe..a4b056761eaa 100644 --- a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt +++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt @@ -1,16 +1,17 @@ -* ROHM BD71837 Power Management Integrated Circuit bindings +* ROHM BD71837 and BD71847 Power Management Integrated Circuit bindings -BD71837MWV is a programmable Power Management IC for powering single-core, -dual-core, and quad-core SoCs such as NXP-i.MX 8M. It is optimized for -low BOM cost and compact solution footprint. It integrates 8 Buck -egulators and 7 LDOs to provide all the power rails required by the SoC and -the commonly used peripherals. +BD71837MWV and BD71847MWV are programmable Power Management ICs for powering +single-core, dual-core, and quad-core SoCs such as NXP-i.MX 8M. They are +optimized for low BOM cost and compact solution footprint. BD71837MWV +integrates 8 Buck regulators and 7 LDOs. BD71847MWV contains 6 Buck regulators +and 6 LDOs. -Datasheet for PMIC is available at: +Datasheet for BD71837 is available at: https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e Required properties: - - compatible : Should be "rohm,bd71837". + - compatible : Should be "rohm,bd71837" for bd71837 + "rohm,bd71847" for bd71847. - reg : I2C slave address. - interrupt-parent : Phandle to the parent interrupt controller. - interrupts : The interrupt line the device is connected to. diff --git a/Documentation/devicetree/bindings/mips/mscc.txt b/Documentation/devicetree/bindings/mips/mscc.txt index ae15ec333542..bc817e984628 100644 --- a/Documentation/devicetree/bindings/mips/mscc.txt +++ b/Documentation/devicetree/bindings/mips/mscc.txt @@ -41,3 +41,19 @@ Example: compatible = "mscc,ocelot-cpu-syscon", "syscon"; reg = <0x70000000 0x2c>; }; + +o HSIO regs: + +The SoC has a few registers (HSIO) handling miscellaneous functionalities: +configuration and status of PLL5, RCOMP, SyncE, SerDes configurations and +status, SerDes muxing and a thermal sensor. + +Required properties: +- compatible: Should be "mscc,ocelot-hsio", "syscon", "simple-mfd" +- reg : Should contain registers location and length + +Example: + syscon@10d0000 { + compatible = "mscc,ocelot-hsio", "syscon", "simple-mfd"; + reg = <0x10d0000 0x10000>; + }; diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt index f6ddba31cb73..e2effe17f05e 100644 --- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt +++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt @@ -15,6 +15,7 @@ Required Properties: - "arasan,sdhci-5.1": generic Arasan SDHCI 5.1 PHY - "rockchip,rk3399-sdhci-5.1", "arasan,sdhci-5.1": rk3399 eMMC PHY For this device it is strongly suggested to include arasan,soc-ctl-syscon. + - "ti,am654-sdhci-5.1", "arasan,sdhci-5.1": TI AM654 MMC PHY - reg: From mmc bindings: Register location and length. - clocks: From clock bindings: Handles to clock inputs. - clock-names: From clock bindings: Tuple including "clk_xin" and "clk_ahb" diff --git a/Documentation/devicetree/bindings/mmc/jz4740.txt b/Documentation/devicetree/bindings/mmc/jz4740.txt index 7cd8c432d7c8..8a6f87f13114 100644 --- a/Documentation/devicetree/bindings/mmc/jz4740.txt +++ b/Documentation/devicetree/bindings/mmc/jz4740.txt @@ -7,6 +7,7 @@ described in mmc.txt. Required properties: - compatible: Should be one of the following: - "ingenic,jz4740-mmc" for the JZ4740 + - "ingenic,jz4725b-mmc" for the JZ4725B - "ingenic,jz4780-mmc" for the JZ4780 - reg: Should contain the MMC controller registers location and length. - interrupts: Should contain the interrupt specifier of the MMC controller. diff --git a/Documentation/devicetree/bindings/mmc/mmci.txt b/Documentation/devicetree/bindings/mmc/mmci.txt index 03796cf2d3e7..6d3c626e017d 100644 --- a/Documentation/devicetree/bindings/mmc/mmci.txt +++ b/Documentation/devicetree/bindings/mmc/mmci.txt @@ -15,8 +15,11 @@ Required properties: Optional properties: - arm,primecell-periphid : contains the PrimeCell Peripheral ID, it overrides the ID provided by the HW +- resets : phandle to internal reset line. + Should be defined for sdmmc variant. - vqmmc-supply : phandle to the regulator device tree node, mentioned as the VCCQ/VDD_IO supply in the eMMC/SD specs. +specific for ux500 variant: - st,sig-dir-dat0 : bus signal direction pin used for DAT[0]. - st,sig-dir-dat2 : bus signal direction pin used for DAT[2]. - st,sig-dir-dat31 : bus signal direction pin used for DAT[3] and DAT[1]. @@ -24,6 +27,14 @@ Optional properties: - st,sig-dir-cmd : cmd signal direction pin used for CMD. - st,sig-pin-fbclk : feedback clock signal pin used. +specific for sdmmc variant: +- st,sig-dir : signal direction polarity used for cmd, dat0 dat123. +- st,neg-edge : data & command phase relation, generated on + sd clock falling edge. +- st,use-ckin : use ckin pin from an external driver to sample + the receive data (example: with voltage + switch transceiver). + Deprecated properties: - mmc-cap-mmc-highspeed : indicates whether MMC is high speed capable. - mmc-cap-sd-highspeed : indicates whether SD is high speed capable. diff --git a/Documentation/devicetree/bindings/mmc/mtk-sd.txt b/Documentation/devicetree/bindings/mmc/mtk-sd.txt index f33467a54a05..f5bcda3980cc 100644 --- a/Documentation/devicetree/bindings/mmc/mtk-sd.txt +++ b/Documentation/devicetree/bindings/mmc/mtk-sd.txt @@ -10,6 +10,7 @@ Required properties: - compatible: value should be either of the following. "mediatek,mt8135-mmc": for mmc host ip compatible with mt8135 "mediatek,mt8173-mmc": for mmc host ip compatible with mt8173 + "mediatek,mt8183-mmc": for mmc host ip compatible with mt8183 "mediatek,mt2701-mmc": for mmc host ip compatible with mt2701 "mediatek,mt2712-mmc": for mmc host ip compatible with mt2712 "mediatek,mt7622-mmc": for MT7622 SoC @@ -22,6 +23,7 @@ Required properties: "source" - source clock (required) "hclk" - HCLK which used for host (required) "source_cg" - independent source clock gate (required for MT2712) + "bus_clk" - bus clock used for internal register access (required for MT2712 MSDC0/3) - pinctrl-names: should be "default", "state_uhs" - pinctrl-0: should contain default/high speed pin ctrl - pinctrl-1: should contain uhs mode pin ctrl diff --git a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt index 9bce57862ed6..32b4b4e41923 100644 --- a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt +++ b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt @@ -38,3 +38,75 @@ sdhci@c8000200 { power-gpios = <&gpio 155 0>; /* gpio PT3 */ bus-width = <8>; }; + +Optional properties for Tegra210 and Tegra186: +- pinctrl-names, pinctrl-0, pinctrl-1 : Specify pad voltage + configurations. Valid pinctrl-names are "sdmmc-3v3" and "sdmmc-1v8" + for controllers supporting multiple voltage levels. The order of names + should correspond to the pin configuration states in pinctrl-0 and + pinctrl-1. +- nvidia,only-1-8-v : The presence of this property indicates that the + controller operates at a 1.8 V fixed I/O voltage. +- nvidia,pad-autocal-pull-up-offset-3v3, + nvidia,pad-autocal-pull-down-offset-3v3 : Specify drive strength + calibration offsets for 3.3 V signaling modes. +- nvidia,pad-autocal-pull-up-offset-1v8, + nvidia,pad-autocal-pull-down-offset-1v8 : Specify drive strength + calibration offsets for 1.8 V signaling modes. +- nvidia,pad-autocal-pull-up-offset-3v3-timeout, + nvidia,pad-autocal-pull-down-offset-3v3-timeout : Specify drive + strength used as a fallback in case the automatic calibration times + out on a 3.3 V signaling mode. +- nvidia,pad-autocal-pull-up-offset-1v8-timeout, + nvidia,pad-autocal-pull-down-offset-1v8-timeout : Specify drive + strength used as a fallback in case the automatic calibration times + out on a 1.8 V signaling mode. +- nvidia,pad-autocal-pull-up-offset-sdr104, + nvidia,pad-autocal-pull-down-offset-sdr104 : Specify drive strength + calibration offsets for SDR104 mode. +- nvidia,pad-autocal-pull-up-offset-hs400, + nvidia,pad-autocal-pull-down-offset-hs400 : Specify drive strength + calibration offsets for HS400 mode. +- nvidia,default-tap : Specify the default inbound sampling clock + trimmer value for non-tunable modes. +- nvidia,default-trim : Specify the default outbound clock trimmer + value. +- nvidia,dqs-trim : Specify DQS trim value for HS400 timing + + Notes on the pad calibration pull up and pulldown offset values: + - The property values are drive codes which are programmed into the + PD_OFFSET and PU_OFFSET sections of the + SDHCI_TEGRA_AUTO_CAL_CONFIG register. + - A higher value corresponds to higher drive strength. Please refer + to the reference manual of the SoC for correct values. + - The SDR104 and HS400 timing specific values are used in + corresponding modes if specified. + + Notes on tap and trim values: + - The values are used for compensating trace length differences + by adjusting the sampling point. + - The values are programmed to the Vendor Clock Control Register. + Please refer to the reference manual of the SoC for correct + values. + - The DQS trim values are only used on controllers which support + HS400 timing. Only SDMMC4 on Tegra210 and Tegra 186 supports + HS400. + +Example: +sdhci@700b0000 { + compatible = "nvidia,tegra210-sdhci", "nvidia,tegra124-sdhci"; + reg = <0x0 0x700b0000 0x0 0x200>; + interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&tegra_car TEGRA210_CLK_SDMMC1>; + clock-names = "sdhci"; + resets = <&tegra_car 14>; + reset-names = "sdhci"; + pinctrl-names = "sdmmc-3v3", "sdmmc-1v8"; + pinctrl-0 = <&sdmmc1_3v3>; + pinctrl-1 = <&sdmmc1_1v8>; + nvidia,pad-autocal-pull-up-offset-3v3 = <0x00>; + nvidia,pad-autocal-pull-down-offset-3v3 = <0x7d>; + nvidia,pad-autocal-pull-up-offset-1v8 = <0x7b>; + nvidia,pad-autocal-pull-down-offset-1v8 = <0x7b>; + status = "disabled"; +}; diff --git a/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt b/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt index 5ff1e12c655a..c064af5838aa 100644 --- a/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt +++ b/Documentation/devicetree/bindings/mmc/renesas,mmcif.txt @@ -12,6 +12,7 @@ Required properties: - "renesas,mmcif-r8a73a4" for the MMCIF found in r8a73a4 SoCs - "renesas,mmcif-r8a7740" for the MMCIF found in r8a7740 SoCs - "renesas,mmcif-r8a7743" for the MMCIF found in r8a7743 SoCs + - "renesas,mmcif-r8a7744" for the MMCIF found in r8a7744 SoCs - "renesas,mmcif-r8a7745" for the MMCIF found in r8a7745 SoCs - "renesas,mmcif-r8a7778" for the MMCIF found in r8a7778 SoCs - "renesas,mmcif-r8a7790" for the MMCIF found in r8a7790 SoCs @@ -23,7 +24,8 @@ Required properties: - interrupts: Some SoCs have only 1 shared interrupt, while others have either 2 or 3 individual interrupts (error, int, card detect). Below is the number of interrupts for each SoC: - 1: r8a73a4, r8a7743, r8a7745, r8a7778, r8a7790, r8a7791, r8a7793, r8a7794 + 1: r8a73a4, r8a7743, r8a7744, r8a7745, r8a7778, r8a7790, r8a7791, r8a7793, + r8a7794 2: r8a7740, sh73a0 3: r7s72100 diff --git a/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt b/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt new file mode 100644 index 000000000000..45c9978aad7b --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt @@ -0,0 +1,41 @@ +* Spreadtrum SDHCI controller (sdhci-sprd) + +The Secure Digital (SD) Host controller on Spreadtrum SoCs provides an interface +for MMC, SD and SDIO types of cards. + +This file documents differences between the core properties in mmc.txt +and the properties used by the sdhci-sprd driver. + +Required properties: +- compatible: Should contain "sprd,sdhci-r11". +- reg: physical base address of the controller and length. +- interrupts: Interrupts used by the SDHCI controller. +- clocks: Should contain phandle for the clock feeding the SDHCI controller +- clock-names: Should contain the following: + "sdio" - SDIO source clock (required) + "enable" - gate clock which used for enabling/disabling the device (required) + +Optional properties: +- assigned-clocks: the same with "sdio" clock +- assigned-clock-parents: the default parent of "sdio" clock + +Examples: + +sdio0: sdio@20600000 { + compatible = "sprd,sdhci-r11"; + reg = <0 0x20600000 0 0x1000>; + interrupts = <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>; + + clock-names = "sdio", "enable"; + clocks = <&ap_clk CLK_EMMC_2X>, + <&apahb_gate CLK_EMMC_EB>; + assigned-clocks = <&ap_clk CLK_EMMC_2X>; + assigned-clock-parents = <&rpll CLK_RPLL_390M>; + + bus-width = <8>; + non-removable; + no-sdio; + no-sd; + cap-mmc-hw-reset; + status = "okay"; +}; diff --git a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt index c434200d19d5..27f2eab2981d 100644 --- a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt +++ b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt @@ -16,7 +16,11 @@ Required properties: "renesas,sdhi-r8a73a4" - SDHI IP on R8A73A4 SoC "renesas,sdhi-r8a7740" - SDHI IP on R8A7740 SoC "renesas,sdhi-r8a7743" - SDHI IP on R8A7743 SoC + "renesas,sdhi-r8a7744" - SDHI IP on R8A7744 SoC "renesas,sdhi-r8a7745" - SDHI IP on R8A7745 SoC + "renesas,sdhi-r8a774a1" - SDHI IP on R8A774A1 SoC + "renesas,sdhi-r8a77470" - SDHI IP on R8A77470 SoC + "renesas,sdhi-mmc-r8a77470" - SDHI/MMC IP on R8A77470 SoC "renesas,sdhi-r8a7778" - SDHI IP on R8A7778 SoC "renesas,sdhi-r8a7779" - SDHI IP on R8A7779 SoC "renesas,sdhi-r8a7790" - SDHI IP on R8A7790 SoC @@ -27,14 +31,16 @@ Required properties: "renesas,sdhi-r8a7795" - SDHI IP on R8A7795 SoC "renesas,sdhi-r8a7796" - SDHI IP on R8A7796 SoC "renesas,sdhi-r8a77965" - SDHI IP on R8A77965 SoC + "renesas,sdhi-r8a77970" - SDHI IP on R8A77970 SoC "renesas,sdhi-r8a77980" - SDHI IP on R8A77980 SoC "renesas,sdhi-r8a77990" - SDHI IP on R8A77990 SoC "renesas,sdhi-r8a77995" - SDHI IP on R8A77995 SoC "renesas,sdhi-shmobile" - a generic sh-mobile SDHI controller "renesas,rcar-gen1-sdhi" - a generic R-Car Gen1 SDHI controller - "renesas,rcar-gen2-sdhi" - a generic R-Car Gen2 or RZ/G1 + "renesas,rcar-gen2-sdhi" - a generic R-Car Gen2 and RZ/G1 SDHI + (not SDHI/MMC) controller + "renesas,rcar-gen3-sdhi" - a generic R-Car Gen3 or RZ/G2 SDHI controller - "renesas,rcar-gen3-sdhi" - a generic R-Car Gen3 SDHI controller When compatible with the generic version, nodes must list diff --git a/Documentation/devicetree/bindings/mmc/uniphier-sd.txt b/Documentation/devicetree/bindings/mmc/uniphier-sd.txt new file mode 100644 index 000000000000..e1d658755722 --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/uniphier-sd.txt @@ -0,0 +1,55 @@ +UniPhier SD/eMMC controller + +Required properties: +- compatible: should be one of the following: + "socionext,uniphier-sd-v2.91" - IP version 2.91 + "socionext,uniphier-sd-v3.1" - IP version 3.1 + "socionext,uniphier-sd-v3.1.1" - IP version 3.1.1 +- reg: offset and length of the register set for the device. +- interrupts: a single interrupt specifier. +- clocks: a single clock specifier of the controller clock. +- reset-names: should contain the following: + "host" - mandatory for all versions + "bridge" - should exist only for "socionext,uniphier-sd-v2.91" + "hw" - should exist if eMMC hw reset line is available +- resets: a list of reset specifiers, corresponding to the reset-names + +Optional properties: +- pinctrl-names: if present, should contain the following: + "default" - should exist for all instances + "uhs" - should exist for SD instance with UHS support +- pinctrl-0: pin control state for the default mode +- pinctrl-1: pin control state for the UHS mode +- dma-names: should be "rx-tx" if present. + This property can exist only for "socionext,uniphier-sd-v2.91". +- dmas: a single DMA channel specifier + This property can exist only for "socionext,uniphier-sd-v2.91". +- bus-width: see mmc.txt +- cap-sd-highspeed: see mmc.txt +- cap-mmc-highspeed: see mmc.txt +- sd-uhs-sdr12: see mmc.txt +- sd-uhs-sdr25: see mmc.txt +- sd-uhs-sdr50: see mmc.txt +- cap-mmc-hw-reset: should exist if reset-names contains "hw". see mmc.txt +- non-removable: see mmc.txt + +Example: + + sd: sdhc@5a400000 { + compatible = "socionext,uniphier-sd-v2.91"; + reg = <0x5a400000 0x200>; + interrupts = <0 76 4>; + pinctrl-names = "default", "uhs"; + pinctrl-0 = <&pinctrl_sd>; + pinctrl-1 = <&pinctrl_sd_uhs>; + clocks = <&mio_clk 0>; + reset-names = "host", "bridge"; + resets = <&mio_rst 0>, <&mio_rst 3>; + dma-names = "rx-tx"; + dmas = <&dmac 4>; + bus-width = <4>; + cap-sd-highspeed; + sd-uhs-sdr12; + sd-uhs-sdr25; + sd-uhs-sdr50; + }; diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt index 4648948f7c3b..e15589f47787 100644 --- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt +++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt @@ -19,6 +19,9 @@ Optional properties: - interrupt-names: must be "mdio_done_error" when there is a share interrupt fed to this hardware block, or must be "mdio_done" for the first interrupt and "mdio_error" for the second when there are separate interrupts +- clocks: A reference to the clock supplying the MDIO bus controller +- clock-frequency: the MDIO bus clock that must be output by the MDIO bus + hardware, if absent, the default hardware values are used Child nodes of this MDIO bus controller node are standard Ethernet PHY device nodes as described in Documentation/devicetree/bindings/net/phy.txt diff --git a/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt new file mode 100644 index 000000000000..886cbe8ffb38 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt @@ -0,0 +1,143 @@ +Lantiq GSWIP Ethernet switches +================================== + +Required properties for GSWIP core: + +- compatible : "lantiq,xrx200-gswip" for the embedded GSWIP in the + xRX200 SoC +- reg : memory range of the GSWIP core registers + : memory range of the GSWIP MDIO registers + : memory range of the GSWIP MII registers + +See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of +additional required and optional properties. + + +Required properties for MDIO bus: +- compatible : "lantiq,xrx200-mdio" for the MDIO bus inside the GSWIP + core of the xRX200 SoC and the PHYs connected to it. + +See Documentation/devicetree/bindings/net/mdio.txt for a list of additional +required and optional properties. + + +Required properties for GPHY firmware loading: +- compatible : "lantiq,xrx200-gphy-fw", "lantiq,gphy-fw" + "lantiq,xrx300-gphy-fw", "lantiq,gphy-fw" + "lantiq,xrx330-gphy-fw", "lantiq,gphy-fw" + for the loading of the firmware into the embedded + GPHY core of the SoC. +- lantiq,rcu : reference to the rcu syscon + +The GPHY firmware loader has a list of GPHY entries, one for each +embedded GPHY + +- reg : Offset of the GPHY firmware register in the RCU + register range +- resets : list of resets of the embedded GPHY +- reset-names : list of names of the resets + +Example: + +Ethernet switch on the VRX200 SoC: + +switch@e108000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-gswip"; + reg = < 0xe108000 0x3100 /* switch */ + 0xe10b100 0xd8 /* mdio */ + 0xe10b1d8 0x130 /* mii */ + >; + dsa,member = <0 0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + label = "lan3"; + phy-mode = "rgmii"; + phy-handle = <&phy0>; + }; + + port@1 { + reg = <1>; + label = "lan4"; + phy-mode = "rgmii"; + phy-handle = <&phy1>; + }; + + port@2 { + reg = <2>; + label = "lan2"; + phy-mode = "internal"; + phy-handle = <&phy11>; + }; + + port@4 { + reg = <4>; + label = "lan1"; + phy-mode = "internal"; + phy-handle = <&phy13>; + }; + + port@5 { + reg = <5>; + label = "wan"; + phy-mode = "rgmii"; + phy-handle = <&phy5>; + }; + + port@6 { + reg = <0x6>; + label = "cpu"; + ethernet = <ð0>; + }; + }; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-mdio"; + reg = <0>; + + phy0: ethernet-phy@0 { + reg = <0x0>; + }; + phy1: ethernet-phy@1 { + reg = <0x1>; + }; + phy5: ethernet-phy@5 { + reg = <0x5>; + }; + phy11: ethernet-phy@11 { + reg = <0x11>; + }; + phy13: ethernet-phy@13 { + reg = <0x13>; + }; + }; + + gphy-fw { + compatible = "lantiq,xrx200-gphy-fw", "lantiq,gphy-fw"; + lantiq,rcu = <&rcu0>; + #address-cells = <1>; + #size-cells = <0>; + + gphy@20 { + reg = <0x20>; + + resets = <&reset0 31 30>; + reset-names = "gphy"; + }; + + gphy@68 { + reg = <0x68>; + + resets = <&reset0 29 28>; + reset-names = "gphy"; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt new file mode 100644 index 000000000000..5ff5e68bbbb6 --- /dev/null +++ b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt @@ -0,0 +1,21 @@ +Lantiq xRX200 GSWIP PMAC Ethernet driver +================================== + +Required properties: + +- compatible : "lantiq,xrx200-net" for the PMAC of the embedded + : GSWIP in the xXR200 +- reg : memory range of the PMAC core inside of the GSWIP core +- interrupts : TX and RX DMA interrupts. Use interrupt-names "tx" for + : the TX interrupt and "rx" for the RX interrupt. + +Example: + +ethernet@e10b308 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "lantiq,xrx200-net"; + reg = <0xe10b308 0xcf8>; + interrupts = <73>, <72>; + interrupt-names = "tx", "rx"; +}; diff --git a/Documentation/devicetree/bindings/net/marvell-pp2.txt b/Documentation/devicetree/bindings/net/marvell-pp2.txt index fc019df0d863..b78397669320 100644 --- a/Documentation/devicetree/bindings/net/marvell-pp2.txt +++ b/Documentation/devicetree/bindings/net/marvell-pp2.txt @@ -31,7 +31,7 @@ required. Required properties (port): -- interrupts: interrupt for the port +- interrupts: interrupt(s) for the port - port-id: ID of the port from the MAC point of view - gop-port-id: only for marvell,armada-7k-pp2, ID of the port from the GOP (Group Of Ports) point of view. This ID is used to index the @@ -43,10 +43,12 @@ Optional properties (port): - marvell,loopback: port is loopback mode - phy: a phandle to a phy node defining the PHY address (as the reg property, a single integer). -- interrupt-names: if more than a single interrupt for rx is given, must - be the name associated to the interrupts listed. Valid - names are: "tx-cpu0", "tx-cpu1", "tx-cpu2", "tx-cpu3", - "rx-shared", "link". +- interrupt-names: if more than a single interrupt for is given, must be the + name associated to the interrupts listed. Valid names are: + "hifX", with X in [0..8], and "link". The names "tx-cpu0", + "tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported + for backward compatibility but shouldn't be used for new + additions. - marvell,system-controller: a phandle to the system controller. Example for marvell,armada-375-pp2: @@ -89,9 +91,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <0>; gop-port-id = <0>; }; @@ -101,9 +108,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <1>; gop-port-id = <2>; }; @@ -113,9 +125,14 @@ cpm_ethernet: ethernet@0 { <ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>, <ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>, - <ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "tx-cpu0", "tx-cpu1", "tx-cpu2", - "tx-cpu3", "rx-shared"; + <ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>, + <ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4", + "hif5", "hif6", "hif7", "hif8", "link"; port-id = <2>; gop-port-id = <3>; }; diff --git a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt index e22d8cfea687..5100358177c9 100644 --- a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt +++ b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt @@ -1,4 +1,4 @@ -Micrel KSZ9021/KSZ9031 Gigabit Ethernet PHY +Micrel KSZ9021/KSZ9031/KSZ9131 Gigabit Ethernet PHY Some boards require special tuning values, particularly when it comes to clock delays. You can specify clock delay values in the PHY OF @@ -64,6 +64,32 @@ KSZ9031: Attention: The link partner must be configurable as slave otherwise no link will be established. +KSZ9131: + + All skew control options are specified in picoseconds. The increment + step is 100ps. Unlike KSZ9031, the values represent picoseccond delays. + A negative value can be assigned as rxc-skew-psec = <(-100)>;. + + Optional properties: + + Range of the value -700 to 2400, default value 0: + + - rxc-skew-psec : Skew control of RX clock pad + - txc-skew-psec : Skew control of TX clock pad + + Range of the value -700 to 800, default value 0: + + - rxdv-skew-psec : Skew control of RX CTL pad + - txen-skew-psec : Skew control of TX CTL pad + - rxd0-skew-psec : Skew control of RX data 0 pad + - rxd1-skew-psec : Skew control of RX data 1 pad + - rxd2-skew-psec : Skew control of RX data 2 pad + - rxd3-skew-psec : Skew control of RX data 3 pad + - txd0-skew-psec : Skew control of TX data 0 pad + - txd1-skew-psec : Skew control of TX data 1 pad + - txd2-skew-psec : Skew control of TX data 2 pad + - txd3-skew-psec : Skew control of TX data 3 pad + Examples: mdio { diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt index 0a84711abece..9e5c17d426ce 100644 --- a/Documentation/devicetree/bindings/net/mscc-ocelot.txt +++ b/Documentation/devicetree/bindings/net/mscc-ocelot.txt @@ -12,7 +12,6 @@ Required properties: - "sys" - "rew" - "qs" - - "hsio" - "qsys" - "ana" - "portX" with X from 0 to the number of last port index available on that @@ -45,7 +44,6 @@ Example: reg = <0x1010000 0x10000>, <0x1030000 0x10000>, <0x1080000 0x100>, - <0x10d0000 0x10000>, <0x11e0000 0x100>, <0x11f0000 0x100>, <0x1200000 0x100>, @@ -59,10 +57,9 @@ Example: <0x1280000 0x100>, <0x1800000 0x80000>, <0x1880000 0x10000>; - reg-names = "sys", "rew", "qs", "hsio", "port0", - "port1", "port2", "port3", "port4", "port5", - "port6", "port7", "port8", "port9", "port10", - "qsys", "ana"; + reg-names = "sys", "rew", "qs", "port0", "port1", "port2", + "port3", "port4", "port5", "port6", "port7", + "port8", "port9", "port10", "qsys", "ana"; interrupts = <21 22>; interrupt-names = "xtr", "inj"; diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt index 0eedabe22cc3..5ff37c68c941 100644 --- a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt +++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt @@ -1,10 +1,5 @@ * Microsemi - vsc8531 Giga bit ethernet phy -Required properties: -- compatible : Should contain phy id as "ethernet-phy-idAAAA.BBBB" - The PHY device uses the binding described in - Documentation/devicetree/bindings/net/phy.txt - Optional properties: - vsc8531,vddmac : The vddmac in mV. Allowed values is listed in the first row of Table 1 (below). @@ -27,14 +22,16 @@ Optional properties: 'vddmac'. Default value is 0%. Ref: Table:1 - Edge rate change (below). -- vsc8531,led-0-mode : LED mode. Specify how the LED[0] should behave. - Allowed values are define in - "include/dt-bindings/net/mscc-phy-vsc8531.h". - Default value is VSC8531_LINK_1000_ACTIVITY (1). -- vsc8531,led-1-mode : LED mode. Specify how the LED[1] should behave. - Allowed values are define in +- vsc8531,led-[N]-mode : LED mode. Specify how the LED[N] should behave. + N depends on the number of LEDs supported by a + PHY. + Allowed values are defined in "include/dt-bindings/net/mscc-phy-vsc8531.h". - Default value is VSC8531_LINK_100_ACTIVITY (2). + Default values are VSC8531_LINK_1000_ACTIVITY (1), + VSC8531_LINK_100_ACTIVITY (2), + VSC8531_LINK_ACTIVITY (0) and + VSC8531_DUPLEX_COLLISION (8). + Table: 1 - Edge rate change ----------------------------------------------------------------| diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt index da249b7c406c..3530256a879c 100644 --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt @@ -6,6 +6,7 @@ interface contains. Required properties: - compatible: Must contain one or more of the following: - "renesas,etheravb-r8a7743" for the R8A7743 SoC. + - "renesas,etheravb-r8a7744" for the R8A7744 SoC. - "renesas,etheravb-r8a7745" for the R8A7745 SoC. - "renesas,etheravb-r8a77470" for the R8A77470 SoC. - "renesas,etheravb-r8a7790" for the R8A7790 SoC. diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt index 7fd4e8ce4149..2196d1ab3c8c 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -56,6 +56,11 @@ Optional properties: the length can vary between hw versions. - <supply-name>-supply: handle to the regulator device tree node optional "supply-name" is "vdd-0.8-cx-mx". +- memory-region: + Usage: optional + Value type: <phandle> + Definition: reference to the reserved-memory for the msa region + used by the wifi firmware running in Q6. Example (to supply the calibration data alone): @@ -149,4 +154,5 @@ wifi@18000000 { <0 140 0 /* CE10 */ >, <0 141 0 /* CE11 */ >; vdd-0.8-cx-mx-supply = <&pm8998_l5>; + memory-region = <&wifi_msa_mem>; }; diff --git a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt index cb33421184a0..f37494d5a7be 100644 --- a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt +++ b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt @@ -50,6 +50,7 @@ Additional required properties for imx7d-pcie: - reset-names: Must contain the following entires: - "pciephy" - "apps" + - "turnoff" Example: diff --git a/Documentation/devicetree/bindings/pci/pci-keystone.txt b/Documentation/devicetree/bindings/pci/pci-keystone.txt index 4dd17de549a7..2030ee0dc4f9 100644 --- a/Documentation/devicetree/bindings/pci/pci-keystone.txt +++ b/Documentation/devicetree/bindings/pci/pci-keystone.txt @@ -19,6 +19,9 @@ pcie_msi_intc : Interrupt controller device node for MSI IRQ chip interrupt-cells: should be set to 1 interrupts: GIC interrupt lines connected to PCI MSI interrupt lines +ti,syscon-pcie-id : phandle to the device control module required to set device + id and vendor id. + Example: pcie_msi_intc: msi-interrupt-controller { interrupt-controller; diff --git a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt index 9fe7e12a7bf3..b94078f58d8e 100644 --- a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt +++ b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt @@ -7,6 +7,7 @@ OHCI and EHCI controllers. Required properties: - compatible: "renesas,pci-r8a7743" for the R8A7743 SoC; + "renesas,pci-r8a7744" for the R8A7744 SoC; "renesas,pci-r8a7745" for the R8A7745 SoC; "renesas,pci-r8a7790" for the R8A7790 SoC; "renesas,pci-r8a7791" for the R8A7791 SoC; diff --git a/Documentation/devicetree/bindings/pci/rcar-pci.txt b/Documentation/devicetree/bindings/pci/rcar-pci.txt index a5f7fc62d10e..976ef7bfff93 100644 --- a/Documentation/devicetree/bindings/pci/rcar-pci.txt +++ b/Documentation/devicetree/bindings/pci/rcar-pci.txt @@ -2,6 +2,7 @@ Required properties: compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; + "renesas,pcie-r8a7744" for the R8A7744 SoC; "renesas,pcie-r8a7779" for the R8A7779 SoC; "renesas,pcie-r8a7790" for the R8A7790 SoC; "renesas,pcie-r8a7791" for the R8A7791 SoC; @@ -9,6 +10,7 @@ compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; "renesas,pcie-r8a7795" for the R8A7795 SoC; "renesas,pcie-r8a7796" for the R8A7796 SoC; "renesas,pcie-r8a77980" for the R8A77980 SoC; + "renesas,pcie-r8a77990" for the R8A77990 SoC; "renesas,pcie-rcar-gen2" for a generic R-Car Gen2 or RZ/G1 compatible device. "renesas,pcie-rcar-gen3" for a generic R-Car Gen3 compatible device. diff --git a/Documentation/devicetree/bindings/pci/ti-pci.txt b/Documentation/devicetree/bindings/pci/ti-pci.txt index 7f7af3044016..452fe48c4fdd 100644 --- a/Documentation/devicetree/bindings/pci/ti-pci.txt +++ b/Documentation/devicetree/bindings/pci/ti-pci.txt @@ -26,6 +26,11 @@ HOST MODE ranges, interrupt-map-mask, interrupt-map : as specified in ../designware-pcie.txt + - ti,syscon-unaligned-access: phandle to the syscon DT node. The 1st argument + should contain the register offset within syscon + and the 2nd argument should contain the bit field + for setting the bit to enable unaligned + access. DEVICE MODE =========== diff --git a/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt b/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt new file mode 100644 index 000000000000..332219860187 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-ocelot-serdes.txt @@ -0,0 +1,43 @@ +Microsemi Ocelot SerDes muxing driver +------------------------------------- + +On Microsemi Ocelot, there is a handful of registers in HSIO address +space for setting up the SerDes to switch port muxing. + +A SerDes X can be "muxed" to work with switch port Y or Z for example. +One specific SerDes can also be used as a PCIe interface. + +Hence, a SerDes represents an interface, be it an Ethernet or a PCIe one. + +There are two kinds of SerDes: SERDES1G supports 10/100Mbps in +half/full-duplex and 1000Mbps in full-duplex mode while SERDES6G supports +10/100Mbps in half/full-duplex and 1000/2500Mbps in full-duplex mode. + +Also, SERDES6G number (aka "macro") 0 is the only interface supporting +QSGMII. + +This is a child of the HSIO syscon ("mscc,ocelot-hsio", see +Documentation/devicetree/bindings/mips/mscc.txt) on the Microsemi Ocelot. + +Required properties: + +- compatible: should be "mscc,vsc7514-serdes" +- #phy-cells : from the generic phy bindings, must be 2. + The first number defines the input port to use for a given + SerDes macro. The second defines the macro to use. They are + defined in dt-bindings/phy/phy-ocelot-serdes.h + +Example: + + serdes: serdes { + compatible = "mscc,vsc7514-serdes"; + #phy-cells = <2>; + }; + + ethernet { + port1 { + phy-handle = <&phy_foo>; + /* Link SERDES1G_5 to port1 */ + phys = <&serdes 1 SERDES1G_5>; + }; + }; diff --git a/Documentation/devicetree/bindings/pinctrl/brcm,bcm4708-pinmux.txt b/Documentation/devicetree/bindings/pinctrl/brcm,bcm4708-pinmux.txt new file mode 100644 index 000000000000..4fa9539070cb --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/brcm,bcm4708-pinmux.txt @@ -0,0 +1,57 @@ +Broadcom Northstar pins mux controller + +Some of Northstar SoCs's pins can be used for various purposes thanks to the mux +controller. This binding allows describing mux controller and listing available +functions. They can be referenced later by other bindings to let system +configure controller correctly. + +A list of pins varies across chipsets so few bindings are available. + +Required properties: +- compatible: must be one of: + "brcm,bcm4708-pinmux" + "brcm,bcm4709-pinmux" + "brcm,bcm53012-pinmux" +- reg: iomem address range of CRU (Central Resource Unit) pin registers +- reg-names: "cru_gpio_control" - the only needed & supported reg right now + +Functions and their groups available for all chipsets: +- "spi": "spi_grp" +- "i2c": "i2c_grp" +- "pwm": "pwm0_grp", "pwm1_grp", "pwm2_grp", "pwm3_grp" +- "uart1": "uart1_grp" + +Additionally available on BCM4709 and BCM53012: +- "mdio": "mdio_grp" +- "uart2": "uart2_grp" +- "sdio": "sdio_pwr_grp", "sdio_1p8v_grp" + +For documentation of subnodes see: +Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt + +Example: + dmu@1800c000 { + compatible = "simple-bus"; + ranges = <0 0x1800c000 0x1000>; + #address-cells = <1>; + #size-cells = <1>; + + cru@100 { + compatible = "simple-bus"; + reg = <0x100 0x1a4>; + ranges; + #address-cells = <1>; + #size-cells = <1>; + + pin-controller@1c0 { + compatible = "brcm,bcm4708-pinmux"; + reg = <0x1c0 0x24>; + reg-names = "cru_gpio_control"; + + spi-pins { + function = "spi"; + groups = "spi_grp"; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt index ca313a7aeaff..af20b0ec715c 100644 --- a/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt @@ -20,16 +20,30 @@ Required properties: - compatible: One of: - "ingenic,jz4740-pinctrl" + - "ingenic,jz4725b-pinctrl" - "ingenic,jz4770-pinctrl" - "ingenic,jz4780-pinctrl" - reg: Address range of the pinctrl registers. -GPIO sub-nodes --------------- +Required properties for sub-nodes (GPIO chips): +----------------------------------------------- -The pinctrl node can have optional sub-nodes for the Ingenic GPIO driver; -please refer to ../gpio/ingenic,gpio.txt. + - compatible: Must contain one of: + - "ingenic,jz4740-gpio" + - "ingenic,jz4770-gpio" + - "ingenic,jz4780-gpio" + - reg: The GPIO bank number. + - interrupt-controller: Marks the device node as an interrupt controller. + - interrupts: Interrupt specifier for the controllers interrupt. + - #interrupt-cells: Should be 2. Refer to + ../interrupt-controller/interrupts.txt for more details. + - gpio-controller: Marks the device node as a GPIO controller. + - #gpio-cells: Should be 2. The first cell is the GPIO number and the second + cell specifies GPIO flags, as defined in <dt-bindings/gpio/gpio.h>. Only the + GPIO_ACTIVE_HIGH and GPIO_ACTIVE_LOW flags are supported. + - gpio-ranges: Range of pins managed by the GPIO controller. Refer to + ../gpio/gpio.txt for more details. Example: @@ -38,4 +52,21 @@ Example: pinctrl: pin-controller@10010000 { compatible = "ingenic,jz4740-pinctrl"; reg = <0x10010000 0x400>; + #address-cells = <1>; + #size-cells = <0>; + + gpa: gpio@0 { + compatible = "ingenic,jz4740-gpio"; + reg = <0>; + + gpio-controller; + gpio-ranges = <&pinctrl 0 0 32>; + #gpio-cells = <2>; + + interrupt-controller; + #interrupt-cells = <2>; + + interrupt-parent = <&intc>; + interrupts = <28>; + }; }; diff --git a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt index 54ecb8ab7788..82ead40311f6 100644 --- a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt @@ -13,6 +13,8 @@ Required properties for the root node: "amlogic,meson-gxl-aobus-pinctrl" "amlogic,meson-axg-periphs-pinctrl" "amlogic,meson-axg-aobus-pinctrl" + "amlogic,meson-g12a-periphs-pinctrl" + "amlogic,meson-g12a-aobus-pinctrl" - reg: address and size of registers controlling irq functionality === GPIO sub-nodes === diff --git a/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt new file mode 100644 index 000000000000..83f4bbac94bb --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt @@ -0,0 +1,216 @@ +Nuvoton NPCM7XX Pin Controllers + +The Nuvoton BMC NPCM7XX Pin Controller multi-function routed through +the multiplexing block, Each pin supports GPIO functionality (GPIOx) +and multiple functions that directly connect the pin to different +hardware blocks. + +Required properties: +- #address-cells : should be 1. +- #size-cells : should be 1. +- compatible : "nuvoton,npcm750-pinctrl" for Poleg NPCM7XX. +- ranges : defines mapping ranges between pin controller node (parent) + to GPIO bank node (children). + +=== GPIO Bank Subnode === + +The NPCM7XX has 8 GPIO Banks each GPIO bank supports 32 GPIO. + +Required GPIO Bank subnode-properties: +- reg : specifies physical base address and size of the GPIO + bank registers. +- gpio-controller : Marks the device node as a GPIO controller. +- #gpio-cells : Must be <2>. The first cell is the gpio pin number + and the second cell is used for optional parameters. +- interrupts : contain the GPIO bank interrupt with flags for falling edge. +- gpio-ranges : defines the range of pins managed by the GPIO bank controller. + +For example, GPIO bank subnodes like the following: + gpio0: gpio@f0010000 { + gpio-controller; + #gpio-cells = <2>; + reg = <0x0 0x80>; + interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>; + gpio-ranges = <&pinctrl 0 0 32>; + }; + +=== Pin Mux Subnode === + +- pin: A string containing the name of the pin + An array of strings, each string containing the name of a pin. + These pin are used for selecting pin configuration. + +The following are the list of pins available: + "GPIO0/IOX1DI", "GPIO1/IOX1LD", "GPIO2/IOX1CK", "GPIO3/IOX1D0", + "GPIO4/IOX2DI/SMB1DSDA", "GPIO5/IOX2LD/SMB1DSCL", "GPIO6/IOX2CK/SMB2DSDA", + "GPIO7/IOX2D0/SMB2DSCL", "GPIO8/LKGPO1", "GPIO9/LKGPO2", "GPIO10/IOXHLD", + "GPIO11/IOXHCK", "GPIO12/GSPICK/SMB5BSCL", "GPIO13/GSPIDO/SMB5BSDA", + "GPIO14/GSPIDI/SMB5CSCL", "GPIO15/GSPICS/SMB5CSDA", "GPIO16/LKGPO0", + "GPIO17/PSPI2DI/SMB4DEN","GPIO18/PSPI2D0/SMB4BSDA", "GPIO19/PSPI2CK/SMB4BSCL", + "GPIO20/SMB4CSDA/SMB15SDA", "GPIO21/SMB4CSCL/SMB15SCL", "GPIO22/SMB4DSDA/SMB14SDA", + "GPIO23/SMB4DSCL/SMB14SCL", "GPIO24/IOXHDO", "GPIO25/IOXHDI", "GPIO26/SMB5SDA", + "GPIO27/SMB5SCL", "GPIO28/SMB4SDA", "GPIO29/SMB4SCL", "GPIO30/SMB3SDA", + "GPIO31/SMB3SCL", "GPIO32/nSPI0CS1","SPI0D2", "SPI0D3", "GPIO37/SMB3CSDA", + "GPIO38/SMB3CSCL", "GPIO39/SMB3BSDA", "GPIO40/SMB3BSCL", "GPIO41/BSPRXD", + "GPO42/BSPTXD/STRAP11", "GPIO43/RXD1/JTMS2/BU1RXD", "GPIO44/nCTS1/JTDI2/BU1CTS", + "GPIO45/nDCD1/JTDO2", "GPIO46/nDSR1/JTCK2", "GPIO47/nRI1/JCP_RDY2", + "GPIO48/TXD2/BSPTXD", "GPIO49/RXD2/BSPRXD", "GPIO50/nCTS2", "GPO51/nRTS2/STRAP2", + "GPIO52/nDCD2", "GPO53/nDTR2_BOUT2/STRAP1", "GPIO54/nDSR2", "GPIO55/nRI2", + "GPIO56/R1RXERR", "GPIO57/R1MDC", "GPIO58/R1MDIO", "GPIO59/SMB3DSDA", + "GPIO60/SMB3DSCL", "GPO61/nDTR1_BOUT1/STRAP6", "GPO62/nRTST1/STRAP5", + "GPO63/TXD1/STRAP4", "GPIO64/FANIN0", "GPIO65/FANIN1", "GPIO66/FANIN2", + "GPIO67/FANIN3", "GPIO68/FANIN4", "GPIO69/FANIN5", "GPIO70/FANIN6", "GPIO71/FANIN7", + "GPIO72/FANIN8", "GPIO73/FANIN9", "GPIO74/FANIN10", "GPIO75/FANIN11", + "GPIO76/FANIN12", "GPIO77/FANIN13","GPIO78/FANIN14", "GPIO79/FANIN15", + "GPIO80/PWM0", "GPIO81/PWM1", "GPIO82/PWM2", "GPIO83/PWM3", "GPIO84/R2TXD0", + "GPIO85/R2TXD1", "GPIO86/R2TXEN", "GPIO87/R2RXD0", "GPIO88/R2RXD1", "GPIO89/R2CRSDV", + "GPIO90/R2RXERR", "GPIO91/R2MDC", "GPIO92/R2MDIO", "GPIO93/GA20/SMB5DSCL", + "GPIO94/nKBRST/SMB5DSDA", "GPIO95/nLRESET/nESPIRST", "GPIO96/RG1TXD0", + "GPIO97/RG1TXD1", "GPIO98/RG1TXD2", "GPIO99/RG1TXD3","GPIO100/RG1TXC", + "GPIO101/RG1TXCTL", "GPIO102/RG1RXD0", "GPIO103/RG1RXD1", "GPIO104/RG1RXD2", + "GPIO105/RG1RXD3", "GPIO106/RG1RXC", "GPIO107/RG1RXCTL", "GPIO108/RG1MDC", + "GPIO109/RG1MDIO", "GPIO110/RG2TXD0/DDRV0", "GPIO111/RG2TXD1/DDRV1", + "GPIO112/RG2TXD2/DDRV2", "GPIO113/RG2TXD3/DDRV3", "GPIO114/SMB0SCL", + "GPIO115/SMB0SDA", "GPIO116/SMB1SCL", "GPIO117/SMB1SDA", "GPIO118/SMB2SCL", + "GPIO119/SMB2SDA", "GPIO120/SMB2CSDA", "GPIO121/SMB2CSCL", "GPIO122/SMB2BSDA", + "GPIO123/SMB2BSCL", "GPIO124/SMB1CSDA", "GPIO125/SMB1CSCL","GPIO126/SMB1BSDA", + "GPIO127/SMB1BSCL", "GPIO128/SMB8SCL", "GPIO129/SMB8SDA", "GPIO130/SMB9SCL", + "GPIO131/SMB9SDA", "GPIO132/SMB10SCL", "GPIO133/SMB10SDA","GPIO134/SMB11SCL", + "GPIO135/SMB11SDA", "GPIO136/SD1DT0", "GPIO137/SD1DT1", "GPIO138/SD1DT2", + "GPIO139/SD1DT3", "GPIO140/SD1CLK", "GPIO141/SD1WP", "GPIO142/SD1CMD", + "GPIO143/SD1CD/SD1PWR", "GPIO144/PWM4", "GPIO145/PWM5", "GPIO146/PWM6", + "GPIO147/PWM7", "GPIO148/MMCDT4", "GPIO149/MMCDT5", "GPIO150/MMCDT6", + "GPIO151/MMCDT7", "GPIO152/MMCCLK", "GPIO153/MMCWP", "GPIO154/MMCCMD", + "GPIO155/nMMCCD/nMMCRST", "GPIO156/MMCDT0", "GPIO157/MMCDT1", "GPIO158/MMCDT2", + "GPIO159/MMCDT3", "GPIO160/CLKOUT/RNGOSCOUT", "GPIO161/nLFRAME/nESPICS", + "GPIO162/SERIRQ", "GPIO163/LCLK/ESPICLK", "GPIO164/LAD0/ESPI_IO0", + "GPIO165/LAD1/ESPI_IO1", "GPIO166/LAD2/ESPI_IO2", "GPIO167/LAD3/ESPI_IO3", + "GPIO168/nCLKRUN/nESPIALERT", "GPIO169/nSCIPME", "GPIO170/nSMI", "GPIO171/SMB6SCL", + "GPIO172/SMB6SDA", "GPIO173/SMB7SCL", "GPIO174/SMB7SDA", "GPIO175/PSPI1CK/FANIN19", + "GPIO176/PSPI1DO/FANIN18", "GPIO177/PSPI1DI/FANIN17", "GPIO178/R1TXD0", + "GPIO179/R1TXD1", "GPIO180/R1TXEN", "GPIO181/R1RXD0", "GPIO182/R1RXD1", + "GPIO183/SPI3CK", "GPO184/SPI3D0/STRAP9", "GPO185/SPI3D1/STRAP10", + "GPIO186/nSPI3CS0", "GPIO187/nSPI3CS1", "GPIO188/SPI3D2/nSPI3CS2", + "GPIO189/SPI3D3/nSPI3CS3", "GPIO190/nPRD_SMI", "GPIO191", "GPIO192", "GPIO193/R1CRSDV", + "GPIO194/SMB0BSCL", "GPIO195/SMB0BSDA", "GPIO196/SMB0CSCL", "GPIO197/SMB0DEN", + "GPIO198/SMB0DSDA", "GPIO199/SMB0DSCL", "GPIO200/R2CK", "GPIO201/R1CK", + "GPIO202/SMB0CSDA", "GPIO203/FANIN16", "GPIO204/DDC2SCL", "GPIO205/DDC2SDA", + "GPIO206/HSYNC2", "GPIO207/VSYNC2", "GPIO208/RG2TXC/DVCK", "GPIO209/RG2TXCTL/DDRV4", + "GPIO210/RG2RXD0/DDRV5", "GPIO211/RG2RXD1/DDRV6", "GPIO212/RG2RXD2/DDRV7", + "GPIO213/RG2RXD3/DDRV8", "GPIO214/RG2RXC/DDRV9", "GPIO215/RG2RXCTL/DDRV10", + "GPIO216/RG2MDC/DDRV11", "GPIO217/RG2MDIO/DVHSYNC", "GPIO218/nWDO1", + "GPIO219/nWDO2", "GPIO220/SMB12SCL", "GPIO221/SMB12SDA", "GPIO222/SMB13SCL", + "GPIO223/SMB13SDA", "GPIO224/SPIXCK", "GPO225/SPIXD0/STRAP12", "GPO226/SPIXD1/STRAP13", + "GPIO227/nSPIXCS0", "GPIO228/nSPIXCS1", "GPO229/SPIXD2/STRAP3", "GPIO230/SPIXD3", + "GPIO231/nCLKREQ", "GPI255/DACOSEL" + +Optional Properties: + bias-disable, bias-pull-down, bias-pull-up, input-enable, + input-disable, output-high, output-low, drive-push-pull, + drive-open-drain, input-debounce, slew-rate, drive-strength + + slew-rate valid arguments are: + <0> - slow + <1> - fast + drive-strength valid arguments are: + <2> - 2mA + <4> - 4mA + <8> - 8mA + <12> - 12mA + <16> - 16mA + <24> - 24mA + +For example, pinctrl might have pinmux subnodes like the following: + + gpio0_iox1d1_pin: gpio0-iox1d1-pin { + pins = "GPIO0/IOX1DI"; + output-high; + }; + gpio0_iox1ck_pin: gpio0-iox1ck-pin { + pins = "GPIO2/IOX1CK"; + output_high; + }; + +=== Pin Group Subnode === + +Required pin group subnode-properties: +- groups : A string containing the name of the group to mux. +- function: A string containing the name of the function to mux to the + group. + +The following are the list of the available groups and functions : + smb0, smb0b, smb0c, smb0d, smb0den, smb1, smb1b, smb1c, smb1d, + smb2, smb2b, smb2c, smb2d, smb3, smb3b, smb3c, smb3d, smb4, smb4b, + smb4c, smb4d, smb4den, smb5, smb5b, smb5c, smb5d, ga20kbc, smb6, + smb7, smb8, smb9, smb10, smb11, smb12, smb13, smb14, smb15, fanin0, + fanin1, fanin2, fanin3, fanin4, fanin5, fanin6, fanin7, fanin8, + fanin9, fanin10, fanin11 fanin12 fanin13, fanin14, fanin15, faninx, + pwm0, pwm1, pwm2, pwm3, pwm4, pwm5, pwm6, pwm7, rg1, rg1mdio, rg2, + rg2mdio, ddr, uart1, uart2, bmcuart0a, bmcuart0b, bmcuart1, iox1, + iox2, ioxh, gspi, mmc, mmcwp, mmccd, mmcrst, mmc8, r1, r1err, r1md, + r2, r2err, r2md, sd1, sd1pwr, wdog1, wdog2, scipme, sci, serirq, + jtag2, spix, spixcs1, pspi1, pspi2, ddc, clkreq, clkout, spi3, spi3cs1, + spi3quad, spi3cs2, spi3cs3, spi0cs1, lpc, lpcclk, espi, lkgpo0, lkgpo1, + lkgpo2, nprd_smi + +For example, pinctrl might have group subnodes like the following: + r1err_pins: r1err-pins { + groups = "r1err"; + function = "r1err"; + }; + r1md_pins: r1md-pins { + groups = "r1md"; + function = "r1md"; + }; + r1_pins: r1-pins { + groups = "r1"; + function = "r1"; + }; + +Examples +======== +pinctrl: pinctrl@f0800000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "nuvoton,npcm750-pinctrl"; + ranges = <0 0xf0010000 0x8000>; + + gpio0: gpio@f0010000 { + gpio-controller; + #gpio-cells = <2>; + reg = <0x0 0x80>; + interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>; + gpio-ranges = <&pinctrl 0 0 32>; + }; + + .... + + gpio7: gpio@f0017000 { + gpio-controller; + #gpio-cells = <2>; + reg = <0x7000 0x80>; + interrupts = <GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>; + gpio-ranges = <&pinctrl 0 224 32>; + }; + + gpio0_iox1d1_pin: gpio0-iox1d1-pin { + pins = "GPIO0/IOX1DI"; + output-high; + }; + + iox1_pins: iox1-pins { + groups = "iox1"; + function = "iox1"; + }; + iox2_pins: iox2-pins { + groups = "iox2"; + function = "iox2"; + }; + + .... + + clkreq_pins: clkreq-pins { + groups = "clkreq"; + function = "clkreq"; + }; +};
\ No newline at end of file diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt index ffd4345415f3..ab4000eab07d 100644 --- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt +++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt @@ -19,6 +19,7 @@ PMIC's from Qualcomm. "qcom,pm8998-gpio" "qcom,pma8084-gpio" "qcom,pmi8994-gpio" + "qcom,pms405-gpio" And must contain either "qcom,spmi-gpio" or "qcom,ssbi-gpio" if the device is on an spmi bus or an ssbi bus respectively @@ -91,6 +92,7 @@ to specify in a pin configuration subnode: gpio1-gpio26 for pm8998 gpio1-gpio22 for pma8084 gpio1-gpio10 for pmi8994 + gpio1-gpio11 for pms405 - function: Usage: required diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,qcs404-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/qcom,qcs404-pinctrl.txt new file mode 100644 index 000000000000..2b8f77762edc --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/qcom,qcs404-pinctrl.txt @@ -0,0 +1,199 @@ +Qualcomm QCS404 TLMM block + +This binding describes the Top Level Mode Multiplexer block found in the +QCS404 platform. + +- compatible: + Usage: required + Value type: <string> + Definition: must be "qcom,qcs404-pinctrl" + +- reg: + Usage: required + Value type: <prop-encoded-array> + Definition: the base address and size of the north, south and east TLMM + tiles. + +- reg-names: + Usage: required + Value type: <stringlist> + Defintiion: names for the cells of reg, must contain "north", "south" + and "east". + +- interrupts: + Usage: required + Value type: <prop-encoded-array> + Definition: should specify the TLMM summary IRQ. + +- interrupt-controller: + Usage: required + Value type: <none> + Definition: identifies this node as an interrupt controller + +- #interrupt-cells: + Usage: required + Value type: <u32> + Definition: must be 2. Specifying the pin number and flags, as defined + in <dt-bindings/interrupt-controller/irq.h> + +- gpio-controller: + Usage: required + Value type: <none> + Definition: identifies this node as a gpio controller + +- #gpio-cells: + Usage: required + Value type: <u32> + Definition: must be 2. Specifying the pin number and flags, as defined + in <dt-bindings/gpio/gpio.h> + +- gpio-ranges: + Usage: required + Definition: see ../gpio/gpio.txt + +Please refer to ../gpio/gpio.txt and ../interrupt-controller/interrupts.txt for +a general description of GPIO and interrupt bindings. + +Please refer to pinctrl-bindings.txt in this directory for details of the +common pinctrl bindings used by client devices, including the meaning of the +phrase "pin configuration node". + +The pin configuration nodes act as a container for an arbitrary number of +subnodes. Each of these subnodes represents some desired configuration for a +pin, a group, or a list of pins or groups. This configuration can include the +mux function to select on those pin(s)/group(s), and various pin configuration +parameters, such as pull-up, drive strength, etc. + + +PIN CONFIGURATION NODES: + +The name of each subnode is not important; all subnodes should be enumerated +and processed purely based on their content. + +Each subnode only affects those parameters that are explicitly listed. In +other words, a subnode that lists a mux function but no pin configuration +parameters implies no information about any pin configuration parameters. +Similarly, a pin subnode that describes a pullup parameter implies no +information about e.g. the mux function. + + +The following generic properties as defined in pinctrl-bindings.txt are valid +to specify in a pin configuration subnode: + +- pins: + Usage: required + Value type: <string-array> + Definition: List of gpio pins affected by the properties specified in + this subnode. + + Valid pins are: + gpio0-gpio119 + Supports mux, bias and drive-strength + + sdc1_clk, sdc1_cmd, sdc1_data, sdc2_clk, sdc2_cmd, + sdc2_data + Supports bias and drive-strength + + ufs_reset + Supports bias and drive-strength + +- function: + Usage: required + Value type: <string> + Definition: Specify the alternative function to be configured for the + specified pins. Functions are only valid for gpio pins. + Valid values are: + + gpio, hdmi_tx, hdmi_ddc, blsp_uart_tx_a2, blsp_spi2, m_voc, + qdss_cti_trig_in_a0, blsp_uart_rx_a2, qdss_tracectl_a, + blsp_uart2, aud_cdc, blsp_i2c_sda_a2, qdss_tracedata_a, + blsp_i2c_scl_a2, qdss_tracectl_b, qdss_cti_trig_in_b0, + blsp_uart1, blsp_spi_mosi_a1, blsp_spi_miso_a1, + qdss_tracedata_b, blsp_i2c1, blsp_spi_cs_n_a1, gcc_plltest, + blsp_spi_clk_a1, rgb_data0, blsp_uart5, blsp_spi5, + adsp_ext, rgb_data1, prng_rosc, rgb_data2, blsp_i2c5, + gcc_gp1_clk_b, rgb_data3, gcc_gp2_clk_b, blsp_spi0, + blsp_uart0, gcc_gp3_clk_b, blsp_i2c0, qdss_traceclk_b, + pcie_clk, nfc_irq, blsp_spi4, nfc_dwl, audio_ts, rgb_data4, + spi_lcd, blsp_uart_tx_b2, gcc_gp3_clk_a, rgb_data5, + blsp_uart_rx_b2, blsp_i2c_sda_b2, blsp_i2c_scl_b2, + pwm_led11, i2s_3_data0_a, ebi2_lcd, i2s_3_data1_a, + i2s_3_data2_a, atest_char, pwm_led3, i2s_3_data3_a, + pwm_led4, i2s_4, ebi2_a, dsd_clk_b, pwm_led5, pwm_led6, + pwm_led7, pwm_led8, pwm_led24, spkr_dac0, blsp_i2c4, + pwm_led9, pwm_led10, spdifrx_opt, pwm_led12, pwm_led13, + pwm_led14, wlan1_adc1, rgb_data_b0, pwm_led15, + blsp_spi_mosi_b1, wlan1_adc0, rgb_data_b1, pwm_led16, + blsp_spi_miso_b1, qdss_cti_trig_out_b0, wlan2_adc1, + rgb_data_b2, pwm_led17, blsp_spi_cs_n_b1, wlan2_adc0, + rgb_data_b3, pwm_led18, blsp_spi_clk_b1, rgb_data_b4, + pwm_led19, ext_mclk1_b, qdss_traceclk_a, rgb_data_b5, + pwm_led20, atest_char3, i2s_3_sck_b, ldo_update, bimc_dte0, + rgb_hsync, pwm_led21, i2s_3_ws_b, dbg_out, rgb_vsync, + i2s_3_data0_b, ldo_en, hdmi_dtest, rgb_de, i2s_3_data1_b, + hdmi_lbk9, rgb_clk, atest_char1, i2s_3_data2_b, ebi_cdc, + hdmi_lbk8, rgb_mdp, atest_char0, i2s_3_data3_b, hdmi_lbk7, + rgb_data_b6, rgb_data_b7, hdmi_lbk6, rgmii_int, cri_trng1, + rgmii_wol, cri_trng0, gcc_tlmm, rgmii_ck, rgmii_tx, + hdmi_lbk5, hdmi_pixel, hdmi_rcv, hdmi_lbk4, rgmii_ctl, + ext_lpass, rgmii_rx, cri_trng, hdmi_lbk3, hdmi_lbk2, + qdss_cti_trig_out_b1, rgmii_mdio, hdmi_lbk1, rgmii_mdc, + hdmi_lbk0, ir_in, wsa_en, rgb_data6, rgb_data7, + atest_char2, ebi_ch0, blsp_uart3, blsp_spi3, sd_write, + blsp_i2c3, gcc_gp1_clk_a, qdss_cti_trig_in_b1, + gcc_gp2_clk_a, ext_mclk0, mclk_in1, i2s_1, dsd_clk_a, + qdss_cti_trig_in_a1, rgmi_dll1, pwm_led22, pwm_led23, + qdss_cti_trig_out_a0, rgmi_dll2, pwm_led1, + qdss_cti_trig_out_a1, pwm_led2, i2s_2, pll_bist, + ext_mclk1_a, mclk_in2, bimc_dte1, i2s_3_sck_a, i2s_3_ws_a + +- bias-disable: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as no pull. + +- bias-pull-down: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as pull down. + +- bias-pull-up: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as pull up. + +- output-high: + Usage: optional + Value type: <none> + Definition: The specified pins are configured in output mode, driven + high. + Not valid for sdc pins. + +- output-low: + Usage: optional + Value type: <none> + Definition: The specified pins are configured in output mode, driven + low. + Not valid for sdc pins. + +- drive-strength: + Usage: optional + Value type: <u32> + Definition: Selects the drive strength for the specified pins, in mA. + Valid values are: 2, 4, 6, 8, 10, 12, 14 and 16 + +Example: + + tlmm: pinctrl@1000000 { + compatible = "qcom,qcs404-pinctrl"; + reg = <0x01000000 0x200000>, + <0x01300000 0x200000>, + <0x07b00000 0x200000>; + reg-names = "south", "north", "east"; + interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>; + gpio-controller; + #gpio-cells = <2>; + gpio-ranges = <&tlmm 0 0 120>; + interrupt-controller; + #interrupt-cells = <2>; + }; diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,sdm660-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/qcom,sdm660-pinctrl.txt new file mode 100644 index 000000000000..769ca83bb40d --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/qcom,sdm660-pinctrl.txt @@ -0,0 +1,191 @@ +Qualcomm Technologies, Inc. SDM660 TLMM block + +This binding describes the Top Level Mode Multiplexer block found in the +SDM660 platform. + +- compatible: + Usage: required + Value type: <string> + Definition: must be "qcom,sdm660-pinctrl" or + "qcom,sdm630-pinctrl". + +- reg: + Usage: required + Value type: <prop-encoded-array> + Definition: the base address and size of the north, center and south + TLMM tiles. + +- reg-names: + Usage: required + Value type: <stringlist> + Definition: names for the cells of reg, must contain "north", "center" + and "south". + +- interrupts: + Usage: required + Value type: <prop-encoded-array> + Definition: should specify the TLMM summary IRQ. + +- interrupt-controller: + Usage: required + Value type: <none> + Definition: identifies this node as an interrupt controller + +- #interrupt-cells: + Usage: required + Value type: <u32> + Definition: must be 2. Specifying the pin number and flags, as defined + in <dt-bindings/interrupt-controller/irq.h> + +- gpio-controller: + Usage: required + Value type: <none> + Definition: identifies this node as a gpio controller + +- gpio-ranges: + Usage: required + Value type: <prop-encoded-array> + Definition: Specifies the mapping between gpio controller and + pin-controller pins. + +- #gpio-cells: + Usage: required + Value type: <u32> + Definition: must be 2. Specifying the pin number and flags, as defined + in <dt-bindings/gpio/gpio.h> + +Please refer to ../gpio/gpio.txt and ../interrupt-controller/interrupts.txt for +a general description of GPIO and interrupt bindings. + +Please refer to pinctrl-bindings.txt in this directory for details of the +common pinctrl bindings used by client devices, including the meaning of the +phrase "pin configuration node". + +The pin configuration nodes act as a container for an arbitrary number of +subnodes. Each of these subnodes represents some desired configuration for a +pin, a group, or a list of pins or groups. This configuration can include the +mux function to select on those pin(s)/group(s), and various pin configuration +parameters, such as pull-up, drive strength, etc. + + +PIN CONFIGURATION NODES: + +The name of each subnode is not important; all subnodes should be enumerated +and processed purely based on their content. + +Each subnode only affects those parameters that are explicitly listed. In +other words, a subnode that lists a mux function but no pin configuration +parameters implies no information about any pin configuration parameters. +Similarly, a pin subnode that describes a pullup parameter implies no +information about e.g. the mux function. + + +The following generic properties as defined in pinctrl-bindings.txt are valid +to specify in a pin configuration subnode: + +- pins: + Usage: required + Value type: <string-array> + Definition: List of gpio pins affected by the properties specified in + this subnode. Valid pins are: + gpio0-gpio113, + Supports mux, bias and drive-strength + sdc1_clk, sdc1_cmd, sdc1_data sdc2_clk, sdc2_cmd, sdc2_data sdc1_rclk, + Supports bias and drive-strength + +- function: + Usage: required + Value type: <string> + Definition: Specify the alternative function to be configured for the + specified pins. Functions are only valid for gpio pins. + Valid values are: + adsp_ext, agera_pll, atest_char, atest_char0, atest_char1, + atest_char2, atest_char3, atest_gpsadc0, atest_gpsadc1, + atest_tsens, atest_tsens2, atest_usb1, atest_usb10, + atest_usb11, atest_usb12, atest_usb13, atest_usb2, + atest_usb20, atest_usb21, atest_usb22, atest_usb23, + audio_ref, bimc_dte0, bimc_dte1, blsp_i2c1, blsp_i2c2, + blsp_i2c3, blsp_i2c4, blsp_i2c5, blsp_i2c6, blsp_i2c7, + blsp_i2c8_a, blsp_i2c8_b, blsp_spi1, blsp_spi2, blsp_spi3, + blsp_spi3_cs1, blsp_spi3_cs2, blsp_spi4, blsp_spi5, + blsp_spi6, blsp_spi7, blsp_spi8_a, blsp_spi8_b, + blsp_spi8_cs1, blsp_spi8_cs2, blsp_uart1, blsp_uart2, + blsp_uart5, blsp_uart6_a, blsp_uart6_b, blsp_uim1, + blsp_uim2, blsp_uim5, blsp_uim6, cam_mclk, cci_async, + cci_i2c, cri_trng, cri_trng0, cri_trng1, dbg_out, ddr_bist, + gcc_gp1, gcc_gp2, gcc_gp3, gpio, gps_tx_a, gps_tx_b, gps_tx_c, + isense_dbg, jitter_bist, ldo_en, ldo_update, m_voc, mdp_vsync, + mdss_vsync0, mdss_vsync1, mdss_vsync2, mdss_vsync3, mss_lte, + nav_pps_a, nav_pps_b, nav_pps_c, pa_indicator, phase_flag0, + phase_flag1, phase_flag10, phase_flag11, phase_flag12, + phase_flag13, phase_flag14, phase_flag15, phase_flag16, + phase_flag17, phase_flag18, phase_flag19, phase_flag2, + phase_flag20, phase_flag21, phase_flag22, phase_flag23, + phase_flag24, phase_flag25, phase_flag26, phase_flag27, + phase_flag28, phase_flag29, phase_flag3, phase_flag30, + phase_flag31, phase_flag4, phase_flag5, phase_flag6, + phase_flag7, phase_flag8, phase_flag9, pll_bypassnl, + pll_reset, pri_mi2s, pri_mi2s_ws, prng_rosc, pwr_crypto, + pwr_modem, pwr_nav, qdss_cti0_a, qdss_cti0_b, qdss_cti1_a, + qdss_cti1_b, qdss_gpio, qdss_gpio0, qdss_gpio1, qdss_gpio10, + qdss_gpio11, qdss_gpio12, qdss_gpio13, qdss_gpio14, qdss_gpio15, + qdss_gpio2, qdss_gpio3, qdss_gpio4, qdss_gpio5, qdss_gpio6, + qdss_gpio7, qdss_gpio8, qdss_gpio9, qlink_enable, qlink_request, + qspi_clk, qspi_cs, qspi_data0, qspi_data1, qspi_data2, + qspi_data3, qspi_resetn, sec_mi2s, sndwire_clk, sndwire_data, + sp_cmu, ssc_irq, tgu_ch0, tgu_ch1, tsense_pwm1, tsense_pwm2, + uim1_clk, uim1_data, uim1_present, uim1_reset, uim2_clk, + uim2_data, uim2_present, uim2_reset, uim_batt, vfr_1, + vsense_clkout, vsense_data0, vsense_data1, vsense_mode, + wlan1_adc0, wlan1_adc1, wlan2_adc0, wlan2_adc1 + +- bias-disable: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as no pull. + +- bias-pull-down: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as pull down. + +- bias-pull-up: + Usage: optional + Value type: <none> + Definition: The specified pins should be configued as pull up. + +- output-high: + Usage: optional + Value type: <none> + Definition: The specified pins are configured in output mode, driven + high. + Not valid for sdc pins. + +- output-low: + Usage: optional + Value type: <none> + Definition: The specified pins are configured in output mode, driven + low. + Not valid for sdc pins. + +- drive-strength: + Usage: optional + Value type: <u32> + Definition: Selects the drive strength for the specified pins, in mA. + Valid values are: 2, 4, 6, 8, 10, 12, 14 and 16 + +Example: + + tlmm: pinctrl@3100000 { + compatible = "qcom,sdm660-pinctrl"; + reg = <0x3100000 0x200000>, + <0x3500000 0x200000>, + <0x3900000 0x200000>; + reg-names = "south", "center", "north"; + interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>; + gpio-controller; + gpio-ranges = <&tlmm 0 0 114>; + #gpio-cells = <2>; + interrupt-controller; + #interrupt-cells = <2>; + }; diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt index abd8fbcf1e62..3902efa18fd0 100644 --- a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt @@ -14,8 +14,11 @@ Required Properties: - "renesas,pfc-r8a73a4": for R8A73A4 (R-Mobile APE6) compatible pin-controller. - "renesas,pfc-r8a7740": for R8A7740 (R-Mobile A1) compatible pin-controller. - "renesas,pfc-r8a7743": for R8A7743 (RZ/G1M) compatible pin-controller. + - "renesas,pfc-r8a7744": for R8A7744 (RZ/G1N) compatible pin-controller. - "renesas,pfc-r8a7745": for R8A7745 (RZ/G1E) compatible pin-controller. - "renesas,pfc-r8a77470": for R8A77470 (RZ/G1C) compatible pin-controller. + - "renesas,pfc-r8a774a1": for R8A774A1 (RZ/G2M) compatible pin-controller. + - "renesas,pfc-r8a774c0": for R8A774C0 (RZ/G2E) compatible pin-controller. - "renesas,pfc-r8a7778": for R8A7778 (R-Car M1) compatible pin-controller. - "renesas,pfc-r8a7779": for R8A7779 (R-Car H1) compatible pin-controller. - "renesas,pfc-r8a7790": for R8A7790 (R-Car H2) compatible pin-controller. diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,rzn1-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,rzn1-pinctrl.txt new file mode 100644 index 000000000000..25e53acd523e --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/renesas,rzn1-pinctrl.txt @@ -0,0 +1,153 @@ +Renesas RZ/N1 SoC Pinctrl node description. + +Pin controller node +------------------- +Required properties: +- compatible: SoC-specific compatible string "renesas,<soc-specific>-pinctrl" + followed by "renesas,rzn1-pinctrl" as fallback. The SoC-specific compatible + strings must be one of: + "renesas,r9a06g032-pinctrl" for RZ/N1D + "renesas,r9a06g033-pinctrl" for RZ/N1S +- reg: Address base and length of the memory area where the pin controller + hardware is mapped to. +- clocks: phandle for the clock, see the description of clock-names below. +- clock-names: Contains the name of the clock: + "bus", the bus clock, sometimes described as pclk, for register accesses. + +Example: + pinctrl: pin-controller@40067000 { + compatible = "renesas,r9a06g032-pinctrl", "renesas,rzn1-pinctrl"; + reg = <0x40067000 0x1000>, <0x51000000 0x480>; + clocks = <&sysctrl R9A06G032_HCLK_PINCONFIG>; + clock-names = "bus"; + }; + +Sub-nodes +--------- + +The child nodes of the pin controller node describe a pin multiplexing +function. + +- Pin multiplexing sub-nodes: + A pin multiplexing sub-node describes how to configure a set of + (or a single) pin in some desired alternate function mode. + A single sub-node may define several pin configurations. + Please refer to pinctrl-bindings.txt to get to know more on generic + pin properties usage. + + The allowed generic formats for a pin multiplexing sub-node are the + following ones: + + node-1 { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + + node-2 { + sub-node-1 { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + + sub-node-2 { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + + ... + + sub-node-n { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + }; + + node-3 { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + + sub-node-1 { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + + ... + + sub-node-n { + pinmux = <PIN_ID_AND_MUX>, <PIN_ID_AND_MUX>, ... ; + GENERIC_PINCONFIG; + }; + }; + + Use the latter two formats when pins part of the same logical group need to + have different generic pin configuration flags applied. Note that the generic + pinconfig in node-3 does not apply to the sub-nodes. + + Client sub-nodes shall refer to pin multiplexing sub-nodes using the phandle + of the most external one. + + Eg. + + client-1 { + ... + pinctrl-0 = <&node-1>; + ... + }; + + client-2 { + ... + pinctrl-0 = <&node-2>; + ... + }; + + Required properties: + - pinmux: + integer array representing pin number and pin multiplexing configuration. + When a pin has to be configured in alternate function mode, use this + property to identify the pin by its global index, and provide its + alternate function configuration number along with it. + When multiple pins are required to be configured as part of the same + alternate function they shall be specified as members of the same + argument list of a single "pinmux" property. + Integers values in the "pinmux" argument list are assembled as: + (PIN | MUX_FUNC << 8) + where PIN directly corresponds to the pl_gpio pin number and MUX_FUNC is + one of the alternate function identifiers defined in: + <include/dt-bindings/pinctrl/rzn1-pinctrl.h> + These identifiers collapse the IO Multiplex Configuration Level 1 and + Level 2 numbers that are detailed in the hardware reference manual into a + single number. The identifiers for Level 2 are simply offset by 10. + Additional identifiers are provided to specify the MDIO source peripheral. + + Optional generic pinconf properties: + - bias-disable - disable any pin bias + - bias-pull-up - pull up the pin with 50 KOhm + - bias-pull-down - pull down the pin with 50 KOhm + - bias-high-impedance - high impedance mode + - drive-strength - sink or source at most 4, 6, 8 or 12 mA + + Example: + A serial communication interface with a TX output pin and an RX input pin. + + &pinctrl { + pins_uart0: pins_uart0 { + pinmux = < + RZN1_PINMUX(103, RZN1_FUNC_UART0_I) /* UART0_TXD */ + RZN1_PINMUX(104, RZN1_FUNC_UART0_I) /* UART0_RXD */ + >; + }; + }; + + Example 2: + Here we set the pull up on the RXD pin of the UART. + + &pinctrl { + pins_uart0: pins_uart0 { + pinmux = <RZN1_PINMUX(103, RZN1_FUNC_UART0_I)>; /* TXD */ + + pins_uart6_rx { + pinmux = <RZN1_PINMUX(104, RZN1_FUNC_UART0_I)>; /* RXD */ + bias-pull-up; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt index 651491bb63b7..5705f575862d 100644 --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt @@ -6,7 +6,10 @@ and resin along with the Android reboot-mode. This DT node has pwrkey and resin as sub nodes. Required Properties: --compatible: "qcom,pm8916-pon" +-compatible: Must be one of: + "qcom,pm8916-pon" + "qcom,pms405-pon" + -reg: Specifies the physical address of the pon register Optional subnode: diff --git a/Documentation/devicetree/bindings/power/supply/bq25890.txt b/Documentation/devicetree/bindings/power/supply/bq25890.txt index c9dd17d142ad..dc0568933359 100644 --- a/Documentation/devicetree/bindings/power/supply/bq25890.txt +++ b/Documentation/devicetree/bindings/power/supply/bq25890.txt @@ -1,5 +1,8 @@ Binding for TI bq25890 Li-Ion Charger +This driver will support the bq25896 and the bq25890. There are other ICs +in the same family but those have not been tested. + Required properties: - compatible: Should contain one of the following: * "ti,bq25890" diff --git a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt index 37994fdb18ca..4fa8e08df2b6 100644 --- a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt +++ b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt @@ -23,6 +23,7 @@ Required properties: * "ti,bq27546" - BQ27546 * "ti,bq27742" - BQ27742 * "ti,bq27545" - BQ27545 + * "ti,bq27411" - BQ27411 * "ti,bq27421" - BQ27421 * "ti,bq27425" - BQ27425 * "ti,bq27426" - BQ27426 diff --git a/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt b/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt new file mode 100644 index 000000000000..5266fab16575 --- /dev/null +++ b/Documentation/devicetree/bindings/power/supply/sc2731_charger.txt @@ -0,0 +1,40 @@ +Spreadtrum SC2731 PMIC battery charger binding + +Required properties: + - compatible: Should be "sprd,sc2731-charger". + - reg: Address offset of charger register. + - phys: Contains a phandle to the USB phy. + +Optional Properties: +- monitored-battery: phandle of battery characteristics devicetree node. + The charger uses the following battery properties: +- charge-term-current-microamp: current for charge termination phase. +- constant-charge-voltage-max-microvolt: maximum constant input voltage. + See Documentation/devicetree/bindings/power/supply/battery.txt + +Example: + + bat: battery { + compatible = "simple-battery"; + charge-term-current-microamp = <120000>; + constant-charge-voltage-max-microvolt = <4350000>; + ...... + }; + + sc2731_pmic: pmic@0 { + compatible = "sprd,sc2731"; + reg = <0>; + spi-max-frequency = <26000000>; + interrupts = <GIC_SPI 31 IRQ_TYPE_LEVEL_HIGH>; + interrupt-controller; + #interrupt-cells = <2>; + #address-cells = <1>; + #size-cells = <0>; + + charger@0 { + compatible = "sprd,sc2731-charger"; + reg = <0x0>; + phys = <&ssphy>; + monitored-battery = <&bat>; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt b/Documentation/devicetree/bindings/regulator/pfuze100.txt index c7610718adff..f9be1acf891c 100644 --- a/Documentation/devicetree/bindings/regulator/pfuze100.txt +++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt @@ -12,6 +12,11 @@ Optional properties: disabled. This binding is a workaround to keep backward compatibility with old dtb's which rely on the fact that the switched regulators are always on and don't mark them explicit as "regulator-always-on". +- fsl,pmic-stby-poweroff: if present, configure the PMIC to shutdown all + power rails when PMIC_STBY_REQ line is asserted during the power off sequence. + Use this option if the SoC should be powered off by external power + management IC (PMIC) on PMIC_STBY_REQ signal. + As opposite to PMIC_STBY_REQ boards can implement PMIC_ON_REQ signal. Required child node: - regulators: This is the list of child nodes that specify the regulator diff --git a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt b/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt index 58a1d97972f5..45025b5b67f6 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt @@ -26,6 +26,7 @@ Regulator nodes are identified by their compatible: "qcom,rpm-pm8998-regulators" "qcom,rpm-pma8084-regulators" "qcom,rpm-pmi8998-regulators" + "qcom,rpm-pms405-regulators" - vdd_s1-supply: - vdd_s2-supply: @@ -188,6 +189,24 @@ Regulator nodes are identified by their compatible: Definition: reference to regulator supplying the input pin, as described in the data sheet +- vdd_s1-supply: +- vdd_s2-supply: +- vdd_s3-supply: +- vdd_s4-supply: +- vdd_s5-supply: +- vdd_l1_l2-supply: +- vdd_l3_l8-supply: +- vdd_l4-supply: +- vdd_l5_l6-supply: +- vdd_l7-supply: +- vdd_l3_l8-supply: +- vdd_l9-supply: +- vdd_l10_l11_l12_l13-supply: + Usage: optional (pms405 only) + Value type: <phandle> + Definition: reference to regulator supplying the input pin, as + described in the data sheet + The regulator node houses sub-nodes for each regulator within the device. Each sub-node is identified using the node's name, with valid values listed for each of the pmics below. @@ -222,6 +241,10 @@ pma8084: pmi8998: bob +pms405: + s1, s2, s3, s4, s5, l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11, l12, + l13 + The content of each sub-node is defined by the standard binding for regulators - see regulator.txt. diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt index 76ead07072b1..4b98ca26e61a 100644 --- a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt @@ -1,7 +1,9 @@ -ROHM BD71837 Power Management Integrated Circuit (PMIC) regulator bindings +ROHM BD71837 and BD71847 Power Management Integrated Circuit regulator bindings Required properties: - - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" + - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" for + BD71837. For BD71847 names should be "buck1", ..., "buck6" + and "ldo1", ..., "ldo6" List of regulators provided by this controller. BD71837 regulators node should be sub node of the BD71837 MFD node. See BD71837 MFD bindings at @@ -16,10 +18,14 @@ disabled by driver at startup. LDO5 and LDO6 are supplied by those and if they are disabled at startup the voltage monitoring for LDO5/LDO6 will cause PMIC to reset. -The valid names for regulator nodes are: +The valid names for BD71837 regulator nodes are: BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6, BUCK7, BUCK8 LDO1, LDO2, LDO3, LDO4, LDO5, LDO6, LDO7 +The valid names for BD71847 regulator nodes are: +BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6 +LDO1, LDO2, LDO3, LDO4, LDO5, LDO6 + Optional properties: - Any optional property defined in bindings/regulator/regulator.txt diff --git a/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt b/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt new file mode 100644 index 000000000000..a3f476240565 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt @@ -0,0 +1,68 @@ +STMicroelectronics STPMIC1 Voltage regulators + +Regulator Nodes are optional depending on needs. + +Available Regulators in STPMIC1 device are: + - buck1 for Buck BUCK1 + - buck2 for Buck BUCK2 + - buck3 for Buck BUCK3 + - buck4 for Buck BUCK4 + - ldo1 for LDO LDO1 + - ldo2 for LDO LDO2 + - ldo3 for LDO LDO3 + - ldo4 for LDO LDO4 + - ldo5 for LDO LDO5 + - ldo6 for LDO LDO6 + - vref_ddr for LDO Vref DDR + - boost for Buck BOOST + - pwr_sw1 for VBUS_OTG switch + - pwr_sw2 for SW_OUT switch + +Switches are fixed voltage regulators with only enable/disable capability. + +Optional properties: +- st,mask-reset: mask reset for this regulator: the regulator configuration + is maintained during pmic reset. +- regulator-pull-down: enable high pull down + if not specified light pull down is used +- regulator-over-current-protection: + if set, all regulators are switched off in case of over-current detection + on this regulator, + if not set, the driver only sends an over-current event. +- interrupt-parent: phandle to the parent interrupt controller +- interrupts: index of current limit detection interrupt +- <regulator>-supply: phandle to the parent supply/regulator node + each regulator supply can be described except vref_ddr. + +Example: +regulators { + compatible = "st,stpmic1-regulators"; + + ldo6-supply = <&v3v3>; + + vdd_core: buck1 { + regulator-name = "vdd_core"; + interrupts = <IT_CURLIM_BUCK1 0>; + interrupt-parent = <&pmic>; + st,mask-reset; + regulator-pull-down; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1200000>; + }; + + v3v3: buck4 { + regulator-name = "v3v3"; + interrupts = <IT_CURLIM_BUCK4 0>; + interrupt-parent = <&mypmic>; + + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + + v1v8: ldo6 { + regulator-name = "v1v8"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-over-current-protection; + }; +}; diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt index 03c741602c6d..6d2dd8a31482 100644 --- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt @@ -98,6 +98,12 @@ The property below is dependent on fsl,tdm-interface: usage: optional for tdm interface value type: <empty> Definition : Internal loopback connecting on TDM layer. +- fsl,hmask + usage: optional + Value type: <u16> + Definition: HDLC address recognition. Set to zero to disable + address filtering of packets: + fsl,hmask = /bits/ 16 <0x0000>; Example for tdm interface: diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt index ff92e5a41bed..dab7ca9f250c 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt @@ -53,20 +53,8 @@ Required properties: - clocks: Serial engine core clock needed by the device. Qualcomm Technologies Inc. GENI Serial Engine based SPI Controller - -Required properties: -- compatible: Must contain "qcom,geni-spi". -- reg: Must contain SPI register location and length. -- interrupts: Must contain SPI controller interrupts. -- clock-names: Must contain "se". -- clocks: Serial engine core clock needed by the device. -- spi-max-frequency: Specifies maximum SPI clock frequency, units - Hz. -- #address-cells: Must be <1> to define a chip select address on - the SPI bus. -- #size-cells: Must be <0>. - -SPI slave nodes must be children of the SPI master node and conform to SPI bus -binding as described in Documentation/devicetree/bindings/spi/spi-bus.txt. +node binding is described in +Documentation/devicetree/bindings/spi/qcom,spi-geni-qcom.txt. Example: geniqup@8c0000 { @@ -103,17 +91,4 @@ Example: pinctrl-1 = <&qup_1_uart_3_sleep>; }; - spi0: spi@a84000 { - compatible = "qcom,geni-spi"; - reg = <0xa84000 0x4000>; - interrupts = <GIC_SPI 354 IRQ_TYPE_LEVEL_HIGH>; - clock-names = "se"; - clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>; - pinctrl-names = "default", "sleep"; - pinctrl-0 = <&qup_1_spi_2_active>; - pinctrl-1 = <&qup_1_spi_2_sleep>; - spi-max-frequency = <19200000>; - #address-cells = <1>; - #size-cells = <0>; - }; } diff --git a/Documentation/devicetree/bindings/spi/qcom,spi-geni-qcom.txt b/Documentation/devicetree/bindings/spi/qcom,spi-geni-qcom.txt new file mode 100644 index 000000000000..790311a42bf1 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/qcom,spi-geni-qcom.txt @@ -0,0 +1,39 @@ +GENI based Qualcomm Universal Peripheral (QUP) Serial Peripheral Interface (SPI) + +The QUP v3 core is a GENI based AHB slave that provides a common data path +(an output FIFO and an input FIFO) for serial peripheral interface (SPI) +mini-core. + +SPI in master mode supports up to 50MHz, up to four chip selects, programmable +data path from 4 bits to 32 bits and numerous protocol variants. + +Required properties: +- compatible: Must contain "qcom,geni-spi". +- reg: Must contain SPI register location and length. +- interrupts: Must contain SPI controller interrupts. +- clock-names: Must contain "se". +- clocks: Serial engine core clock needed by the device. +- #address-cells: Must be <1> to define a chip select address on + the SPI bus. +- #size-cells: Must be <0>. + +SPI Controller nodes must be child of GENI based Qualcomm Universal +Peripharal. Please refer GENI based QUP wrapper controller node bindings +described in Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt. + +SPI slave nodes must be children of the SPI master node and conform to SPI bus +binding as described in Documentation/devicetree/bindings/spi/spi-bus.txt. + +Example: + spi0: spi@a84000 { + compatible = "qcom,geni-spi"; + reg = <0xa84000 0x4000>; + interrupts = <GIC_SPI 354 IRQ_TYPE_LEVEL_HIGH>; + clock-names = "se"; + clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>; + pinctrl-names = "default", "sleep"; + pinctrl-0 = <&qup_1_spi_2_active>; + pinctrl-1 = <&qup_1_spi_2_sleep>; + #address-cells = <1>; + #size-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.txt b/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.txt new file mode 100644 index 000000000000..1d64b61f5171 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.txt @@ -0,0 +1,36 @@ +Qualcomm Quad Serial Peripheral Interface (QSPI) + +The QSPI controller allows SPI protocol communication in single, dual, or quad +wire transmission modes for read/write access to slaves such as NOR flash. + +Required properties: +- compatible: An SoC specific identifier followed by "qcom,qspi-v1", such as + "qcom,sdm845-qspi", "qcom,qspi-v1" +- reg: Should contain the base register location and length. +- interrupts: Interrupt number used by the controller. +- clocks: Should contain the core and AHB clock. +- clock-names: Should be "core" for core clock and "iface" for AHB clock. + +SPI slave nodes must be children of the SPI master node and can contain +properties described in Documentation/devicetree/bindings/spi/spi-bus.txt + +Example: + + qspi: spi@88df000 { + compatible = "qcom,sdm845-qspi", "qcom,qspi-v1"; + reg = <0x88df000 0x600>; + #address-cells = <1>; + #size-cells = <0>; + interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>; + clock-names = "iface", "core"; + clocks = <&gcc GCC_QSPI_CNOC_PERIPH_AHB_CLK>, + <&gcc GCC_QSPI_CORE_CLK>; + + flash@0 { + compatible = "jedec,spi-nor"; + reg = <0>; + spi-max-frequency = <25000000>; + spi-tx-bus-width = <2>; + spi-rx-bus-width = <2>; + }; + }; diff --git a/Documentation/devicetree/bindings/spi/sh-msiof.txt b/Documentation/devicetree/bindings/spi/sh-msiof.txt index bfbc2035fb6b..4b836ad17b19 100644 --- a/Documentation/devicetree/bindings/spi/sh-msiof.txt +++ b/Documentation/devicetree/bindings/spi/sh-msiof.txt @@ -2,7 +2,9 @@ Renesas MSIOF spi controller Required properties: - compatible : "renesas,msiof-r8a7743" (RZ/G1M) + "renesas,msiof-r8a7744" (RZ/G1N) "renesas,msiof-r8a7745" (RZ/G1E) + "renesas,msiof-r8a774a1" (RZ/G2M) "renesas,msiof-r8a7790" (R-Car H2) "renesas,msiof-r8a7791" (R-Car M2-W) "renesas,msiof-r8a7792" (R-Car V2H) @@ -11,10 +13,14 @@ Required properties: "renesas,msiof-r8a7795" (R-Car H3) "renesas,msiof-r8a7796" (R-Car M3-W) "renesas,msiof-r8a77965" (R-Car M3-N) + "renesas,msiof-r8a77970" (R-Car V3M) + "renesas,msiof-r8a77980" (R-Car V3H) + "renesas,msiof-r8a77990" (R-Car E3) + "renesas,msiof-r8a77995" (R-Car D3) "renesas,msiof-sh73a0" (SH-Mobile AG5) "renesas,sh-mobile-msiof" (generic SH-Mobile compatibile device) "renesas,rcar-gen2-msiof" (generic R-Car Gen2 and RZ/G1 compatible device) - "renesas,rcar-gen3-msiof" (generic R-Car Gen3 compatible device) + "renesas,rcar-gen3-msiof" (generic R-Car Gen3 and RZ/G2 compatible device) "renesas,sh-msiof" (deprecated) When compatible with the generic version, nodes diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt index 642d3fb1ef85..2864bc6b659c 100644 --- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt +++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt @@ -2,7 +2,7 @@ Synopsys DesignWare AMBA 2.0 Synchronous Serial Interface. Required properties: - compatible : "snps,dw-apb-ssi" or "mscc,<soc>-spi", where soc is "ocelot" or - "jaguar2" + "jaguar2", or "amazon,alpine-dw-apb-ssi" - reg : The register base for the controller. For "mscc,<soc>-spi", a second register set is required (named ICPU_CFG:SPI_MST) - interrupts : One interrupt, used by the controller. diff --git a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.txt b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.txt index 4af132606b37..8d178a4503cf 100644 --- a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.txt +++ b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.txt @@ -3,6 +3,7 @@ Required properties: - compatible : - "fsl,imx7ulp-spi" for LPSPI compatible with the one integrated on i.MX7ULP soc + - "fsl,imx8qxp-spi" for LPSPI compatible with the one integrated on i.MX8QXP soc - reg : address and length of the lpspi master registers - interrupts : lpspi interrupt - clocks : lpspi clock specifier diff --git a/Documentation/devicetree/bindings/spi/spi-pxa2xx.txt b/Documentation/devicetree/bindings/spi/spi-pxa2xx.txt new file mode 100644 index 000000000000..0335a9bd2e8a --- /dev/null +++ b/Documentation/devicetree/bindings/spi/spi-pxa2xx.txt @@ -0,0 +1,24 @@ +PXA2xx SSP SPI Controller + +Required properties: +- compatible: Must be "marvell,mmp2-ssp". +- reg: Offset and length of the device's register set. +- interrupts: Should be the interrupt number. +- clocks: Should contain a single entry describing the clock input. +- #address-cells: Number of cells required to define a chip select address. +- #size-cells: Should be zero. + +Optional properties: +- cs-gpios: list of GPIO chip selects. See the SPI bus bindings, + Documentation/devicetree/bindings/spi/spi-bus.txt + +Child nodes represent devices on the SPI bus + See ../spi/spi-bus.txt + +Example: + ssp1: spi@d4035000 { + compatible = "marvell,mmp2-ssp"; + reg = <0xd4035000 0x1000>; + clocks = <&soc_clocks MMP2_CLK_SSP0>; + interrupts = <0>; + }; diff --git a/Documentation/devicetree/bindings/spi/spi-rspi.txt b/Documentation/devicetree/bindings/spi/spi-rspi.txt index 96fd58548f69..fc97ad64fbf2 100644 --- a/Documentation/devicetree/bindings/spi/spi-rspi.txt +++ b/Documentation/devicetree/bindings/spi/spi-rspi.txt @@ -3,7 +3,7 @@ Device tree configuration for Renesas RSPI/QSPI driver Required properties: - compatible : For Renesas Serial Peripheral Interface on legacy SH: "renesas,rspi-<soctype>", "renesas,rspi" as fallback. - For Renesas Serial Peripheral Interface on RZ/A1H: + For Renesas Serial Peripheral Interface on RZ/A: "renesas,rspi-<soctype>", "renesas,rspi-rz" as fallback. For Quad Serial Peripheral Interface on R-Car Gen2 and RZ/G1 devices: @@ -11,7 +11,9 @@ Required properties: Examples with soctypes are: - "renesas,rspi-sh7757" (SH) - "renesas,rspi-r7s72100" (RZ/A1H) + - "renesas,rspi-r7s9210" (RZ/A2) - "renesas,qspi-r8a7743" (RZ/G1M) + - "renesas,qspi-r8a7744" (RZ/G1N) - "renesas,qspi-r8a7745" (RZ/G1E) - "renesas,qspi-r8a7790" (R-Car H2) - "renesas,qspi-r8a7791" (R-Car M2-W) diff --git a/Documentation/devicetree/bindings/spi/spi-slave-mt27xx.txt b/Documentation/devicetree/bindings/spi/spi-slave-mt27xx.txt new file mode 100644 index 000000000000..c37e5a179b21 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/spi-slave-mt27xx.txt @@ -0,0 +1,32 @@ +Binding for MTK SPI Slave controller + +Required properties: +- compatible: should be one of the following. + - mediatek,mt2712-spi-slave: for mt2712 platforms +- reg: Address and length of the register set for the device. +- interrupts: Should contain spi interrupt. +- clocks: phandles to input clocks. + It's clock gate, and should be <&infracfg CLK_INFRA_AO_SPI1>. +- clock-names: should be "spi" for the clock gate. + +Optional properties: +- assigned-clocks: it's mux clock, should be <&topckgen CLK_TOP_SPISLV_SEL>. +- assigned-clock-parents: parent of mux clock. + It's PLL, and should be one of the following. + - <&topckgen CLK_TOP_UNIVPLL1_D2>: specify parent clock 312MHZ. + It's the default one. + - <&topckgen CLK_TOP_UNIVPLL1_D4>: specify parent clock 156MHZ. + - <&topckgen CLK_TOP_UNIVPLL2_D4>: specify parent clock 104MHZ. + - <&topckgen CLK_TOP_UNIVPLL1_D8>: specify parent clock 78MHZ. + +Example: +- SoC Specific Portion: +spis1: spi@10013000 { + compatible = "mediatek,mt2712-spi-slave"; + reg = <0 0x10013000 0 0x100>; + interrupts = <GIC_SPI 283 IRQ_TYPE_LEVEL_LOW>; + clocks = <&infracfg CLK_INFRA_AO_SPI1>; + clock-names = "spi"; + assigned-clocks = <&topckgen CLK_TOP_SPISLV_SEL>; + assigned-clock-parents = <&topckgen CLK_TOP_UNIVPLL1_D2>; +}; diff --git a/Documentation/devicetree/bindings/spi/spi-sprd.txt b/Documentation/devicetree/bindings/spi/spi-sprd.txt new file mode 100644 index 000000000000..bad211a19da4 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/spi-sprd.txt @@ -0,0 +1,26 @@ +Spreadtrum SPI Controller + +Required properties: +- compatible: Should be "sprd,sc9860-spi". +- reg: Offset and length of SPI controller register space. +- interrupts: Should contain SPI interrupt. +- clock-names: Should contain following entries: + "spi" for SPI clock, + "source" for SPI source (parent) clock, + "enable" for SPI module enable clock. +- clocks: List of clock input name strings sorted in the same order + as the clock-names property. +- #address-cells: The number of cells required to define a chip select + address on the SPI bus. Should be set to 1. +- #size-cells: Should be set to 0. + +Example: +spi0: spi@70a00000{ + compatible = "sprd,sc9860-spi"; + reg = <0 0x70a00000 0 0x1000>; + interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>; + clock-names = "spi", "source","enable"; + clocks = <&clk_spi0>, <&ext_26m>, <&clk_ap_apb_gates 5>; + #address-cells = <1>; + #size-cells = <0>; +}; diff --git a/Documentation/devicetree/bindings/spi/spi-stm32-qspi.txt b/Documentation/devicetree/bindings/spi/spi-stm32-qspi.txt new file mode 100644 index 000000000000..adeeb63e84b9 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/spi-stm32-qspi.txt @@ -0,0 +1,44 @@ +* STMicroelectronics Quad Serial Peripheral Interface(QSPI) + +Required properties: +- compatible: should be "st,stm32f469-qspi" +- reg: the first contains the register location and length. + the second contains the memory mapping address and length +- reg-names: should contain the reg names "qspi" "qspi_mm" +- interrupts: should contain the interrupt for the device +- clocks: the phandle of the clock needed by the QSPI controller +- A pinctrl must be defined to set pins in mode of operation for QSPI transfer + +Optional properties: +- resets: must contain the phandle to the reset controller. + +A spi flash (NOR/NAND) must be a child of spi node and could have some +properties. Also see jedec,spi-nor.txt. + +Required properties: +- reg: chip-Select number (QSPI controller may connect 2 flashes) +- spi-max-frequency: max frequency of spi bus + +Optional property: +- spi-rx-bus-width: see ./spi-bus.txt for the description + +Example: + +qspi: spi@a0001000 { + compatible = "st,stm32f469-qspi"; + reg = <0xa0001000 0x1000>, <0x90000000 0x10000000>; + reg-names = "qspi", "qspi_mm"; + interrupts = <91>; + resets = <&rcc STM32F4_AHB3_RESET(QSPI)>; + clocks = <&rcc 0 STM32F4_AHB3_CLOCK(QSPI)>; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_qspi0>; + + flash@0 { + compatible = "jedec,spi-nor"; + reg = <0>; + spi-rx-bus-width = <4>; + spi-max-frequency = <108000000>; + ... + }; +}; diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst index 826e85d50a16..e970fadf4d1a 100644 --- a/Documentation/driver-api/basics.rst +++ b/Documentation/driver-api/basics.rst @@ -121,6 +121,9 @@ Kernel utility functions .. kernel-doc:: kernel/rcu/update.c :export: +.. kernel-doc:: include/linux/overflow.h + :internal: + Device Resource Management -------------------------- diff --git a/Documentation/driver-api/firewire.rst b/Documentation/driver-api/firewire.rst new file mode 100644 index 000000000000..94a2d7f01d99 --- /dev/null +++ b/Documentation/driver-api/firewire.rst @@ -0,0 +1,48 @@ +=========================================== +Firewire (IEEE 1394) driver Interface Guide +=========================================== + +Introduction and Overview +========================= + +The Linux FireWire subsystem adds some interfaces into the Linux system to + use/maintain+any resource on IEEE 1394 bus. + +The main purpose of these interfaces is to access address space on each node +on IEEE 1394 bus by ISO/IEC 13213 (IEEE 1212) procedure, and to control +isochronous resources on the bus by IEEE 1394 procedure. + +Two types of interfaces are added, according to consumers of the interface. A +set of userspace interfaces is available via `firewire character devices`. A set +of kernel interfaces is available via exported symbols in `firewire-core` module. + +Firewire char device data structures +==================================== + +.. include:: /ABI/stable/firewire-cdev + :literal: + +.. kernel-doc:: include/uapi/linux/firewire-cdev.h + :internal: + +Firewire device probing and sysfs interfaces +============================================ + +.. include:: /ABI/stable/sysfs-bus-firewire + :literal: + +.. kernel-doc:: drivers/firewire/core-device.c + :export: + +Firewire core transaction interfaces +==================================== + +.. kernel-doc:: drivers/firewire/core-transaction.c + :export: + +Firewire Isochronous I/O interfaces +=================================== + +.. kernel-doc:: drivers/firewire/core-iso.c + :export: + diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst index 2c112553df84..a0f294e2e250 100644 --- a/Documentation/driver-api/gpio/board.rst +++ b/Documentation/driver-api/gpio/board.rst @@ -193,3 +193,27 @@ And the table can be added to the board code as follows:: The line will be hogged as soon as the gpiochip is created or - in case the chip was created earlier - when the hog table is registered. + +Arrays of pins +-------------- +In addition to requesting pins belonging to a function one by one, a device may +also request an array of pins assigned to the function. The way those pins are +mapped to the device determines if the array qualifies for fast bitmap +processing. If yes, a bitmap is passed over get/set array functions directly +between a caller and a respective .get/set_multiple() callback of a GPIO chip. + +In order to qualify for fast bitmap processing, the array must meet the +following requirements: +- pin hardware number of array member 0 must also be 0, +- pin hardware numbers of consecutive array members which belong to the same + chip as member 0 does must also match their array indexes. + +Otherwise fast bitmap processing path is not used in order to avoid consecutive +pins which belong to the same chip but are not in hardware order being processed +separately. + +If the array applies for fast bitmap processing path, pins which belong to +different chips than member 0 does, as well as those with indexes different from +their hardware pin numbers, are excluded from the fast path, both input and +output. Moreover, open drain and open source pins are excluded from fast bitmap +output processing. diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst index aa03f389d41d..5e4d8aa68913 100644 --- a/Documentation/driver-api/gpio/consumer.rst +++ b/Documentation/driver-api/gpio/consumer.rst @@ -109,9 +109,11 @@ For a function using multiple GPIOs all of those can be obtained with one call:: enum gpiod_flags flags) This function returns a struct gpio_descs which contains an array of -descriptors:: +descriptors. It also contains a pointer to a gpiolib private structure which, +if passed back to get/set array functions, may speed up I/O proocessing:: struct gpio_descs { + struct gpio_array *info; unsigned int ndescs; struct gpio_desc *desc[]; } @@ -323,29 +325,37 @@ The following functions get or set the values of an array of GPIOs:: int gpiod_get_array_value(unsigned int array_size, struct gpio_desc **desc_array, - int *value_array); + struct gpio_array *array_info, + unsigned long *value_bitmap); int gpiod_get_raw_array_value(unsigned int array_size, struct gpio_desc **desc_array, - int *value_array); + struct gpio_array *array_info, + unsigned long *value_bitmap); int gpiod_get_array_value_cansleep(unsigned int array_size, struct gpio_desc **desc_array, - int *value_array); + struct gpio_array *array_info, + unsigned long *value_bitmap); int gpiod_get_raw_array_value_cansleep(unsigned int array_size, struct gpio_desc **desc_array, - int *value_array); - - void gpiod_set_array_value(unsigned int array_size, - struct gpio_desc **desc_array, - int *value_array) - void gpiod_set_raw_array_value(unsigned int array_size, - struct gpio_desc **desc_array, - int *value_array) - void gpiod_set_array_value_cansleep(unsigned int array_size, - struct gpio_desc **desc_array, - int *value_array) - void gpiod_set_raw_array_value_cansleep(unsigned int array_size, - struct gpio_desc **desc_array, - int *value_array) + struct gpio_array *array_info, + unsigned long *value_bitmap); + + int gpiod_set_array_value(unsigned int array_size, + struct gpio_desc **desc_array, + struct gpio_array *array_info, + unsigned long *value_bitmap) + int gpiod_set_raw_array_value(unsigned int array_size, + struct gpio_desc **desc_array, + struct gpio_array *array_info, + unsigned long *value_bitmap) + int gpiod_set_array_value_cansleep(unsigned int array_size, + struct gpio_desc **desc_array, + struct gpio_array *array_info, + unsigned long *value_bitmap) + int gpiod_set_raw_array_value_cansleep(unsigned int array_size, + struct gpio_desc **desc_array, + struct gpio_array *array_info, + unsigned long *value_bitmap) The array can be an arbitrary set of GPIOs. The functions will try to access GPIOs belonging to the same bank or chip simultaneously if supported by the @@ -356,8 +366,9 @@ accessed sequentially. The functions take three arguments: * array_size - the number of array elements * desc_array - an array of GPIO descriptors - * value_array - an array to store the GPIOs' values (get) or - an array of values to assign to the GPIOs (set) + * array_info - optional information obtained from gpiod_array_get() + * value_bitmap - a bitmap to store the GPIOs' values (get) or + a bitmap of values to assign to the GPIOs (set) The descriptor array can be obtained using the gpiod_get_array() function or one of its variants. If the group of descriptors returned by that function @@ -366,16 +377,25 @@ the struct gpio_descs returned by gpiod_get_array():: struct gpio_descs *my_gpio_descs = gpiod_get_array(...); gpiod_set_array_value(my_gpio_descs->ndescs, my_gpio_descs->desc, - my_gpio_values); + my_gpio_descs->info, my_gpio_value_bitmap); It is also possible to access a completely arbitrary array of descriptors. The descriptors may be obtained using any combination of gpiod_get() and gpiod_get_array(). Afterwards the array of descriptors has to be setup -manually before it can be passed to one of the above functions. +manually before it can be passed to one of the above functions. In that case, +array_info should be set to NULL. Note that for optimal performance GPIOs belonging to the same chip should be contiguous within the array of descriptors. +Still better performance may be achieved if array indexes of the descriptors +match hardware pin numbers of a single chip. If an array passed to a get/set +array function matches the one obtained from gpiod_get_array() and array_info +associated with the array is also passed, the function may take a fast bitmap +processing path, passing the value_bitmap argument directly to the respective +.get/set_multiple() callback of the chip. That allows for utilization of GPIO +banks as data I/O ports without much loss of performance. + The return value of gpiod_get_array_value() and its variants is 0 on success or negative on error. Note the difference to gpiod_get_value(), which returns 0 or 1 on success to convey the GPIO value. With the array functions, the GPIO diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst index cbe0242842d1..a6c14ff0c54f 100644 --- a/Documentation/driver-api/gpio/driver.rst +++ b/Documentation/driver-api/gpio/driver.rst @@ -374,7 +374,28 @@ When implementing an irqchip inside a GPIO driver, these two functions should typically be called in the .startup() and .shutdown() callbacks from the irqchip. -When using the gpiolib irqchip helpers, these callback are automatically +When using the gpiolib irqchip helpers, these callbacks are automatically +assigned. + + +Disabling and enabling IRQs +--------------------------- +When a GPIO is used as an IRQ signal, then gpiolib also needs to know if +the IRQ is enabled or disabled. In order to inform gpiolib about this, +a driver should call:: + + void gpiochip_disable_irq(struct gpio_chip *chip, unsigned int offset) + +This allows drivers to drive the GPIO as an output while the IRQ is +disabled. When the IRQ is enabled again, a driver should call:: + + void gpiochip_enable_irq(struct gpio_chip *chip, unsigned int offset) + +When implementing an irqchip inside a GPIO driver, these two functions should +typically be called in the .irq_disable() and .irq_enable() callbacks from the +irqchip. + +When using the gpiolib irqchip helpers, these callbacks are automatically assigned. Real-Time compliance for GPIO IRQ chips diff --git a/Documentation/driver-api/gpio/index.rst b/Documentation/driver-api/gpio/index.rst index 6a374ded1287..c5b8467f9104 100644 --- a/Documentation/driver-api/gpio/index.rst +++ b/Documentation/driver-api/gpio/index.rst @@ -38,7 +38,7 @@ Device tree support Device-managed API ================== -.. kernel-doc:: drivers/gpio/devres.c +.. kernel-doc:: drivers/gpio/gpiolib-devres.c :export: sysfs helpers diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 6d9f2f9fe20e..909f991b4c0d 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -29,7 +29,8 @@ available subsections can be seen below. iio/index input usb/index - pci + firewire + pci/index spi i2c hsi diff --git a/Documentation/driver-api/mtdnand.rst b/Documentation/driver-api/mtdnand.rst index c55a6034c397..55447659b81f 100644 --- a/Documentation/driver-api/mtdnand.rst +++ b/Documentation/driver-api/mtdnand.rst @@ -180,10 +180,10 @@ by a chip select decoder. { struct nand_chip *this = mtd_to_nand(mtd); switch(cmd){ - case NAND_CTL_SETCLE: this->IO_ADDR_W |= CLE_ADRR_BIT; break; - case NAND_CTL_CLRCLE: this->IO_ADDR_W &= ~CLE_ADRR_BIT; break; - case NAND_CTL_SETALE: this->IO_ADDR_W |= ALE_ADRR_BIT; break; - case NAND_CTL_CLRALE: this->IO_ADDR_W &= ~ALE_ADRR_BIT; break; + case NAND_CTL_SETCLE: this->legacy.IO_ADDR_W |= CLE_ADRR_BIT; break; + case NAND_CTL_CLRCLE: this->legacy.IO_ADDR_W &= ~CLE_ADRR_BIT; break; + case NAND_CTL_SETALE: this->legacy.IO_ADDR_W |= ALE_ADRR_BIT; break; + case NAND_CTL_CLRALE: this->legacy.IO_ADDR_W &= ~ALE_ADRR_BIT; break; } } @@ -197,7 +197,7 @@ to read back the state of the pin. The function has no arguments and should return 0, if the device is busy (R/B pin is low) and 1, if the device is ready (R/B pin is high). If the hardware interface does not give access to the ready busy pin, then the function must not be defined -and the function pointer this->dev_ready is set to NULL. +and the function pointer this->legacy.dev_ready is set to NULL. Init function ------------- @@ -235,18 +235,18 @@ necessary information about the device. } /* Set address of NAND IO lines */ - this->IO_ADDR_R = baseaddr; - this->IO_ADDR_W = baseaddr; + this->legacy.IO_ADDR_R = baseaddr; + this->legacy.IO_ADDR_W = baseaddr; /* Reference hardware control function */ this->hwcontrol = board_hwcontrol; /* Set command delay time, see datasheet for correct value */ - this->chip_delay = CHIP_DEPENDEND_COMMAND_DELAY; + this->legacy.chip_delay = CHIP_DEPENDEND_COMMAND_DELAY; /* Assign the device ready function, if available */ - this->dev_ready = board_dev_ready; + this->legacy.dev_ready = board_dev_ready; this->eccmode = NAND_ECC_SOFT; /* Scan to find existence of the device */ - if (nand_scan (board_mtd, 1)) { + if (nand_scan (this, 1)) { err = -ENXIO; goto out_ior; } @@ -277,7 +277,7 @@ unregisters the partitions in the MTD layer. static void __exit board_cleanup (void) { /* Release resources, unregister device */ - nand_release (board_mtd); + nand_release (mtd_to_nand(board_mtd)); /* unmap physical address */ iounmap(baseaddr); @@ -336,17 +336,17 @@ connected to an address decoder. struct nand_chip *this = mtd_to_nand(mtd); /* Deselect all chips */ - this->IO_ADDR_R &= ~BOARD_NAND_ADDR_MASK; - this->IO_ADDR_W &= ~BOARD_NAND_ADDR_MASK; + this->legacy.IO_ADDR_R &= ~BOARD_NAND_ADDR_MASK; + this->legacy.IO_ADDR_W &= ~BOARD_NAND_ADDR_MASK; switch (chip) { case 0: - this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0; - this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0; + this->legacy.IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0; + this->legacy.IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0; break; .... case n: - this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn; - this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn; + this->legacy.IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn; + this->legacy.IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn; break; } } diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst new file mode 100644 index 000000000000..c6cf1fef61ce --- /dev/null +++ b/Documentation/driver-api/pci/index.rst @@ -0,0 +1,22 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +The Linux PCI driver implementer's API guide +============================================ + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + pci + p2pdma + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst new file mode 100644 index 000000000000..4c577fa7bef9 --- /dev/null +++ b/Documentation/driver-api/pci/p2pdma.rst @@ -0,0 +1,145 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +PCI Peer-to-Peer DMA Support +============================ + +The PCI bus has pretty decent support for performing DMA transfers +between two devices on the bus. This type of transaction is henceforth +called Peer-to-Peer (or P2P). However, there are a number of issues that +make P2P transactions tricky to do in a perfectly safe way. + +One of the biggest issues is that PCI doesn't require forwarding +transactions between hierarchy domains, and in PCIe, each Root Port +defines a separate hierarchy domain. To make things worse, there is no +simple way to determine if a given Root Complex supports this or not. +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel +only supports doing P2P when the endpoints involved are all behind the +same PCI bridge, as such devices are all in the same PCI hierarchy +domain, and the spec guarantees that all transactions within the +hierarchy will be routable, but it does not require routing +between hierarchies. + +The second issue is that to make use of existing interfaces in Linux, +memory that is used for P2P transactions needs to be backed by struct +pages. However, PCI BARs are not typically cache coherent so there are +a few corner case gotchas with these pages so developers need to +be careful about what they do with them. + + +Driver Writer's Guide +===================== + +In a given P2P implementation there may be three or more different +types of kernel drivers in play: + +* Provider - A driver which provides or publishes P2P resources like + memory or doorbell registers to other drivers. +* Client - A driver which makes use of a resource by setting up a + DMA transaction to or from it. +* Orchestrator - A driver which orchestrates the flow of data between + clients and providers. + +In many cases there could be overlap between these three types (i.e., +it may be typical for a driver to be both a provider and a client). + +For example, in the NVMe Target Copy Offload implementation: + +* The NVMe PCI driver is both a client, provider and orchestrator + in that it exposes any CMB (Controller Memory Buffer) as a P2P memory + resource (provider), it accepts P2P memory pages as buffers in requests + to be used directly (client) and it can also make use of the CMB as + submission queue entries (orchastrator). +* The RDMA driver is a client in this arrangement so that an RNIC + can DMA directly to the memory exposed by the NVMe device. +* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC + to the P2P memory (CMB) and then to the NVMe device (and vice versa). + +This is currently the only arrangement supported by the kernel but +one could imagine slight tweaks to this that would allow for the same +functionality. For example, if a specific RNIC added a BAR with some +memory behind it, its driver could add support as a P2P provider and +then the NVMe Target could use the RNIC's memory instead of the CMB +in cases where the NVMe cards in use do not have CMB support. + + +Provider Drivers +---------------- + +A provider simply needs to register a BAR (or a portion of a BAR) +as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`. +This will register struct pages for all the specified memory. + +After that it may optionally publish all of its resources as +P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow +any orchestrator drivers to find and use the memory. When marked in +this way, the resource must be regular memory with no side effects. + +For the time being this is fairly rudimentary in that all resources +are typically going to be P2P memory. Future work will likely expand +this to include other types of resources like doorbells. + + +Client Drivers +-------------- + +A client driver typically only has to conditionally change its DMA map +routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead +of the usual :c:func:`dma_map_sg()` function. Memory mapped in this +way does not need to be unmapped. + +The client may also, optionally, make use of +:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping +functions and when to use the regular mapping functions. In some +situations, it may be more appropriate to use a flag to indicate a +given request is P2P memory and map appropriately. It is important to +ensure that struct pages that back P2P memory stay out of code that +does not have support for them as other code may treat the pages as +regular memory which may not be appropriate. + + +Orchestrator Drivers +-------------------- + +The first task an orchestrator driver must do is compile a list of +all client devices that will be involved in a given transaction. For +example, the NVMe Target driver creates a list including the namespace +block device and the RNIC in use. If the orchestrator has access to +a specific P2P provider to use it may check compatibility using +:c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider +that's compatible with all clients using :c:func:`pci_p2pmem_find()`. +If more than one provider is supported, the one nearest to all the clients will +be chosen first. If more than one provider is an equal distance away, the +one returned will be chosen at random (it is not an arbitrary but +truely random). This function returns the PCI device to use for the provider +with a reference taken and therefore when it's no longer needed it should be +returned with pci_dev_put(). + +Once a provider is selected, the orchestrator can then use +:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to +allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()` +and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for +allocating scatter-gather lists with P2P memory. + +Struct Page Caveats +------------------- + +Driver writers should be very careful about not passing these special +struct pages to code that isn't prepared for it. At this time, the kernel +interfaces do not have any checks for ensuring this. This obviously +precludes passing these pages to userspace. + +P2P memory is also technically IO memory but should never have any side +effects behind it. Thus, the order of loads and stores should not be important +and ioreadX(), iowriteX() and friends should not be necessary. +However, as the memory is not cache coherent, if access ever needs to +be protected by a spinlock then :c:func:`mmiowb()` must be used before +unlocking the lock. (See ACQUIRES VS I/O ACCESSES in +Documentation/memory-barriers.txt) + + +P2P DMA Support Library +======================= + +.. kernel-doc:: drivers/pci/p2pdma.c + :export: diff --git a/Documentation/driver-api/pci.rst b/Documentation/driver-api/pci/pci.rst index ca85e5e78b2c..ca85e5e78b2c 100644 --- a/Documentation/driver-api/pci.rst +++ b/Documentation/driver-api/pci/pci.rst diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt index 41df801f9a50..833edb0d0bc4 100644 --- a/Documentation/efi-stub.txt +++ b/Documentation/efi-stub.txt @@ -83,7 +83,18 @@ is passed to bzImage.efi. The "dtb=" option ----------------- -For the ARM and arm64 architectures, we also need to be able to provide a -device tree to the kernel. This is done with the "dtb=" command line option, -and is processed in the same manner as the "initrd=" option that is +For the ARM and arm64 architectures, a device tree must be provided to +the kernel. Normally firmware shall supply the device tree via the +EFI CONFIGURATION TABLE. However, the "dtb=" command line option can +be used to override the firmware supplied device tree, or to supply +one when firmware is unable to. + +Please note: Firmware adds runtime configuration information to the +device tree before booting the kernel. If dtb= is used to override +the device tree, then any runtime data provided by firmware will be +lost. The dtb= option should only be used either as a debug tool, or +as a last resort when a device tree is not provided in the EFI +CONFIGURATION TABLE. + +"dtb=" is processed in the same manner as the "initrd=" option that is described above. diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX deleted file mode 100644 index fe85e7c5907a..000000000000 --- a/Documentation/fb/00-INDEX +++ /dev/null @@ -1,75 +0,0 @@ -Index of files in Documentation/fb. If you think something about frame -buffer devices needs an entry here, needs correction or you've written one -please mail me. - Geert Uytterhoeven <geert@linux-m68k.org> - -00-INDEX - - this file. -api.txt - - The frame buffer API between applications and buffer devices. -arkfb.txt - - info on the fbdev driver for ARK Logic chips. -aty128fb.txt - - info on the ATI Rage128 frame buffer driver. -cirrusfb.txt - - info on the driver for Cirrus Logic chipsets. -cmap_xfbdev.txt - - an introduction to fbdev's cmap structures. -deferred_io.txt - - an introduction to deferred IO. -efifb.txt - - info on the EFI platform driver for Intel based Apple computers. -ep93xx-fb.txt - - info on the driver for EP93xx LCD controller. -fbcon.txt - - intro to and usage guide for the framebuffer console (fbcon). -framebuffer.txt - - introduction to frame buffer devices. -gxfb.txt - - info on the framebuffer driver for AMD Geode GX2 based processors. -intel810.txt - - documentation for the Intel 810/815 framebuffer driver. -intelfb.txt - - docs for Intel 830M/845G/852GM/855GM/865G/915G/945G fb driver. -internals.txt - - quick overview of frame buffer device internals. -lxfb.txt - - info on the framebuffer driver for AMD Geode LX based processors. -matroxfb.txt - - info on the Matrox framebuffer driver for Alpha, Intel and PPC. -metronomefb.txt - - info on the driver for the Metronome display controller. -modedb.txt - - info on the video mode database. -pvr2fb.txt - - info on the PowerVR 2 frame buffer driver. -pxafb.txt - - info on the driver for the PXA25x LCD controller. -s3fb.txt - - info on the fbdev driver for S3 Trio/Virge chips. -sa1100fb.txt - - information about the driver for the SA-1100 LCD controller. -sh7760fb.txt - - info on the SH7760/SH7763 integrated LCDC Framebuffer driver. -sisfb.txt - - info on the framebuffer device driver for various SiS chips. -sm501.txt - - info on the framebuffer device driver for sm501 videoframebuffer. -sstfb.txt - - info on the frame buffer driver for 3dfx' Voodoo Graphics boards. -tgafb.txt - - info on the TGA (DECChip 21030) frame buffer driver. -tridentfb.txt - info on the framebuffer driver for some Trident chip based cards. -udlfb.txt - - Driver for DisplayLink USB 2.0 chips. -uvesafb.txt - - info on the userspace VESA (VBE2+ compliant) frame buffer device. -vesafb.txt - - info on the VESA frame buffer device. -viafb.modes - - list of modes for VIA Integration Graphic Chip. -viafb.txt - - info on the VIA Integration Graphic Chip console framebuffer driver. -vt8623fb.txt - - info on the fb driver for the graphics core in VIA VT8623 chipsets. diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt index 950d5a658cb3..413bb73235be 100644 --- a/Documentation/fb/vesafb.txt +++ b/Documentation/fb/vesafb.txt @@ -114,11 +114,11 @@ to turn it on. You can pass options to vesafb using "video=vesafb:option" on the kernel command line. Multiple options should be separated -by comma, like this: "video=vesafb:ypan,invers" +by comma, like this: "video=vesafb:ypan,inverse" Accepted options: -invers no comment... +inverse use inverse color map ypan enable display panning using the VESA protected mode interface. The visible screen is just a window of the diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX deleted file mode 100644 index 0937bade1099..000000000000 --- a/Documentation/filesystems/00-INDEX +++ /dev/null @@ -1,153 +0,0 @@ -00-INDEX - - this file (info on some of the filesystems supported by linux). -Locking - - info on locking rules as they pertain to Linux VFS. -9p.txt - - 9p (v9fs) is an implementation of the Plan 9 remote fs protocol. -adfs.txt - - info and mount options for the Acorn Advanced Disc Filing System. -afs.txt - - info and examples for the distributed AFS (Andrew File System) fs. -affs.txt - - info and mount options for the Amiga Fast File System. -autofs-mount-control.txt - - info on device control operations for autofs module. -automount-support.txt - - information about filesystem automount support. -befs.txt - - information about the BeOS filesystem for Linux. -bfs.txt - - info for the SCO UnixWare Boot Filesystem (BFS). -btrfs.txt - - info for the BTRFS filesystem. -caching/ - - directory containing filesystem cache documentation. -ceph.txt - - info for the Ceph Distributed File System. -cifs/ - - directory containing CIFS filesystem documentation and example code. -coda.txt - - description of the CODA filesystem. -configfs/ - - directory containing configfs documentation and example code. -cramfs.txt - - info on the cram filesystem for small storage (ROMs etc). -dax.txt - - info on avoiding the page cache for files stored on CPU-addressable - storage devices. -debugfs.txt - - info on the debugfs filesystem. -devpts.txt - - info on the devpts filesystem. -directory-locking - - info about the locking scheme used for directory operations. -dlmfs.txt - - info on the userspace interface to the OCFS2 DLM. -dnotify.txt - - info about directory notification in Linux. -dnotify_test.c - - example program for dnotify. -ecryptfs.txt - - docs on eCryptfs: stacked cryptographic filesystem for Linux. -efivarfs.txt - - info for the efivarfs filesystem. -exofs.txt - - info, usage, mount options, design about EXOFS. -ext2.txt - - info, mount options and specifications for the Ext2 filesystem. -ext3.txt - - info, mount options and specifications for the Ext3 filesystem. -ext4.txt - - info, mount options and specifications for the Ext4 filesystem. -f2fs.txt - - info and mount options for the F2FS filesystem. -fiemap.txt - - info on fiemap ioctl. -files.txt - - info on file management in the Linux kernel. -fuse.txt - - info on the Filesystem in User SpacE including mount options. -gfs2-glocks.txt - - info on the Global File System 2 - Glock internal locking rules. -gfs2-uevents.txt - - info on the Global File System 2 - uevents. -gfs2.txt - - info on the Global File System 2. -hfs.txt - - info on the Macintosh HFS Filesystem for Linux. -hfsplus.txt - - info on the Macintosh HFSPlus Filesystem for Linux. -hpfs.txt - - info and mount options for the OS/2 HPFS. -inotify.txt - - info on the powerful yet simple file change notification system. -isofs.txt - - info and mount options for the ISO 9660 (CDROM) filesystem. -jfs.txt - - info and mount options for the JFS filesystem. -locks.txt - - info on file locking implementations, flock() vs. fcntl(), etc. -mandatory-locking.txt - - info on the Linux implementation of Sys V mandatory file locking. -nfs/ - - nfs-related documentation. -nilfs2.txt - - info and mount options for the NILFS2 filesystem. -ntfs.txt - - info and mount options for the NTFS filesystem (Windows NT). -ocfs2.txt - - info and mount options for the OCFS2 clustered filesystem. -omfs.txt - - info on the Optimized MPEG FileSystem. -path-lookup.txt - - info on path walking and name lookup locking. -pohmelfs/ - - directory containing pohmelfs filesystem documentation. -porting - - various information on filesystem porting. -proc.txt - - info on Linux's /proc filesystem. -qnx6.txt - - info on the QNX6 filesystem. -quota.txt - - info on Quota subsystem. -ramfs-rootfs-initramfs.txt - - info on the 'in memory' filesystems ramfs, rootfs and initramfs. -relay.txt - - info on relay, for efficient streaming from kernel to user space. -romfs.txt - - description of the ROMFS filesystem. -seq_file.txt - - how to use the seq_file API. -sharedsubtree.txt - - a description of shared subtrees for namespaces. -spufs.txt - - info and mount options for the SPU filesystem used on Cell. -squashfs.txt - - info on the squashfs filesystem. -sysfs-pci.txt - - info on accessing PCI device resources through sysfs. -sysfs-tagging.txt - - info on sysfs tagging to avoid duplicates. -sysfs.txt - - info on sysfs, a ram-based filesystem for exporting kernel objects. -sysv-fs.txt - - info on the SystemV/V7/Xenix/Coherent filesystem. -tmpfs.txt - - info on tmpfs, a filesystem that holds all files in virtual memory. -ubifs.txt - - info on the Unsorted Block Images FileSystem. -udf.txt - - info and mount options for the UDF filesystem. -ufs.txt - - info on the ufs filesystem. -vfat.txt - - info on using the VFAT filesystem used in Windows NT and Windows 95. -vfs.txt - - overview of the Virtual File System. -xfs-delayed-logging-design.txt - - info on the XFS Delayed Logging Design. -xfs-self-describing-metadata.txt - - info on XFS Self Describing Metadata. -xfs.txt - - info and mount options for the XFS filesystem. diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 70cb68bed2e8..bc393e0a22b8 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -75,7 +75,7 @@ exposure of uninitialized data through mmap. These filesystems may be used for inspiration: - ext2: see Documentation/filesystems/ext2.txt -- ext4: see Documentation/filesystems/ext4.txt +- ext4: see Documentation/filesystems/ext4/ext4.rst - xfs: see Documentation/filesystems/xfs.txt diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt index 81c0becab225..a45c9fc0747b 100644 --- a/Documentation/filesystems/ext2.txt +++ b/Documentation/filesystems/ext2.txt @@ -358,7 +358,7 @@ and are copied into the filesystem. If a transaction is incomplete at the time of the crash, then there is no guarantee of consistency for the blocks in that transaction so they are discarded (which means any filesystem changes they represent are also lost). -Check Documentation/filesystems/ext4.txt if you want to read more about +Check Documentation/filesystems/ext4/ext4.rst if you want to read more about ext4 and journaling. References diff --git a/Documentation/filesystems/ext4/ondisk/about.rst b/Documentation/filesystems/ext4/about.rst index 0aadba052264..0aadba052264 100644 --- a/Documentation/filesystems/ext4/ondisk/about.rst +++ b/Documentation/filesystems/ext4/about.rst diff --git a/Documentation/filesystems/ext4/ondisk/allocators.rst b/Documentation/filesystems/ext4/allocators.rst index 7aa85152ace3..7aa85152ace3 100644 --- a/Documentation/filesystems/ext4/ondisk/allocators.rst +++ b/Documentation/filesystems/ext4/allocators.rst diff --git a/Documentation/filesystems/ext4/ondisk/attributes.rst b/Documentation/filesystems/ext4/attributes.rst index 0b01b67b81fe..54386a010a8d 100644 --- a/Documentation/filesystems/ext4/ondisk/attributes.rst +++ b/Documentation/filesystems/ext4/attributes.rst @@ -30,7 +30,7 @@ Extended attributes, when stored after the inode, have a header ``ext4_xattr_ibody_header`` that is 4 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -47,7 +47,7 @@ The beginning of an extended attribute block is in ``struct ext4_xattr_header``, which is 32 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -92,7 +92,7 @@ entries must be stored in sorted order. The sort order is Attributes stored inside an inode do not need be stored in sorted order. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -157,7 +157,7 @@ attribute name index field is set, and matching string is removed from the key name. Here is a map of name index values to key prefixes: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Name Index diff --git a/Documentation/filesystems/ext4/ondisk/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst index c6d88557553c..c6d88557553c 100644 --- a/Documentation/filesystems/ext4/ondisk/bigalloc.rst +++ b/Documentation/filesystems/ext4/bigalloc.rst diff --git a/Documentation/filesystems/ext4/ondisk/bitmaps.rst b/Documentation/filesystems/ext4/bitmaps.rst index c7546dbc197a..c7546dbc197a 100644 --- a/Documentation/filesystems/ext4/ondisk/bitmaps.rst +++ b/Documentation/filesystems/ext4/bitmaps.rst diff --git a/Documentation/filesystems/ext4/ondisk/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst index baf888e4c06a..baf888e4c06a 100644 --- a/Documentation/filesystems/ext4/ondisk/blockgroup.rst +++ b/Documentation/filesystems/ext4/blockgroup.rst diff --git a/Documentation/filesystems/ext4/ondisk/blockmap.rst b/Documentation/filesystems/ext4/blockmap.rst index 30e25750d88a..30e25750d88a 100644 --- a/Documentation/filesystems/ext4/ondisk/blockmap.rst +++ b/Documentation/filesystems/ext4/blockmap.rst diff --git a/Documentation/filesystems/ext4/ondisk/blocks.rst b/Documentation/filesystems/ext4/blocks.rst index 73d4dc0f7bda..73d4dc0f7bda 100644 --- a/Documentation/filesystems/ext4/ondisk/blocks.rst +++ b/Documentation/filesystems/ext4/blocks.rst diff --git a/Documentation/filesystems/ext4/ondisk/checksums.rst b/Documentation/filesystems/ext4/checksums.rst index 9d6a793b2e03..5519e253810d 100644 --- a/Documentation/filesystems/ext4/ondisk/checksums.rst +++ b/Documentation/filesystems/ext4/checksums.rst @@ -28,7 +28,7 @@ of checksum. The checksum function is whatever the superblock describes (crc32c as of October 2013) unless noted otherwise. .. list-table:: - :widths: 1 1 4 + :widths: 20 8 50 :header-rows: 1 * - Metadata diff --git a/Documentation/filesystems/ext4/ondisk/directory.rst b/Documentation/filesystems/ext4/directory.rst index 8fcba68c2884..614034e24669 100644 --- a/Documentation/filesystems/ext4/ondisk/directory.rst +++ b/Documentation/filesystems/ext4/directory.rst @@ -34,7 +34,7 @@ is at most 263 bytes long, though on disk you'll need to reference ``dirent.rec_len`` to know for sure. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -66,7 +66,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most ``dirent.rec_len`` to know for sure. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -99,7 +99,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most The directory file type is one of the following values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -130,7 +130,7 @@ in the place where the name normally goes. The structure is ``struct ext4_dir_entry_tail``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -212,7 +212,7 @@ The root of the htree is in ``struct dx_root``, which is the full length of a data block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -305,7 +305,7 @@ of a data block: The directory hash is one of the following values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -327,7 +327,7 @@ Interior nodes of an htree are recorded as ``struct dx_node``, which is also the full length of a data block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -375,7 +375,7 @@ The hash maps that exist in both ``struct dx_root`` and long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -405,7 +405,7 @@ directory index (which will ensure that there's space for the checksum. The dx\_tail structure is 8 bytes long and looks like this: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/dynamic.rst b/Documentation/filesystems/ext4/dynamic.rst index bb0c84333341..bb0c84333341 100644 --- a/Documentation/filesystems/ext4/ondisk/dynamic.rst +++ b/Documentation/filesystems/ext4/dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/eainode.rst b/Documentation/filesystems/ext4/eainode.rst index ecc0d01a0a72..ecc0d01a0a72 100644 --- a/Documentation/filesystems/ext4/ondisk/eainode.rst +++ b/Documentation/filesystems/ext4/eainode.rst diff --git a/Documentation/filesystems/ext4/ext4.rst b/Documentation/filesystems/ext4/ext4.rst deleted file mode 100644 index 9d4368d591fa..000000000000 --- a/Documentation/filesystems/ext4/ext4.rst +++ /dev/null @@ -1,613 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======================== -General Information -======================== - -Ext4 is an advanced level of the ext3 filesystem which incorporates -scalability and reliability enhancements for supporting large filesystems -(64 bit) in keeping with increasing disk capacities and state-of-the-art -feature requirements. - -Mailing list: linux-ext4@vger.kernel.org -Web site: http://ext4.wiki.kernel.org - - -Quick usage instructions -======================== - -Note: More extensive information for getting started with ext4 can be -found at the ext4 wiki site at the URL: -http://ext4.wiki.kernel.org/index.php/Ext4_Howto - - - The latest version of e2fsprogs can be found at: - - https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ - - or - - http://sourceforge.net/project/showfiles.php?group_id=2406 - - or grab the latest git repository from: - - https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git - - - Create a new filesystem using the ext4 filesystem type: - - # mke2fs -t ext4 /dev/hda1 - - Or to configure an existing ext3 filesystem to support extents: - - # tune2fs -O extents /dev/hda1 - - If the filesystem was created with 128 byte inodes, it can be - converted to use 256 byte for greater efficiency via: - - # tune2fs -I 256 /dev/hda1 - - - Mounting: - - # mount -t ext4 /dev/hda1 /wherever - - - When comparing performance with other filesystems, it's always - important to try multiple workloads; very often a subtle change in a - workload parameter can completely change the ranking of which - filesystems do well compared to others. When comparing versus ext3, - note that ext4 enables write barriers by default, while ext3 does - not enable write barriers by default. So it is useful to use - explicitly specify whether barriers are enabled or not when via the - '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems - for a fair comparison. When tuning ext3 for best benchmark numbers, - it is often worthwhile to try changing the data journaling mode; '-o - data=writeback' can be faster for some workloads. (Note however that - running mounted with data=writeback can potentially leave stale data - exposed in recently written files in case of an unclean shutdown, - which could be a security exposure in some situations.) Configuring - the filesystem with a large journal can also be helpful for - metadata-intensive workloads. - -Features -======== - -Currently Available -------------------- - -* ability to use filesystems > 16TB (e2fsprogs support not available yet) -* extent format reduces metadata overhead (RAM, IO for access, transactions) -* extent format more robust in face of on-disk corruption due to magics, -* internal redundancy in tree -* improved file allocation (multi-block alloc) -* lift 32000 subdirectory limit imposed by i_links_count[1] -* nsec timestamps for mtime, atime, ctime, create time -* inode version field on disk (NFSv4, Lustre) -* reduced e2fsck time via uninit_bg feature -* journal checksumming for robustness, performance -* persistent file preallocation (e.g for streaming media, databases) -* ability to pack bitmaps and inode tables into larger virtual groups via the - flex_bg feature -* large file support -* inode allocation using large virtual block groups via flex_bg -* delayed allocation -* large block (up to pagesize) support -* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force - the ordering) - -[1] Filesystems with a block size of 1k may see a limit imposed by the -directory hash tree having a maximum depth of two. - -Options -======= - -When mounting an ext4 filesystem, the following option are accepted: -(*) == default - -======================= ======================================================= -Mount Option Description -======================= ======================================================= -ro Mount filesystem read only. Note that ext4 will - replay the journal (and thus write to the - partition) even when mounted "read only". The - mount options "ro,noload" can be used to prevent - writes to the filesystem. - -journal_checksum Enable checksumming of the journal transactions. - This will allow the recovery code in e2fsck and the - kernel to detect corruption in the kernel. It is a - compatible change and will be ignored by older kernels. - -journal_async_commit Commit block can be written to disk without waiting - for descriptor blocks. If enabled older kernels cannot - mount the device. This will enable 'journal_checksum' - internally. - -journal_path=path -journal_dev=devnum When the external journal device's major/minor numbers - have changed, these options allow the user to specify - the new journal location. The journal device is - identified through either its new major/minor numbers - encoded in devnum, or via a path to the device. - -norecovery Don't load the journal on mounting. Note that -noload if the filesystem was not unmounted cleanly, - skipping the journal replay will lead to the - filesystem containing inconsistencies that can - lead to any number of problems. - -data=journal All data are committed into the journal prior to being - written into the main file system. Enabling - this mode will disable delayed allocation and - O_DIRECT support. - -data=ordered (*) All data are forced directly out to the main file - system prior to its metadata being committed to the - journal. - -data=writeback Data ordering is not preserved, data may be written - into the main file system after its metadata has been - committed to the journal. - -commit=nrsec (*) Ext4 can be told to sync all its data and metadata - every 'nrsec' seconds. The default value is 5 seconds. - This means that if you lose your power, you will lose - as much as the latest 5 seconds of work (your - filesystem will not be damaged though, thanks to the - journaling). This default value (or any low value) - will hurt performance, but it's good for data-safety. - Setting it to 0 will have the same effect as leaving - it at the default (5 seconds). - Setting it to very large values will improve - performance. - -barrier=<0|1(*)> This enables/disables the use of write barriers in -barrier(*) the jbd code. barrier=0 disables, barrier=1 enables. -nobarrier This also requires an IO stack which can support - barriers, and if jbd gets an error on a barrier - write, it will disable again with a warning. - Write barriers enforce proper on-disk ordering - of journal commits, making volatile disk write caches - safe to use, at some performance penalty. If - your disks are battery-backed in one way or another, - disabling barriers may safely improve performance. - The mount options "barrier" and "nobarrier" can - also be used to enable or disable barriers, for - consistency with other ext4 mount options. - -inode_readahead_blks=n This tuning parameter controls the maximum - number of inode table blocks that ext4's inode - table readahead algorithm will pre-read into - the buffer cache. The default value is 32 blocks. - -nouser_xattr Disables Extended User Attributes. See the - attr(5) manual page for more information about - extended attributes. - -noacl This option disables POSIX Access Control List - support. If ACL support is enabled in the kernel - configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL is - enabled by default on mount. See the acl(5) manual - page for more information about acl. - -bsddf (*) Make 'df' act like BSD. -minixdf Make 'df' act like Minix. - -debug Extra debugging information is sent to syslog. - -abort Simulate the effects of calling ext4_abort() for - debugging purposes. This is normally used while - remounting a filesystem which is already mounted. - -errors=remount-ro Remount the filesystem read-only on an error. -errors=continue Keep going on a filesystem error. -errors=panic Panic and halt the machine if an error occurs. - (These mount options override the errors behavior - specified in the superblock, which can be configured - using tune2fs) - -data_err=ignore(*) Just print an error message if an error occurs - in a file data buffer in ordered mode. -data_err=abort Abort the journal if an error occurs in a file - data buffer in ordered mode. - -grpid New objects have the group ID of their parent. -bsdgroups - -nogrpid (*) New objects have the group ID of their creator. -sysvgroups - -resgid=n The group ID which may use the reserved blocks. - -resuid=n The user ID which may use the reserved blocks. - -sb=n Use alternate superblock at this location. - -quota These options are ignored by the filesystem. They -noquota are used only by quota tools to recognize volumes -grpquota where quota should be turned on. See documentation -usrquota in the quota-tools package for more details - (http://sourceforge.net/projects/linuxquota). - -jqfmt=<quota type> These options tell filesystem details about quota -usrjquota=<file> so that quota information can be properly updated -grpjquota=<file> during journal replay. They replace the above - quota options. See documentation in the quota-tools - package for more details - (http://sourceforge.net/projects/linuxquota). - -stripe=n Number of filesystem blocks that mballoc will try - to use for allocation size and alignment. For RAID5/6 - systems this should be the number of data - disks * RAID chunk size in file system blocks. - -delalloc (*) Defer block allocation until just before ext4 - writes out the block(s) in question. This - allows ext4 to better allocation decisions - more efficiently. -nodelalloc Disable delayed allocation. Blocks are allocated - when the data is copied from userspace to the - page cache, either via the write(2) system call - or when an mmap'ed page which was previously - unallocated is written for the first time. - -max_batch_time=usec Maximum amount of time ext4 should wait for - additional filesystem operations to be batch - together with a synchronous write operation. - Since a synchronous write operation is going to - force a commit and then a wait for the I/O - complete, it doesn't cost much, and can be a - huge throughput win, we wait for a small amount - of time to see if any other transactions can - piggyback on the synchronous write. The - algorithm used is designed to automatically tune - for the speed of the disk, by measuring the - amount of time (on average) that it takes to - finish committing a transaction. Call this time - the "commit time". If the time that the - transaction has been running is less than the - commit time, ext4 will try sleeping for the - commit time to see if other operations will join - the transaction. The commit time is capped by - the max_batch_time, which defaults to 15000us - (15ms). This optimization can be turned off - entirely by setting max_batch_time to 0. - -min_batch_time=usec This parameter sets the commit time (as - described above) to be at least min_batch_time. - It defaults to zero microseconds. Increasing - this parameter may improve the throughput of - multi-threaded, synchronous workloads on very - fast disks, at the cost of increasing latency. - -journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the - highest priority) which should be used for I/O - operations submitted by kjournald2 during a - commit operation. This defaults to 3, which is - a slightly higher priority than the default I/O - priority. - -auto_da_alloc(*) Many broken applications don't use fsync() when -noauto_da_alloc replacing existing files via patterns such as - fd = open("foo.new")/write(fd,..)/close(fd)/ - rename("foo.new", "foo"), or worse yet, - fd = open("foo", O_TRUNC)/write(fd,..)/close(fd). - If auto_da_alloc is enabled, ext4 will detect - the replace-via-rename and replace-via-truncate - patterns and force that any delayed allocation - blocks are allocated such that at the next - journal commit, in the default data=ordered - mode, the data blocks of the new file are forced - to disk before the rename() operation is - committed. This provides roughly the same level - of guarantees as ext3, and avoids the - "zero-length" problem that can happen when a - system crashes before the delayed allocation - blocks are forced to disk. - -noinit_itable Do not initialize any uninitialized inode table - blocks in the background. This feature may be - used by installation CD's so that the install - process can complete as quickly as possible; the - inode table initialization process would then be - deferred until the next time the file system - is unmounted. - -init_itable=n The lazy itable init code will wait n times the - number of milliseconds it took to zero out the - previous block group's inode table. This - minimizes the impact on the system performance - while file system's inode table is being initialized. - -discard Controls whether ext4 should issue discard/TRIM -nodiscard(*) commands to the underlying block device when - blocks are freed. This is useful for SSD devices - and sparse/thinly-provisioned LUNs, but it is off - by default until sufficient testing has been done. - -nouid32 Disables 32-bit UIDs and GIDs. This is for - interoperability with older kernels which only - store and expect 16-bit values. - -block_validity(*) These options enable or disable the in-kernel -noblock_validity facility for tracking filesystem metadata blocks - within internal data structures. This allows multi- - block allocator and other routines to notice - bugs or corrupted allocation bitmaps which cause - blocks to be allocated which overlap with - filesystem metadata blocks. - -dioread_lock Controls whether or not ext4 should use the DIO read -dioread_nolock locking. If the dioread_nolock option is specified - ext4 will allocate uninitialized extent before buffer - write and convert the extent to initialized after IO - completes. This approach allows ext4 code to avoid - using inode mutex, which improves scalability on high - speed storages. However this does not work with - data journaling and dioread_nolock option will be - ignored with kernel warning. Note that dioread_nolock - code path is only used for extent-based files. - Because of the restrictions this options comprises - it is off by default (e.g. dioread_lock). - -max_dir_size_kb=n This limits the size of directories so that any - attempt to expand them beyond the specified - limit in kilobytes will cause an ENOSPC error. - This is useful in memory constrained - environments, where a very large directory can - cause severe performance problems or even - provoke the Out Of Memory killer. (For example, - if there is only 512mb memory available, a 176mb - directory may seriously cramp the system's style.) - -i_version Enable 64-bit inode version support. This option is - off by default. - -dax Use direct access (no page cache). See - Documentation/filesystems/dax.txt. Note that - this option is incompatible with data=journal. -======================= ======================================================= - -Data Mode -========= -There are 3 different data modes: - -* writeback mode - - In data=writeback mode, ext4 does not journal data at all. This mode provides - a similar level of journaling as that of XFS, JFS, and ReiserFS in its default - mode - metadata journaling. A crash+recovery can cause incorrect data to - appear in files which were written shortly before the crash. This mode will - typically provide the best ext4 performance. - -* ordered mode - - In data=ordered mode, ext4 only officially journals metadata, but it logically - groups metadata information related to data changes with the data blocks into - a single unit called a transaction. When it's time to write the new metadata - out to disk, the associated data blocks are written first. In general, this - mode performs slightly slower than writeback but significantly faster than - journal mode. - -* journal mode - - data=journal mode provides full data and metadata journaling. All new data is - written to the journal first, and then to its final location. In the event of - a crash, the journal can be replayed, bringing both data and metadata into a - consistent state. This mode is the slowest except when data needs to be read - from and written to disk at the same time where it outperforms all others - modes. Enabling this mode will disable delayed allocation and O_DIRECT - support. - -/proc entries -============= - -Information about mounted ext4 file systems can be found in -/proc/fs/ext4. Each mounted filesystem will have a directory in -/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or -/proc/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /proc/fs/ext4/<devname> - -================ ======= - File Content -================ ======= - mb_groups details of multiblock allocator buddy cache of free blocks -================ ======= - -/sys entries -============ - -Information about mounted ext4 file systems can be found in -/sys/fs/ext4. Each mounted filesystem will have a directory in -/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or -/sys/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /sys/fs/ext4/<devname>: - -(see also Documentation/ABI/testing/sysfs-fs-ext4) - -============================= ================================================= -File Content -============================= ================================================= - delayed_allocation_blocks This file is read-only and shows the number of - blocks that are dirty in the page cache, but - which do not have their location in the - filesystem allocated yet. - -inode_goal Tuning parameter which (if non-zero) controls - the goal inode used by the inode allocator in - preference to all other allocation heuristics. - This is intended for debugging use only, and - should be 0 on production systems. - -inode_readahead_blks Tuning parameter which controls the maximum - number of inode table blocks that ext4's inode - table readahead algorithm will pre-read into - the buffer cache - -lifetime_write_kbytes This file is read-only and shows the number of - kilobytes of data that have been written to this - filesystem since it was created. - - max_writeback_mb_bump The maximum number of megabytes the writeback - code will try to write out before move on to - another inode. - - mb_group_prealloc The multiblock allocator will round up allocation - requests to a multiple of this tuning parameter if - the stripe size is not set in the ext4 superblock - - mb_max_to_scan The maximum number of extents the multiblock - allocator will search to find the best extent - - mb_min_to_scan The minimum number of extents the multiblock - allocator will search to find the best extent - - mb_order2_req Tuning parameter which controls the minimum size - for requests (as a power of 2) where the buddy - cache is used - - mb_stats Controls whether the multiblock allocator should - collect statistics, which are shown during the - unmount. 1 means to collect statistics, 0 means - not to collect statistics - - mb_stream_req Files which have fewer blocks than this tunable - parameter will have their blocks allocated out - of a block group specific preallocation pool, so - that small files are packed closely together. - Each large file will have its blocks allocated - out of its own unique preallocation pool. - - session_write_kbytes This file is read-only and shows the number of - kilobytes of data that have been written to this - filesystem since it was mounted. - - reserved_clusters This is RW file and contains number of reserved - clusters in the file system which will be used - in the specific situations to avoid costly - zeroout, unexpected ENOSPC, or possible data - loss. The default is 2% or 4096 clusters, - whichever is smaller and this can be changed - however it can never exceed number of clusters - in the file system. If there is not enough space - for the reserved space when mounting the file - mount will _not_ fail. -============================= ================================================= - -Ioctls -====== - -There is some Ext4 specific functionality which can be accessed by applications -through the system call interfaces. The list of all Ext4 specific ioctls are -shown in the table below. - -Table of Ext4 specific ioctls - -============================= ================================================= -Ioctl Description -============================= ================================================= - EXT4_IOC_GETFLAGS Get additional attributes associated with inode. - The ioctl argument is an integer bitfield, with - bit values described in ext4.h. This ioctl is an - alias for FS_IOC_GETFLAGS. - - EXT4_IOC_SETFLAGS Set additional attributes associated with inode. - The ioctl argument is an integer bitfield, with - bit values described in ext4.h. This ioctl is an - alias for FS_IOC_SETFLAGS. - - EXT4_IOC_GETVERSION - EXT4_IOC_GETVERSION_OLD - Get the inode i_generation number stored for - each inode. The i_generation number is normally - changed only when new inode is created and it is - particularly useful for network filesystems. The - '_OLD' version of this ioctl is an alias for - FS_IOC_GETVERSION. - - EXT4_IOC_SETVERSION - EXT4_IOC_SETVERSION_OLD - Set the inode i_generation number stored for - each inode. The '_OLD' version of this ioctl - is an alias for FS_IOC_SETVERSION. - - EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize - mount option. It allows to resize filesystem - to the end of the last existing block group, - further resize has to be done with resize2fs, - either online, or offline. The argument points - to the unsigned logn number representing the - filesystem new block count. - - EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one - this ioctl is pointing to) to the donor_fd (the - one specified in move_extent structure passed - as an argument to this ioctl). Then, exchange - inode metadata between orig_fd and donor_fd. - This is especially useful for online - defragmentation, because the allocator has the - opportunity to allocate moved blocks better, - ideally into one contiguous extent. - - EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or - new group descriptor block. The new group - descriptor is described by ext4_new_group_input - structure, which is passed as an argument to - this ioctl. This is especially useful in - conjunction with EXT4_IOC_GROUP_EXTEND, - which allows online resize of the filesystem - to the end of the last existing block group. - Those two ioctls combined is used in userspace - online resize tool (e.g. resize2fs). - - EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. - It converts (migrates) ext3 indirect block mapped - inode to ext4 extent mapped inode by walking - through indirect block mapping of the original - inode and converting contiguous block ranges - into ext4 extents of the temporary inode. Then, - inodes are swapped. This ioctl might help, when - migrating from ext3 to ext4 filesystem, however - suggestion is to create fresh ext4 filesystem - and copy data from the backup. Note, that - filesystem has to support extents for this ioctl - to work. - - EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be - allocated to preserve application-expected ext3 - behaviour. Note that this will also start - triggering a write of the data blocks, but this - behaviour may change in the future as it is - not necessary and has been done this way only - for sake of simplicity. - - EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number - of blocks of resized filesystem is passed in via - 64 bit integer argument. The kernel allocates - bitmaps and inode table, the userspace tool thus - just passes the new number of blocks. - - EXT4_IOC_SWAP_BOOT Swap i_blocks and associated attributes - (like i_blocks, i_size, i_flags, ...) from - the specified inode with inode - EXT4_BOOT_LOADER_INO (#5). This is typically - used to store a boot loader in a secure part of - the filesystem, where it can't be changed by a - normal user by accident. - The data blocks of the previous boot loader - will be associated with the given inode. -============================= ================================================= - -References -========== - -kernel source: <file:fs/ext4/> - <file:fs/jbd2/> - -programs: http://e2fsprogs.sourceforge.net/ - -useful links: http://fedoraproject.org/wiki/ext3-devel - http://www.bullopensource.org/ext4/ - http://ext4.wiki.kernel.org/index.php/Main_Page - http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/filesystems/ext4/ondisk/globals.rst b/Documentation/filesystems/ext4/globals.rst index 368bf7662b96..368bf7662b96 100644 --- a/Documentation/filesystems/ext4/ondisk/globals.rst +++ b/Documentation/filesystems/ext4/globals.rst diff --git a/Documentation/filesystems/ext4/ondisk/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst index 759827e5d2cf..0f783ed88592 100644 --- a/Documentation/filesystems/ext4/ondisk/group_descr.rst +++ b/Documentation/filesystems/ext4/group_descr.rst @@ -43,7 +43,7 @@ entire bitmap. The block group descriptor is laid out in ``struct ext4_group_desc``. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -157,7 +157,7 @@ The block group descriptor is laid out in ``struct ext4_group_desc``. Block group flags can be any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value diff --git a/Documentation/filesystems/ext4/ondisk/ifork.rst b/Documentation/filesystems/ext4/ifork.rst index 5dbe3b2b121a..b9816d5a896b 100644 --- a/Documentation/filesystems/ext4/ondisk/ifork.rst +++ b/Documentation/filesystems/ext4/ifork.rst @@ -68,7 +68,7 @@ The extent tree header is recorded in ``struct ext4_extent_header``, which is 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -104,7 +104,7 @@ Internal nodes of the extent tree, also known as index nodes, are recorded as ``struct ext4_extent_idx``, and are 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -134,7 +134,7 @@ Leaf nodes of the extent tree are recorded as ``struct ext4_extent``, and are also 12 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -174,7 +174,7 @@ including) the checksum itself. ``struct ext4_extent_tail`` is 4 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst index 71121605558c..3be3e54d480d 100644 --- a/Documentation/filesystems/ext4/index.rst +++ b/Documentation/filesystems/ext4/index.rst @@ -1,17 +1,14 @@ .. SPDX-License-Identifier: GPL-2.0 -=============== -ext4 Filesystem -=============== - -General usage and on-disk artifacts writen by ext4. More documentation may -be ported from the wiki as time permits. This should be considered the -canonical source of information as the details here have been reviewed by -the ext4 community. +=================================== +ext4 Data Structures and Algorithms +=================================== .. toctree:: - :maxdepth: 5 + :maxdepth: 6 :numbered: - ext4 - ondisk/index + about.rst + overview.rst + globals.rst + dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/inlinedata.rst b/Documentation/filesystems/ext4/inlinedata.rst index d1075178ce0b..d1075178ce0b 100644 --- a/Documentation/filesystems/ext4/ondisk/inlinedata.rst +++ b/Documentation/filesystems/ext4/inlinedata.rst diff --git a/Documentation/filesystems/ext4/ondisk/inodes.rst b/Documentation/filesystems/ext4/inodes.rst index 655ce898f3f5..6bd35e506b6f 100644 --- a/Documentation/filesystems/ext4/ondisk/inodes.rst +++ b/Documentation/filesystems/ext4/inodes.rst @@ -29,8 +29,9 @@ and the inode structure itself. The inode table entry is laid out in ``struct ext4_inode``. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 + :class: longtable * - Offset - Size @@ -176,7 +177,7 @@ The inode table entry is laid out in ``struct ext4_inode``. The ``i_mode`` value is a combination of the following flags: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -227,7 +228,7 @@ The ``i_mode`` value is a combination of the following flags: The ``i_flags`` field is a combination of these values: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -314,7 +315,7 @@ The ``osd1`` field has multiple meanings depending on the creator: Linux: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -331,7 +332,7 @@ Linux: Hurd: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -346,7 +347,7 @@ Hurd: Masix: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -365,7 +366,7 @@ The ``osd2`` field has multiple meanings depending on the filesystem creator: Linux: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -402,7 +403,7 @@ Linux: Hurd: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -433,7 +434,7 @@ Hurd: Masix: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/journal.rst b/Documentation/filesystems/ext4/journal.rst index e7031af86876..ea613ee701f5 100644 --- a/Documentation/filesystems/ext4/ondisk/journal.rst +++ b/Documentation/filesystems/ext4/journal.rst @@ -48,7 +48,7 @@ Layout Generally speaking, the journal has this format: .. list-table:: - :widths: 1 1 78 + :widths: 16 48 16 :header-rows: 1 * - Superblock @@ -76,7 +76,7 @@ The journal superblock will be in the next full block after the superblock. .. list-table:: - :widths: 1 1 1 1 76 + :widths: 12 12 12 32 12 :header-rows: 1 * - 1024 bytes of padding @@ -98,7 +98,7 @@ Every block in the journal starts with a common 12-byte header ``struct journal_header_s``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -124,7 +124,7 @@ Every block in the journal starts with a common 12-byte header The journal block type can be any one of: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -154,7 +154,7 @@ The journal superblock is recorded as ``struct journal_superblock_s``, which is 1024 bytes long: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -264,7 +264,7 @@ which is 1024 bytes long: The journal compat features are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -278,7 +278,7 @@ The journal compat features are any combination of the following: The journal incompat features are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -306,7 +306,7 @@ Journal checksum type codes are one of the following. crc32 or crc32c are the most likely choices. .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -330,7 +330,7 @@ described by a data structure, but here is the block structure anyway. Descriptor blocks consume at least 36 bytes, but use a full block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -355,7 +355,7 @@ defined as ``struct journal_block_tag3_s``, which looks like the following. The size is 16 or 32 bytes. .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -400,7 +400,7 @@ following. The size is 16 or 32 bytes. The journal tag flags are any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -421,7 +421,7 @@ is defined as ``struct journal_block_tag_s``, which looks like the following. The size is 8, 12, 24, or 28 bytes: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -471,7 +471,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the block is a ``struct jbd2_journal_block_tail``, which looks like this: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -513,7 +513,7 @@ Revocation blocks are described in length, but use a full block: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -543,7 +543,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the revocation block is a ``struct jbd2_journal_revoke_tail``, which has this format: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -567,7 +567,7 @@ The commit block is described by ``struct commit_header``, which is 32 bytes long (but uses a full block): .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/mmp.rst b/Documentation/filesystems/ext4/mmp.rst index b7d7a3137f80..25660981d93c 100644 --- a/Documentation/filesystems/ext4/ondisk/mmp.rst +++ b/Documentation/filesystems/ext4/mmp.rst @@ -32,7 +32,7 @@ The checksum is calculated against the FS UUID and the MMP structure. The MMP structure (``struct mmp_struct``) is as follows: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 12 20 40 :header-rows: 1 * - Offset diff --git a/Documentation/filesystems/ext4/ondisk/index.rst b/Documentation/filesystems/ext4/ondisk/index.rst deleted file mode 100644 index f7d082c3a435..000000000000 --- a/Documentation/filesystems/ext4/ondisk/index.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -============================== -Data Structures and Algorithms -============================== -.. include:: about.rst -.. include:: overview.rst -.. include:: globals.rst -.. include:: dynamic.rst diff --git a/Documentation/filesystems/ext4/ondisk/overview.rst b/Documentation/filesystems/ext4/overview.rst index cbab18baba12..cbab18baba12 100644 --- a/Documentation/filesystems/ext4/ondisk/overview.rst +++ b/Documentation/filesystems/ext4/overview.rst diff --git a/Documentation/filesystems/ext4/ondisk/special_inodes.rst b/Documentation/filesystems/ext4/special_inodes.rst index a82f70c9baeb..9061aabba827 100644 --- a/Documentation/filesystems/ext4/ondisk/special_inodes.rst +++ b/Documentation/filesystems/ext4/special_inodes.rst @@ -6,7 +6,7 @@ Special inodes ext4 reserves some inode for special features, as follows: .. list-table:: - :widths: 1 79 + :widths: 6 70 :header-rows: 1 * - inode Number diff --git a/Documentation/filesystems/ext4/ondisk/super.rst b/Documentation/filesystems/ext4/super.rst index 5f81dd87e0b9..04ff079a2acf 100644 --- a/Documentation/filesystems/ext4/ondisk/super.rst +++ b/Documentation/filesystems/ext4/super.rst @@ -19,7 +19,7 @@ The ext4 superblock is laid out as follows in ``struct ext4_super_block``: .. list-table:: - :widths: 1 1 1 77 + :widths: 8 8 24 40 :header-rows: 1 * - Offset @@ -483,7 +483,7 @@ The ext4 superblock is laid out as follows in The superblock state is some combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -500,7 +500,7 @@ The superblock state is some combination of the following: The superblock error policy is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -517,7 +517,7 @@ The superblock error policy is one of the following: The filesystem creator is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -538,7 +538,7 @@ The filesystem creator is one of the following: The superblock revision is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -556,7 +556,7 @@ The superblock compatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -595,7 +595,7 @@ The superblock incompatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -647,7 +647,7 @@ The superblock read-only compatible features field is a combination of any of the following: .. list-table:: - :widths: 1 79 + :widths: 16 64 :header-rows: 1 * - Value @@ -702,7 +702,7 @@ the following: The ``s_def_hash_version`` field is one of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -725,7 +725,7 @@ The ``s_def_hash_version`` field is one of the following: The ``s_default_mount_opts`` field is any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -767,7 +767,7 @@ The ``s_default_mount_opts`` field is any combination of the following: The ``s_flags`` field is any combination of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value @@ -784,7 +784,7 @@ The ``s_flags`` field is any combination of the following: The ``s_encrypt_algos`` list can contain any of the following: .. list-table:: - :widths: 1 79 + :widths: 8 72 :header-rows: 1 * - Value diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index e5edd29687b5..e46c2147ddf8 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -172,9 +172,10 @@ fault_type=%d Support configuring fault injection type, should be FAULT_DIR_DEPTH 0x000000100 FAULT_EVICT_INODE 0x000000200 FAULT_TRUNCATE 0x000000400 - FAULT_IO 0x000000800 + FAULT_READ_IO 0x000000800 FAULT_CHECKPOINT 0x000001000 FAULT_DISCARD 0x000002000 + FAULT_WRITE_IO 0x000004000 mode=%s Control block allocation mode which supports "adaptive" and "lfs". In "lfs" mode, there should be no random writes towards main area. @@ -211,6 +212,11 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix", non-atomic files likewise "nobarrier" mount option. test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt context. The fake fscrypt context is used by xfstests. +checkpoint=%s Set to "disable" to turn off checkpointing. Set to "enable" + to reenable checkpointing. Is enabled by default. While + disabled, any unmounting or unexpected shutdowns will cause + the filesystem contents to appear as they did when the + filesystem was mounted with that option. ================================================================================ DEBUGFS ENTRIES diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX deleted file mode 100644 index 53f3b596ac0d..000000000000 --- a/Documentation/filesystems/nfs/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file (nfs-related documentation). -Exporting - - explanation of how to make filesystems exportable. -fault_injection.txt - - information for using fault injection on the server -knfsd-stats.txt - - statistics which the NFS server makes available to user space. -nfs.txt - - nfs client, and DNS resolution for fs_locations. -nfs41-server.txt - - info on the Linux server implementation of NFSv4 minor version 1. -nfs-rdma.txt - - how to install and setup the Linux NFS/RDMA client and server software -nfsd-admin-interfaces.txt - - Administrative interfaces for nfsd. -nfsroot.txt - - short guide on setting up a diskless box with NFS root filesystem. -pnfs.txt - - short explanation of some of the internals of the pnfs client code -rpc-cache.txt - - introduction to the caching mechanisms in the sunrpc layer. -idmapper.txt - - information for configuring request-keys to be used by idmapper -rpc-server-gss.txt - - Information on GSS authentication support in the NFS Server diff --git a/Documentation/fmc/00-INDEX b/Documentation/fmc/00-INDEX deleted file mode 100644 index 431c69570f43..000000000000 --- a/Documentation/fmc/00-INDEX +++ /dev/null @@ -1,38 +0,0 @@ - -Documentation in this directory comes from sections of the manual we -wrote for the externally-developed fmc-bus package. The complete -manual as of today (2013-02) is available in PDF format at -http://www.ohwr.org/projects/fmc-bus/files - -00-INDEX - - this file. - -FMC-and-SDB.txt - - What are FMC and SDB, basic concepts for this framework - -API.txt - - The functions that are exported by the bus driver - -parameters.txt - - The module parameters - -carrier.txt - - writing a carrier (a device) - -mezzanine.txt - - writing code for your mezzanine (a driver) - -identifiers.txt - - how identification and matching works - -fmc-fakedev.txt - - about drivers/fmc/fmc-fakedev.ko - -fmc-trivial.txt - - about drivers/fmc/fmc-trivial.ko - -fmc-write-eeprom.txt - - about drivers/fmc/fmc-write-eeprom.ko - -fmc-chardev.txt - - about drivers/fmc/fmc-chardev.ko diff --git a/Documentation/gpio/00-INDEX b/Documentation/gpio/00-INDEX deleted file mode 100644 index 17e19a68058f..000000000000 --- a/Documentation/gpio/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - This file -sysfs.txt - - Information about the GPIO sysfs interface diff --git a/Documentation/hwmon/ina3221 b/Documentation/hwmon/ina3221 index 0ff74854cb2e..4b82cbfb551c 100644 --- a/Documentation/hwmon/ina3221 +++ b/Documentation/hwmon/ina3221 @@ -21,6 +21,8 @@ and power are calculated host-side from these. Sysfs entries ------------- +in[123]_label Voltage channel labels +in[123]_enable Voltage channel enable controls in[123]_input Bus voltage(mV) channels curr[123]_input Current(mA) measurement channels shunt[123]_resistor Shunt resistance(uOhm) channels diff --git a/Documentation/hwmon/lm75 b/Documentation/hwmon/lm75 index ac95edfcd907..2f1120f88c16 100644 --- a/Documentation/hwmon/lm75 +++ b/Documentation/hwmon/lm75 @@ -17,8 +17,8 @@ Supported chips: Addresses scanned: none Datasheet: Publicly available at the Maxim website http://www.maximintegrated.com/ - * Maxim MAX6625, MAX6626 - Prefixes: 'max6625', 'max6626' + * Maxim MAX6625, MAX6626, MAX31725, MAX31726 + Prefixes: 'max6625', 'max6626', 'max31725', 'max31726' Addresses scanned: none Datasheet: Publicly available at the Maxim website http://www.maxim-ic.com/ @@ -86,7 +86,7 @@ The LM75 is essentially an industry standard; there may be other LM75 clones not listed here, with or without various enhancements, that are supported. The clones are not detected by the driver, unless they reproduce the exact register tricks of the original LM75, and must -therefore be instantiated explicitly. Higher resolution up to 12-bit +therefore be instantiated explicitly. Higher resolution up to 16-bit is supported by this driver, other specific enhancements are not. The LM77 is not supported, contrary to what we pretended for a long time. diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978 index 9a49d3c90cd1..dfb2caa401d9 100644 --- a/Documentation/hwmon/ltc2978 +++ b/Documentation/hwmon/ltc2978 @@ -55,6 +55,10 @@ Supported chips: Prefix: 'ltm4676' Addresses scanned: - Datasheet: http://www.linear.com/product/ltm4676 + * Analog Devices LTM4686 + Prefix: 'ltm4686' + Addresses scanned: - + Datasheet: http://www.analog.com/ltm4686 Author: Guenter Roeck <linux@roeck-us.net> @@ -76,6 +80,7 @@ additional components on a single die. The chip is instantiated and reported as two separate chips on two different I2C bus addresses. LTM4675 is a dual 9A or single 18A μModule regulator LTM4676 is a dual 13A or single 26A uModule regulator. +LTM4686 is a dual 10A or single 20A uModule regulator. Usage Notes diff --git a/Documentation/hwmon/mc13783-adc b/Documentation/hwmon/mc13783-adc index d0e7b3fa9e75..05ccc9f159f1 100644 --- a/Documentation/hwmon/mc13783-adc +++ b/Documentation/hwmon/mc13783-adc @@ -2,12 +2,12 @@ Kernel driver mc13783-adc ========================= Supported chips: - * Freescale Atlas MC13783 + * Freescale MC13783 Prefix: 'mc13783' - Datasheet: http://www.freescale.com/files/rf_if/doc/data_sheet/MC13783.pdf?fsrch=1 - * Freescale Atlas MC13892 + Datasheet: https://www.nxp.com/docs/en/data-sheet/MC13783.pdf + * Freescale MC13892 Prefix: 'mc13892' - Datasheet: http://cache.freescale.com/files/analog/doc/data_sheet/MC13892.pdf?fsrch=1&sr=1 + Datasheet: https://www.nxp.com/docs/en/data-sheet/MC13892.pdf Authors: Sascha Hauer <s.hauer@pengutronix.de> diff --git a/Documentation/ide/00-INDEX b/Documentation/ide/00-INDEX deleted file mode 100644 index 22f98ca79539..000000000000 --- a/Documentation/ide/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file -ChangeLog.ide-cd.1994-2004 - - ide-cd changelog -ChangeLog.ide-floppy.1996-2002 - - ide-floppy changelog -ChangeLog.ide-tape.1995-2002 - - ide-tape changelog -ide-tape.txt - - info on the IDE ATAPI streaming tape driver -ide.txt - - important info for users of ATA devices (IDE/EIDE disks and CD-ROMS). -warm-plug-howto.txt - - using sysfs to remove and add IDE devices.
\ No newline at end of file diff --git a/Documentation/index.rst b/Documentation/index.rst index 5db7e87c7cb1..c858c2e66e36 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -22,10 +22,7 @@ The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text. -.. toctree:: - :maxdepth: 2 - - process/license-rules.rst +* :ref:`kernel_licensing` User-oriented documentation --------------------------- diff --git a/Documentation/input/event-codes.rst b/Documentation/input/event-codes.rst index a8c0873beb95..cef220c176a4 100644 --- a/Documentation/input/event-codes.rst +++ b/Documentation/input/event-codes.rst @@ -190,7 +190,16 @@ A few EV_REL codes have special meanings: * REL_WHEEL, REL_HWHEEL: - These codes are used for vertical and horizontal scroll wheels, - respectively. + respectively. The value is the number of "notches" moved on the wheel, the + physical size of which varies by device. For high-resolution wheels (which + report multiple events for each notch of movement, or do not have notches) + this may be an approximation based on the high-resolution scroll events. + +* REL_WHEEL_HI_RES: + + - If a vertical scroll wheel supports high-resolution scrolling, this code + will be emitted in addition to REL_WHEEL. The value is the (approximate) + distance travelled by the user's finger, in microns. EV_ABS ------ diff --git a/Documentation/ioctl/00-INDEX b/Documentation/ioctl/00-INDEX deleted file mode 100644 index c1a925787950..000000000000 --- a/Documentation/ioctl/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - this file -botching-up-ioctls.txt - - how to avoid botching up ioctls -cdrom.txt - - summary of CDROM ioctl calls -hdio.txt - - summary of HDIO_ ioctl calls -ioctl-decoding.txt - - how to decode the bits of an IOCTL code -ioctl-number.txt - - how to implement and register device/driver ioctl calls diff --git a/Documentation/isdn/00-INDEX b/Documentation/isdn/00-INDEX deleted file mode 100644 index 2d1889b6c1fa..000000000000 --- a/Documentation/isdn/00-INDEX +++ /dev/null @@ -1,42 +0,0 @@ -00-INDEX - - this file (info on ISDN implementation for Linux) -CREDITS - - list of the kind folks that brought you this stuff. -HiSax.cert - - information about the ITU approval certification of the HiSax driver. -INTERFACE - - description of isdn4linux Link Level and Hardware Level interfaces. -INTERFACE.fax - - description of the fax subinterface of isdn4linux. -INTERFACE.CAPI - - description of kernel CAPI Link Level to Hardware Level interface. -README - - general info on what you need and what to do for Linux ISDN. -README.FAQ - - general info for FAQ. -README.HiSax - - info on the HiSax driver which replaces the old teles. -README.audio - - info for running audio over ISDN. -README.avmb1 - - info on driver for AVM-B1 ISDN card. -README.concap - - info on "CONCAP" encapsulation protocol interface used for X.25. -README.diversion - - info on module for isdn diversion services. -README.fax - - info for using Fax over ISDN. -README.gigaset - - info on the drivers for Siemens Gigaset ISDN adapters -README.hfc-pci - - info on hfc-pci based cards. -README.hysdn - - info on driver for Hypercope active HYSDN cards -README.mISDN - - info on the Modular ISDN subsystem (mISDN) -README.syncppp - - info on running Sync PPP over ISDN. -README.x25 - - info for running X.25 over ISDN. -syncPPP.FAQ - - frequently asked questions about running PPP over ISDN. diff --git a/Documentation/kbuild/00-INDEX b/Documentation/kbuild/00-INDEX deleted file mode 100644 index 8c5e6aa78004..000000000000 --- a/Documentation/kbuild/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file: info on the kernel build process -headers_install.txt - - how to export Linux headers for use by userspace -kbuild.txt - - developer information on kbuild -kconfig.txt - - usage help for make *config -kconfig-language.txt - - specification of Config Language, the language in Kconfig files -makefiles.txt - - developer information for linux kernel makefiles -modules.txt - - how to build modules and to install them diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 0f00f9c164ac..23b0c8b20cd1 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -321,7 +321,7 @@ To reduce its OS jitter, do at least one of the following: to do. Name: - rcuob/%d, rcuop/%d, and rcuos/%d + rcuop/%d and rcuos/%d Purpose: Offload RCU callbacks from the corresponding CPU. diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX deleted file mode 100644 index 86169dc766f7..000000000000 --- a/Documentation/laptops/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -asus-laptop.txt - - information on the Asus Laptop Extras driver. -disk-shock-protection.txt - - information on hard disk shock protection. -laptop-mode.txt - - how to conserve battery power using laptop-mode. -sony-laptop.txt - - Sony Notebook Control Driver (SNC) Readme. -sonypi.txt - - info on Linux Sony Programmable I/O Device support. -thinkpad-acpi.txt - - information on the (IBM and Lenovo) ThinkPad ACPI Extras driver. -toshiba_haps.txt - - information on the Toshiba HDD Active Protection Sensor driver. diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX deleted file mode 100644 index ae626b29a740..000000000000 --- a/Documentation/leds/00-INDEX +++ /dev/null @@ -1,32 +0,0 @@ -00-INDEX - - This file -leds-blinkm.txt - - Driver for BlinkM LED-devices. -leds-class.txt - - documents LED handling under Linux. -leds-class-flash.txt - - documents flash LED handling under Linux. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-lp3944.txt - - notes on how to use the leds-lp3944 driver. -leds-lp5521.txt - - notes on how to use the leds-lp5521 driver. -leds-lp5523.txt - - notes on how to use the leds-lp5523 driver. -leds-lp5562.txt - - notes on how to use the leds-lp5562 driver. -leds-lp55xx.txt - - description about lp55xx common driver. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-mlxcpld.txt - - notes on how to use the leds-mlxcpld driver. -ledtrig-oneshot.txt - - One-shot LED trigger for both sporadic and dense events. -ledtrig-transient.txt - - LED Transient Trigger, one shot timer activation. -ledtrig-usbport.txt - - notes on how to use the drivers/usb/core/ledtrig-usbport.c trigger. -uleds.txt - - notes on how to use the uleds driver. diff --git a/Documentation/locking/00-INDEX b/Documentation/locking/00-INDEX deleted file mode 100644 index c256c9bee2a4..000000000000 --- a/Documentation/locking/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -lockdep-design.txt - - documentation on the runtime locking correctness validator. -lockstat.txt - - info on collecting statistics on locks (and contention). -mutex-design.txt - - info on the generic mutex subsystem. -rt-mutex-design.txt - - description of the RealTime mutex implementation design. -rt-mutex.txt - - desc. of RT-mutex subsystem with PI (Priority Inheritance) support. -spinlocks.txt - - info on using spinlocks to provide exclusive access in kernel. -ww-mutex-design.txt - - Intro to Mutex wait/would deadlock handling.s diff --git a/Documentation/locking/lockstat.txt b/Documentation/locking/lockstat.txt index 5786ad2cd5e6..fdbeb0c45ef3 100644 --- a/Documentation/locking/lockstat.txt +++ b/Documentation/locking/lockstat.txt @@ -91,7 +91,7 @@ Look at the current lock statistics: 07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 08 --------------- 09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280 -19 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 +10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80 13 --------------- diff --git a/Documentation/m68k/00-INDEX b/Documentation/m68k/00-INDEX deleted file mode 100644 index 2be8c6b00e74..000000000000 --- a/Documentation/m68k/00-INDEX +++ /dev/null @@ -1,7 +0,0 @@ -00-INDEX - - this file -README.buddha - - Amiga Buddha and Catweasel IDE Driver -kernel-options.txt - - command line options for Linux/m68k - diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 0d8d7ef131e9..c1d913944ad8 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -471,8 +471,7 @@ And a couple of implicit varieties: operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. ACQUIRE operations include LOCK operations and both smp_load_acquire() - and smp_cond_acquire() operations. The later builds the necessary ACQUIRE - semantics from relying on a control dependency and smp_rmb(). + and smp_cond_load_acquire() operations. Memory operations that occur before an ACQUIRE operation may appear to happen after it completes. diff --git a/Documentation/mips/00-INDEX b/Documentation/mips/00-INDEX deleted file mode 100644 index 8ae9cffc2262..000000000000 --- a/Documentation/mips/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - this file. -AU1xxx_IDE.README - - README for MIPS AU1XXX IDE driver. diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX deleted file mode 100644 index 4623bc0aa0bb..000000000000 --- a/Documentation/mmc/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file -mmc-dev-attrs.txt - - info on SD and MMC device attributes -mmc-dev-parts.txt - - info on SD and MMC device partitions -mmc-async-req.txt - - info on mmc asynchronous requests -mmc-tools.txt - - info on mmc-utils tools diff --git a/Documentation/mtd/nand/pxa3xx-nand.txt b/Documentation/mtd/nand/pxa3xx-nand.txt deleted file mode 100644 index 1074cbc67ec6..000000000000 --- a/Documentation/mtd/nand/pxa3xx-nand.txt +++ /dev/null @@ -1,113 +0,0 @@ - -About this document -=================== - -Some notes about Marvell's NAND controller available in PXA and Armada 370/XP -SoC (aka NFCv1 and NFCv2), with an emphasis on the latter. - -NFCv2 controller background -=========================== - -The controller has a 2176 bytes FIFO buffer. Therefore, in order to support -larger pages, I/O operations on 4 KiB and 8 KiB pages is done with a set of -chunked transfers. - -For instance, if we choose a 2048 data chunk and set "BCH" ECC (see below) -we'll have this layout in the pages: - - ------------------------------------------------------------------------------ - | 2048B data | 32B spare | 30B ECC || 2048B data | 32B spare | 30B ECC | ... | - ------------------------------------------------------------------------------ - -The driver reads the data and spare portions independently and builds an internal -buffer with this layout (in the 4 KiB page case): - - ------------------------------------------ - | 4096B data | 64B spare | - ------------------------------------------ - -Also, for the READOOB command the driver disables the ECC and reads a 'spare + ECC' -OOB, one per chunk read. - - ------------------------------------------------------------------- - | 4096B data | 32B spare | 30B ECC | 32B spare | 30B ECC | - ------------------------------------------------------------------- - -So, in order to achieve reading (for instance), we issue several READ0 commands -(with some additional controller-specific magic) and read two chunks of 2080B -(2048 data + 32 spare) each. -The driver accommodates this data to expose the NAND core a contiguous buffer -(4096 data + spare) or (4096 + spare + ECC + spare + ECC). - -ECC -=== - -The controller has built-in hardware ECC capabilities. In addition it is -configurable between two modes: 1) Hamming, 2) BCH. - -Note that the actual BCH mode: BCH-4 or BCH-8 will depend on the way -the controller is configured to transfer the data. - -In the BCH mode the ECC code will be calculated for each transferred chunk -and expected to be located (when reading/programming) right after the spare -bytes as the figure above shows. - -So, repeating the above scheme, a 2048B data chunk will be followed by 32B -spare, and then the ECC controller will read/write the ECC code (30B in -this case): - - ------------------------------------ - | 2048B data | 32B spare | 30B ECC | - ------------------------------------ - -If the ECC mode is 'BCH' then the ECC is *always* 30 bytes long. -If the ECC mode is 'Hamming' the ECC is 6 bytes long, for each 512B block. -So in Hamming mode, a 2048B page will have a 24B ECC. - -Despite all of the above, the controller requires the driver to only read or -write in multiples of 8-bytes, because the data buffer is 64-bits. - -OOB -=== - -Because of the above scheme, and because the "spare" OOB is really located in -the middle of a page, spare OOB cannot be read or write independently of the -data area. In other words, in order to read the OOB (aka READOOB), the entire -page (aka READ0) has to be read. - -In the same sense, in order to write to the spare OOB the driver has to write -an *entire* page. - -Factory bad blocks handling -=========================== - -Given the ECC BCH requires to layout the device's pages in a split -data/OOB/data/OOB way, the controller has a view of the flash page that's -different from the specified (aka the manufacturer's) view. In other words, - -Factory view: - - ----------------------------------------------- - | Data |x OOB | - ----------------------------------------------- - -Driver's view: - - ----------------------------------------------- - | Data | OOB | Data x | OOB | - ----------------------------------------------- - -It can be seen from the above, that the factory bad block marker must be -searched within the 'data' region, and not in the usual OOB region. - -In addition, this means under regular usage the driver will write such -position (since it belongs to the data region) and every used block is -likely to be marked as bad. - -For this reason, marking the block as bad in the OOB is explicitly -disabled by using the NAND_BBT_NO_OOB_BBM option in the driver. The rationale -for this is that there's no point in marking a block as bad, because good -blocks are also 'marked as bad' (in the OOB BBM sense) under normal usage. - -Instead, the driver relies on the bad block table alone, and should only perform -the bad block scan on the very first time (when the device hasn't been used). diff --git a/Documentation/netlabel/00-INDEX b/Documentation/netlabel/00-INDEX deleted file mode 100644 index 837bf35990e2..000000000000 --- a/Documentation/netlabel/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file. -cipso_ipv4.txt - - documentation on the IPv4 CIPSO protocol engine. -draft-ietf-cipso-ipsecurity-01.txt - - IETF draft of the CIPSO protocol, dated 16 July 1992. -introduction.txt - - NetLabel introduction, READ THIS FIRST. -lsm_interface.txt - - documentation on the NetLabel kernel security module API. diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt index 93dacb132c3c..a6075481fd60 100644 --- a/Documentation/netlabel/cipso_ipv4.txt +++ b/Documentation/netlabel/cipso_ipv4.txt @@ -6,11 +6,12 @@ May 17, 2006 * Overview -The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP -Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be -found in this directory, consult '00-INDEX' for the filename. While the IETF -draft never made it to an RFC standard it has become a de-facto standard for -labeled networking and is used in many trusted operating systems. +The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial +IP Security Option (CIPSO) draft from July 16, 1992. A copy of this +draft can be found in this directory +(draft-ietf-cipso-ipsecurity-01.txt). While the IETF draft never made +it to an RFC standard it has become a de-facto standard for labeled +networking and is used in many trusted operating systems. * Outbound Packet Processing diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt index 5ecd8d1dcf4e..3caf77bcff0f 100644 --- a/Documentation/netlabel/introduction.txt +++ b/Documentation/netlabel/introduction.txt @@ -22,7 +22,7 @@ refrain from calling the protocol engines directly, instead they should use the NetLabel kernel security module API described below. Detailed information about each NetLabel protocol engine can be found in this -directory, consult '00-INDEX' for filenames. +directory. * Communication Layer diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX deleted file mode 100644 index 02a323c43261..000000000000 --- a/Documentation/networking/00-INDEX +++ /dev/null @@ -1,234 +0,0 @@ -00-INDEX - - this file -3c509.txt - - information on the 3Com Etherlink III Series Ethernet cards. -6pack.txt - - info on the 6pack protocol, an alternative to KISS for AX.25 -LICENSE.qla3xxx - - GPLv2 for QLogic Linux Networking HBA Driver -LICENSE.qlge - - GPLv2 for QLogic Linux qlge NIC Driver -LICENSE.qlcnic - - GPLv2 for QLogic Linux qlcnic NIC Driver -PLIP.txt - - PLIP: The Parallel Line Internet Protocol device driver -README.ipw2100 - - README for the Intel PRO/Wireless 2100 driver. -README.ipw2200 - - README for the Intel PRO/Wireless 2915ABG and 2200BG driver. -README.sb1000 - - info on General Instrument/NextLevel SURFboard1000 cable modem. -altera_tse.txt - - Altera Triple-Speed Ethernet controller. -arcnet-hardware.txt - - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. -arcnet.txt - - info on the using the ARCnet driver itself. -atm.txt - - info on where to get ATM programs and support for Linux. -ax25.txt - - info on using AX.25 and NET/ROM code for Linux -baycom.txt - - info on the driver for Baycom style amateur radio modems -bonding.txt - - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux. -bridge.txt - - where to get user space programs for ethernet bridging with Linux. -cdc_mbim.txt - - 3G/LTE USB modem (Mobile Broadband Interface Model) -checksum-offloads.txt - - Explanation of checksum offloads; LCO, RCO -cops.txt - - info on the COPS LocalTalk Linux driver -cs89x0.txt - - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver -cxacru.txt - - Conexant AccessRunner USB ADSL Modem -cxacru-cf.py - - Conexant AccessRunner USB ADSL Modem configuration file parser -cxgb.txt - - Release Notes for the Chelsio N210 Linux device driver. -dccp.txt - - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42). -dctcp.txt - - DataCenter TCP congestion control -de4x5.txt - - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver -decnet.txt - - info on using the DECnet networking layer in Linux. -dl2k.txt - - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko). -dm9000.txt - - README for the Simtec DM9000 Network driver. -dmfe.txt - - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. -dns_resolver.txt - - The DNS resolver module allows kernel servies to make DNS queries. -driver.txt - - Softnet driver issues. -ena.txt - - info on Amazon's Elastic Network Adapter (ENA) -e100.txt - - info on Intel's EtherExpress PRO/100 line of 10/100 boards -e1000.txt - - info on Intel's E1000 line of gigabit ethernet boards -e1000e.txt - - README for the Intel Gigabit Ethernet Driver (e1000e). -eql.txt - - serial IP load balancing -fib_trie.txt - - Level Compressed Trie (LC-trie) notes: a structure for routing. -filter.txt - - Linux Socket Filtering -fore200e.txt - - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. -framerelay.txt - - info on using Frame Relay/Data Link Connection Identifier (DLCI). -gen_stats.txt - - Generic networking statistics for netlink users. -generic-hdlc.txt - - The generic High Level Data Link Control (HDLC) layer. -generic_netlink.txt - - info on Generic Netlink -gianfar.txt - - Gianfar Ethernet Driver. -i40e.txt - - README for the Intel Ethernet Controller XL710 Driver (i40e). -i40evf.txt - - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function -ieee802154.txt - - Linux IEEE 802.15.4 implementation, API and drivers -igb.txt - - README for the Intel Gigabit Ethernet Driver (igb). -igbvf.txt - - README for the Intel Gigabit Ethernet Driver (igbvf). -ip-sysctl.txt - - /proc/sys/net/ipv4/* variables -ip_dynaddr.txt - - IP dynamic address hack e.g. for auto-dialup links -ipddp.txt - - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation -iphase.txt - - Interphase PCI ATM (i)Chip IA Linux driver info. -ipsec.txt - - Note on not compressing IPSec payload and resulting failed policy check. -ipv6.txt - - Options to the ipv6 kernel module. -ipvs-sysctl.txt - - Per-inode explanation of the /proc/sys/net/ipv4/vs interface. -irda.txt - - where to get IrDA (infrared) utilities and info for Linux. -ixgb.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgb). -ixgbe.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgbe). -ixgbevf.txt - - README for the Intel Virtual Function (VF) Driver (ixgbevf). -l2tp.txt - - User guide to the L2TP tunnel protocol. -lapb-module.txt - - programming information of the LAPB module. -ltpc.txt - - the Apple or Farallon LocalTalk PC card driver -mac80211-auth-assoc-deauth.txt - - authentication and association / deauth-disassoc with max80211 -mac80211-injection.txt - - HOWTO use packet injection with mac80211 -multiqueue.txt - - HOWTO for multiqueue network device support. -netconsole.txt - - The network console module netconsole.ko: configuration and notes. -netdev-features.txt - - Network interface features API description. -netdevices.txt - - info on network device driver functions exported to the kernel. -netif-msg.txt - - Design of the network interface message level setting (NETIF_MSG_*). -netlink_mmap.txt - - memory mapped I/O with netlink -nf_conntrack-sysctl.txt - - list of netfilter-sysctl knobs. -nfc.txt - - The Linux Near Field Communication (NFS) subsystem. -openvswitch.txt - - Open vSwitch developer documentation. -operstates.txt - - Overview of network interface operational states. -packet_mmap.txt - - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING). -phonet.txt - - The Phonet packet protocol used in Nokia cellular modems. -phy.txt - - The PHY abstraction layer. -pktgen.txt - - User guide to the kernel packet generator (pktgen.ko). -policy-routing.txt - - IP policy-based routing -ppp_generic.txt - - Information about the generic PPP driver. -proc_net_tcp.txt - - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces. -radiotap-headers.txt - - Background on radiotap headers. -ray_cs.txt - - Raylink Wireless LAN card driver info. -rds.txt - - Background on the reliable, ordered datagram delivery method RDS. -regulatory.txt - - Overview of the Linux wireless regulatory infrastructure. -rxrpc.txt - - Guide to the RxRPC protocol. -s2io.txt - - Release notes for Neterion Xframe I/II 10GbE driver. -scaling.txt - - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS. -sctp.txt - - Notes on the Linux kernel implementation of the SCTP protocol. -secid.txt - - Explanation of the secid member in flow structures. -skfp.txt - - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. -smc9.txt - - the driver for SMC's 9000 series of Ethernet cards -spider_net.txt - - README for the Spidernet Driver (as found in PS3 / Cell BE). -stmmac.txt - - README for the STMicro Synopsys Ethernet driver. -tc-actions-env-rules.txt - - rules for traffic control (tc) actions. -timestamping.txt - - overview of network packet timestamping variants. -tcp.txt - - short blurb on how TCP output takes place. -tcp-thin.txt - - kernel tuning options for low rate 'thin' TCP streams. -team.txt - - pointer to information for ethernet teaming devices. -tlan.txt - - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. -tproxy.txt - - Transparent proxy support user guide. -tuntap.txt - - TUN/TAP device driver, allowing user space Rx/Tx of packets. -udplite.txt - - UDP-Lite protocol (RFC 3828) introduction. -vortex.txt - - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. -vxge.txt - - README for the Neterion X3100 PCIe Server Adapter. -vxlan.txt - - Virtual extensible LAN overview -x25.txt - - general info on X.25 development. -x25-iface.txt - - description of the X.25 Packet Layer to LAPB device interface. -xfrm_device.txt - - description of XFRM offload API -xfrm_proc.txt - - description of the statistics package for XFRM. -xfrm_sync.txt - - sync patches for XFRM enable migration of an SA between hosts. -xfrm_sysctl.txt - - description of the XFRM configuration options. -z8530drv.txt - - info about Linux driver for Z8530 based HDLC cards for AX.25 diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index ff929cfab4f4..4ae4f9d8f8fe 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050 and 3000 refers to the same chunk. -UMEM Completetion Ring -~~~~~~~~~~~~~~~~~~~~~~ +UMEM Completion Ring +~~~~~~~~~~~~~~~~~~~~ The Completion Ring is used transfer ownership of UMEM frames from kernel-space to user-space. Just like the Fill ring, UMEM indicies are diff --git a/Documentation/networking/defza.txt b/Documentation/networking/defza.txt new file mode 100644 index 000000000000..663e4a906751 --- /dev/null +++ b/Documentation/networking/defza.txt @@ -0,0 +1,57 @@ +Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4. + + +DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI +network card, designed in 1990 specifically for the DECstation 5000 +model 200 workstation. The board is a single attachment station and +it was manufactured in two variations, both of which are supported. + +First is the SAS MMF DEFZA-AA option, the original design implementing +the standard MMF-PMD, however with a pair of ST connectors rather than +the usual MIC connector. The other one is the SAS ThinWire/STP DEFZA-CA +option, denoted 700-C, with the network medium selectable by a switch +between the DEC proprietary ThinWire-PMD using a BNC connector and the +standard STP-PMD using a DE-9F connector. This option can interface to +a DECconcentrator 500 device and, in the case of the STP-PMD, also other +FDDI equipment and was designed to make it easier to transition from +existing IEEE 802.3 10BASE2 Ethernet and IEEE 802.5 Token Ring networks +by providing means to reuse existing cabling. + +This driver handles any number of cards installed in a single system. +They get fddi0, fddi1, etc. interface names assigned in the order of +increasing TURBOchannel slot numbers. + +The board only supports DMA on the receive side. Transmission involves +the use of PIO. As a result under a heavy transmission load there will +be a significant impact on system performance. + +The board supports a 64-entry CAM for matching destination addresses. +Two entries are preoccupied by the Directed Beacon and Ring Purger +multicast addresses and the rest is used as a multicast filter. An +all-multi mode is also supported for LLC frames and it is used if +requested explicitly or if the CAM overflows. The promiscuous mode +supports separate enables for LLC and SMT frames, but this driver +doesn't support changing them individually. + + +Known problems: + +None. + + +To do: + +5. MAC address change. The card does not support changing the Media + Access Controller's address registers but a similar effect can be + achieved by adding an alias to the CAM. There is no way to disable + matching against the original address though. + +7. Queueing incoming/outgoing SMT frames in the driver if the SMT + receive/RMC transmit ring is full. (?) + +8. Retrieving/reporting FDDI/SNMP stats. + + +Both success and failure reports are welcome. + +Maciej W. Rozycki <macro@linux-mips.org> diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink-params-bnxt.txt new file mode 100644 index 000000000000..481aa303d5b4 --- /dev/null +++ b/Documentation/networking/devlink-params-bnxt.txt @@ -0,0 +1,18 @@ +enable_sriov [DEVICE, GENERIC] + Configuration mode: Permanent + +ignore_ari [DEVICE, GENERIC] + Configuration mode: Permanent + +msix_vec_per_pf_max [DEVICE, GENERIC] + Configuration mode: Permanent + +msix_vec_per_pf_min [DEVICE, GENERIC] + Configuration mode: Permanent + +gre_ver_check [DEVICE, DRIVER-SPECIFIC] + Generic Routing Encapsulation (GRE) version check will + be enabled in the device. If disabled, device skips + version checking for incoming packets. + Type: Boolean + Configuration mode: Permanent diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt new file mode 100644 index 000000000000..ae444ffe73ac --- /dev/null +++ b/Documentation/networking/devlink-params.txt @@ -0,0 +1,42 @@ +Devlink configuration parameters +================================ +Following is the list of configuration parameters via devlink interface. +Each parameter can be generic or driver specific and are device level +parameters. + +Note that the driver-specific files should contain the generic params +they support to, with supported config modes. + +Each parameter can be set in different configuration modes: + runtime - set while driver is running, no reset required. + driverinit - applied while driver initializes, requires restart + driver by devlink reload command. + permanent - written to device's non-volatile memory, hard reset + required. + +Following is the list of parameters: +==================================== +enable_sriov [DEVICE, GENERIC] + Enable Single Root I/O Virtualisation (SRIOV) in + the device. + Type: Boolean + +ignore_ari [DEVICE, GENERIC] + Ignore Alternative Routing-ID Interpretation (ARI) + capability. If enabled, adapter will ignore ARI + capability even when platforms has the support + enabled and creates same number of partitions when + platform does not support ARI. + Type: Boolean + +msix_vec_per_pf_max [DEVICE, GENERIC] + Provides the maximum number of MSIX interrupts that + a device can create. Value is same across all + physical functions (PFs) in the device. + Type: u32 + +msix_vec_per_pf_min [DEVICE, GENERIC] + Provides the minimum number of MSIX interrupts required + for the device initialization. Value is same across all + physical functions (PFs) in the device. + Type: u32 diff --git a/Documentation/networking/dpaa2/ethernet-driver.rst b/Documentation/networking/dpaa2/ethernet-driver.rst new file mode 100644 index 000000000000..90ec940749e8 --- /dev/null +++ b/Documentation/networking/dpaa2/ethernet-driver.rst @@ -0,0 +1,185 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=============================== +DPAA2 Ethernet driver +=============================== + +:Copyright: |copy| 2017-2018 NXP + +This file provides documentation for the Freescale DPAA2 Ethernet driver. + +Supported Platforms +=================== +This driver provides networking support for Freescale DPAA2 SoCs, e.g. +LS2080A, LS2088A, LS1088A. + + +Architecture Overview +===================== +Unlike regular NICs, in the DPAA2 architecture there is no single hardware block +representing network interfaces; instead, several separate hardware resources +concur to provide the networking functionality: + +- network interfaces +- queues, channels +- buffer pools +- MAC/PHY + +All hardware resources are allocated and configured through the Management +Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects +and exposes ABIs through which they can be configured and controlled. A few +hardware resources, like queues, do not have a corresponding MC object and +are treated as internal resources of other objects. + +For a more detailed description of the DPAA2 architecture and its object +abstractions see *Documentation/networking/dpaa2/overview.rst*. + +Each Linux net device is built on top of a Datapath Network Interface (DPNI) +object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators +(DPCONs). + +Configuration interface:: + + ----------------------- + | DPAA2 Ethernet Driver | + ----------------------- + . . . + . . . + . . . . . . . . . . . . + . . . + . . . + ---------- ---------- ----------- + | DPBP API | | DPNI API | | DPCON API | + ---------- ---------- ----------- + . . . software + ======= . ========== . ============ . =================== + . . . hardware + ------------------------------------------ + | MC hardware portals | + ------------------------------------------ + . . . + . . . + ------ ------ ------- + | DPBP | | DPNI | | DPCON | + ------ ------ ------- + +The DPNIs are network interfaces without a direct one-on-one mapping to PHYs. +DPBPs represent hardware buffer pools. Packet I/O is performed in the context +of DPCON objects, using DPIO portals for managing and communicating with the +hardware resources. + +Datapath (I/O) interface:: + + ----------------------------------------------- + | DPAA2 Ethernet Driver | + ----------------------------------------------- + | ^ ^ | | + | | | | | + enqueue| dequeue| data | dequeue| seed | + (Tx) | (Rx, TxC)| avail.| request| buffers| + | | notify| | | + | | | | | + V | | V V + ----------------------------------------------- + | DPIO Driver | + ----------------------------------------------- + | | | | | software + | | | | | ================ + | | | | | hardware + ----------------------------------------------- + | I/O hardware portals | + ----------------------------------------------- + | ^ ^ | | + | | | | | + | | | V | + V | ================ V + ---------------------- | ------------- + queues ---------------------- | | Buffer pool | + ---------------------- | ------------- + ======================= + Channel + +Datapath I/O (DPIO) portals provide enqueue and dequeue services, data +availability notifications and buffer pool management. DPIOs are shared between +all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data +frames, but must be affine to the CPUs for the purpose of traffic distribution. + +Frames are transmitted and received through hardware frame queues, which can be +grouped in channels for the purpose of hardware scheduling. The Ethernet driver +enqueues TX frames on egress queues and after transmission is complete a TX +confirmation frame is sent back to the CPU. + +When frames are available on ingress queues, a data availability notification +is sent to the CPU; notifications are raised per channel, so even if multiple +queues in the same channel have available frames, only one notification is sent. +After a channel fires a notification, is must be explicitly rearmed. + +Each network interface can have multiple Rx, Tx and confirmation queues affined +to CPUs, and one channel (DPCON) for each CPU that services at least one queue. +DPCONs are used to distribute ingress traffic to different CPUs via the cores' +affine DPIOs. + +The role of hardware buffer pools is storage of ingress frame data. Each network +interface has a privately owned buffer pool which it seeds with kernel allocated +buffers. + + +DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC +object or to another DPNI through an internal link, but the connection is +managed by MC and completely transparent to the Ethernet driver. + +:: + + --------- --------- --------- + | eth if1 | | eth if2 | | eth ifn | + --------- --------- --------- + . . . + . . . + . . . + --------------------------- + | DPAA2 Ethernet Driver | + --------------------------- + . . . + . . . + . . . + ------ ------ ------ ------- + | DPNI | | DPNI | | DPNI | | DPMAC |----+ + ------ ------ ------ ------- | + | | | | | + | | | | ----- + =========== ================== | PHY | + ----- + +Creating a Network Interface +============================ +A net device is created for each DPNI object probed on the MC bus. Each DPNI has +a number of properties which determine the network interface configuration +options and associated hardware resources. + +DPNI objects (and the other DPAA2 objects needed for a network interface) can be +added to a container on the MC bus in one of two ways: statically, through a +Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created +dynamically at runtime, via the DPAA2 objects APIs. + + +Features & Offloads +=================== +Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames. +The checksum offloads can be independently configured on RX and TX through +ethtool. + +Hardware offload of unicast and multicast MAC filtering is supported on the +ingress path and permanently enabled. + +Scatter-gather frames are supported on both RX and TX paths. On TX, SG support +is configurable via ethtool; on RX it is always enabled. + +The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes. + +The Ethernet driver defines a static flow hashing scheme that distributes +traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port, +L4 dst port. No user configuration is supported for now. + +Hardware specific statistics for the network interface as well as some +non-standard driver stats can be consulted through ethtool -S option. diff --git a/Documentation/networking/dpaa2/index.rst b/Documentation/networking/dpaa2/index.rst index 10bea113a7bc..67bd87fe6c53 100644 --- a/Documentation/networking/dpaa2/index.rst +++ b/Documentation/networking/dpaa2/index.rst @@ -7,3 +7,4 @@ DPAA2 Documentation overview dpio-driver + ethernet-driver diff --git a/Documentation/networking/e100.rst b/Documentation/networking/e100.rst index f81111eba9c5..5e2839b4ec92 100644 --- a/Documentation/networking/e100.rst +++ b/Documentation/networking/e100.rst @@ -1,4 +1,5 @@ -============================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters ============================================================== diff --git a/Documentation/networking/e1000.rst b/Documentation/networking/e1000.rst index f10dd4086921..6379d4d20771 100644 --- a/Documentation/networking/e1000.rst +++ b/Documentation/networking/e1000.rst @@ -1,4 +1,5 @@ -=========================================================== +.. SPDX-License-Identifier: GPL-2.0+ + Linux* Base Driver for Intel(R) Ethernet Network Connection =========================================================== diff --git a/Documentation/networking/e1000e.rst b/Documentation/networking/e1000e.rst new file mode 100644 index 000000000000..33554e5416c5 --- /dev/null +++ b/Documentation/networking/e1000e.rst @@ -0,0 +1,382 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Driver for Intel(R) Ethernet Network Connection +====================================================== + +Intel Gigabit Linux driver. +Copyright(c) 2008-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + + +Command Line Parameters +======================= +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe e1000e [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe e1000e InterruptThrottleRate=16000,16000 + +In this case, there are two network ports supported by e1000e in the system. +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +InterruptThrottleRate +--------------------- +:Valid Range: 0,1,3,4,100-100000 +:Default Value: 3 + +Interrupt Throttle Rate controls the number of interrupts each interrupt +vector can generate per second. Increasing ITR lowers latency at the cost of +increased CPU utilization, though it may help throughput in some circumstances. + +Setting InterruptThrottleRate to a value greater or equal to 100 +will program the adapter to send out a maximum of that many interrupts +per second, even if more packets have come in. This reduces interrupt +load on the system and can lower CPU utilization under heavy load, +but will increase latency as packets are not processed as quickly. + +The default behaviour of the driver previously assumed a static +InterruptThrottleRate value of 8000, providing a good fallback value for +all traffic types, but lacking in small packet performance and latency. +The hardware can handle many more small packets per second however, and +for this reason an adaptive interrupt moderation algorithm was implemented. + +The driver has two adaptive modes (setting 1 or 3) in which +it dynamically adjusts the InterruptThrottleRate value based on the traffic +that it receives. After determining the type of incoming traffic in the last +timeframe, it will adjust the InterruptThrottleRate to an appropriate value +for that traffic. + +The algorithm classifies the incoming traffic every interval into +classes. Once the class is determined, the InterruptThrottleRate value is +adjusted to suit that traffic type the best. There are three classes defined: +"Bulk traffic", for large amounts of packets of normal size; "Low latency", +for small amounts of traffic and/or a significant percentage of small +packets; and "Lowest latency", for almost completely small packets or +minimal traffic. + + - 0: Off + Turns off any interrupt moderation and may improve small packet latency. + However, this is generally not suitable for bulk throughput traffic due + to the increased CPU utilization of the higher interrupt rate. + - 1: Dynamic mode + This mode attempts to moderate interrupts per vector while maintaining + very low latency. This can sometimes cause extra CPU utilization. If + planning on deploying e1000e in a latency sensitive environment, this + parameter should be considered. + - 3: Dynamic Conservative mode (default) + In dynamic conservative mode, the InterruptThrottleRate value is set to + 4000 for traffic that falls in class "Bulk traffic". If traffic falls in + the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is + increased stepwise to 20000. This default mode is suitable for most + applications. + - 4: Simplified Balancing mode + In simplified mode the interrupt rate is based on the ratio of TX and + RX traffic. If the bytes per second rate is approximately equal, the + interrupt rate will drop as low as 2000 interrupts per second. If the + traffic is mostly transmit or mostly receive, the interrupt rate could + be as high as 8000. + - 100-100000: + Setting InterruptThrottleRate to a value greater or equal to 100 + will program the adapter to send at most that many interrupts per second, + even if more packets have come in. This reduces interrupt load on the + system and can lower CPU utilization under heavy load, but will increase + latency as packets are not processed as quickly. + +NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and +RxAbsIntDelay parameters. In other words, minimizing the receive and/or +transmit absolute delays does not force the controller to generate more +interrupts than what the Interrupt Throttle Rate allows. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 0 + +This value delays the generation of receive interrupts in units of 1.024 +microseconds. Receive interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. Increasing this value adds extra +latency to frame reception and can end up decreasing the throughput of TCP +traffic. If the system is reporting dropped receives, this value may be set +too high, causing the driver to run out of available receive descriptors. + +CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang +(stop transmitting) under certain network conditions. If this occurs a NETDEV +WATCHDOG message is logged in the system event log. In addition, the +controller is automatically reset, restoring the network connection. To +eliminate the potential for the hang ensure that RxIntDelay is set to 0. + +RxAbsIntDelay +------------- +:Valid Range: 0-65535 (0=off) +:Default Value: 8 + +This value, in units of 1.024 microseconds, limits the delay in which a +receive interrupt is generated. This value ensures that an interrupt is +generated after the initial packet is received within the set amount of time, +which is useful only if RxIntDelay is non-zero. Proper tuning, along with +RxIntDelay, may improve traffic throughput in specific network conditions. + +TxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 8 + +This value delays the generation of transmit interrupts in units of 1.024 +microseconds. Transmit interrupt reduction can improve CPU efficiency if +properly tuned for specific network traffic. If the system is reporting +dropped transmits, this value may be set too high causing the driver to run +out of available transmit descriptors. + +TxAbsIntDelay +------------- +:Valid Range: 0-65535 (0=off) +:Default Value: 32 + +This value, in units of 1.024 microseconds, limits the delay in which a +transmit interrupt is generated. It is useful only if TxIntDelay is non-zero. +It ensures that an interrupt is generated after the initial Packet is sent on +the wire within the set amount of time. Proper tuning, along with TxIntDelay, +may improve traffic throughput in specific network conditions. + +copybreak +--------- +:Valid Range: 0-xxxxxxx (0=off) +:Default Value: 256 + +The driver copies all packets below or equaling this size to a fresh receive +buffer before handing it up the stack. +This parameter differs from other parameters because it is a single (not 1,1,1 +etc.) parameter applied to all driver instances and it is also available +during runtime at /sys/module/e1000e/parameters/copybreak. + +To use copybreak, type:: + + modprobe e1000e.ko copybreak=128 + +SmartPowerDownEnable +-------------------- +:Valid Range: 0,1 +:Default Value: 0 (disabled) + +Allows the PHY to turn off in lower power states. The user can turn off this +parameter in supported chipsets. + +KumeranLockLoss +--------------- +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +This workaround skips resetting the PHY at shutdown for the initial silicon +releases of ICH8 systems. + +IntMode +------- +:Valid Range: 0-2 +:Default Value: 0 + + +-------+----------------+ + | Value | Interrupt Mode | + +=======+================+ + | 0 | Legacy | + +-------+----------------+ + | 1 | MSI | + +-------+----------------+ + | 2 | MSI-X | + +-------+----------------+ + +IntMode allows load time control over the type of interrupt registered for by +the driver. MSI-X is required for multiple queue support, and some kernels and +combinations of kernel .config options will force a lower level of interrupt +support. + +This command will show different values for each type of interrupt:: + + cat /proc/interrupts + +CrcStripping +------------ +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +Strip the CRC from received packets before sending up the network stack. If +you have a machine with a BMC enabled but cannot receive IPMI traffic after +loading or enabling the driver, try disabling this feature. + +WriteProtectNVM +--------------- +:Valid Range: 0,1 +:Default Value: 1 (enabled) + +If set to 1, configure the hardware to ignore all write/erase cycles to the +GbE region in the ICHx NVM (in order to prevent accidental corruption of the +NVM). This feature can be disabled by setting the parameter to 0 during initial +driver load. + +NOTE: The machine must be power cycled (full off/on) when enabling NVM writes +via setting the parameter to zero. Once the NVM has been locked (via the +parameter at 1 when the driver loads) it cannot be unlocked except via power +cycle. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level of debug messages displayed in the system logs. + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides +with the maximum Jumbo Frames size of 9018 bytes. + +NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in +poor performance or loss of link. + +NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of +4088 bytes: + + - Intel(R) 82578DM Gigabit Network Connection + - Intel(R) 82577LM Gigabit Network Connection + +The following adapters do not support Jumbo Frames: + + - Intel(R) PRO/1000 Gigabit Server Adapter + - Intel(R) PRO/1000 PM Network Connection + - Intel(R) 82562G 10/100 Network Connection + - Intel(R) 82562G-2 10/100 Network Connection + - Intel(R) 82562GT 10/100 Network Connection + - Intel(R) 82562GT-2 10/100 Network Connection + - Intel(R) 82562V 10/100 Network Connection + - Intel(R) 82562V-2 10/100 Network Connection + - Intel(R) 82566DC Gigabit Network Connection + - Intel(R) 82566DC-2 Gigabit Network Connection + - Intel(R) 82566DM Gigabit Network Connection + - Intel(R) 82566MC Gigabit Network Connection + - Intel(R) 82566MM Gigabit Network Connection + - Intel(R) 82567V-3 Gigabit Network Connection + - Intel(R) 82577LC Gigabit Network Connection + - Intel(R) 82578DC Gigabit Network Connection + +NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if +MACSec is enabled on the system. + + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: When validating enable/disable tests on some parts (for example, 82578), +it is necessary to add a few seconds between tests when working with ethtool. + + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. + +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. + +Speed, duplex, and autonegotiation advertising are configured through the +ethtool* utility. + +Caution: Only experienced network administrators should force speed and duplex +or change autonegotiation advertising manually. The settings at the switch must +always match the adapter settings. Adapter performance may suffer or your +adapter may not operate if you configure the adapter differently from your +switch. + +An Intel(R) Ethernet Network Adapter using fiber-based connections, however, +will not attempt to auto-negotiate with its link partner since those adapters +operate only in full duplex and only at their native speed. + + +Enabling Wake on LAN* (WoL) +--------------------------- +WoL is configured through the ethtool* utility. + +WoL will be enabled on the system during the next shut down or reboot. For +this driver version, in order to enable WoL, the e1000e driver must be loaded +prior to shutting down or suspending the system. + +NOTE: Wake on LAN is only supported on port A for the following devices: +- Intel(R) PRO/1000 PT Dual Port Network Connection +- Intel(R) PRO/1000 PT Dual Port Server Connection +- Intel(R) PRO/1000 PT Dual Port Server Adapter +- Intel(R) PRO/1000 PF Dual Port Server Adapter +- Intel(R) PRO/1000 PT Quad Port Server Adapter +- Intel(R) Gigabit PT Quad Port Server ExpressModule + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt deleted file mode 100644 index 12089547baed..000000000000 --- a/Documentation/networking/e1000e.txt +++ /dev/null @@ -1,312 +0,0 @@ -Linux* Driver for Intel(R) Ethernet Network Connection -====================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Command Line Parameters -- Additional Configurations -- Support - -Identifying Your Adapter -======================== - -The e1000e driver supports all PCI Express Intel(R) Gigabit Network -Connections, except those that are 82575, 82576 and 82580-based*. - -* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by - the e1000 driver, not the e1000e driver due to the 82546 part being used - behind a PCI Express bridge. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://support.intel.com/support/go/network/adapter/home.htm - -Command Line Parameters -======================= - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -NOTES: For more information about the InterruptThrottleRate, - RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay - parameters, see the application note at: - http://www.intel.com/design/network/applnots/ap450.htm - -InterruptThrottleRate ---------------------- -Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative, - 4=simplified balancing) -Default Value: 3 - -The driver can limit the amount of interrupts per second that the adapter -will generate for incoming packets. It does this by writing a value to the -adapter that is based on the maximum amount of interrupts that the adapter -will generate per second. - -Setting InterruptThrottleRate to a value greater or equal to 100 -will program the adapter to send out a maximum of that many interrupts -per second, even if more packets have come in. This reduces interrupt -load on the system and can lower CPU utilization under heavy load, -but will increase latency as packets are not processed as quickly. - -The default behaviour of the driver previously assumed a static -InterruptThrottleRate value of 8000, providing a good fallback value for -all traffic types, but lacking in small packet performance and latency. -The hardware can handle many more small packets per second however, and -for this reason an adaptive interrupt moderation algorithm was implemented. - -The driver has two adaptive modes (setting 1 or 3) in which -it dynamically adjusts the InterruptThrottleRate value based on the traffic -that it receives. After determining the type of incoming traffic in the last -timeframe, it will adjust the InterruptThrottleRate to an appropriate value -for that traffic. - -The algorithm classifies the incoming traffic every interval into -classes. Once the class is determined, the InterruptThrottleRate value is -adjusted to suit that traffic type the best. There are three classes defined: -"Bulk traffic", for large amounts of packets of normal size; "Low latency", -for small amounts of traffic and/or a significant percentage of small -packets; and "Lowest latency", for almost completely small packets or -minimal traffic. - -In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 -for traffic that falls in class "Bulk traffic". If traffic falls in the "Low -latency" or "Lowest latency" class, the InterruptThrottleRate is increased -stepwise to 20000. This default mode is suitable for most applications. - -For situations where low latency is vital such as cluster or -grid computing, the algorithm can reduce latency even more when -InterruptThrottleRate is set to mode 1. In this mode, which operates -the same as mode 3, the InterruptThrottleRate will be increased stepwise to -70000 for traffic in class "Lowest latency". - -In simplified mode the interrupt rate is based on the ratio of TX and -RX traffic. If the bytes per second rate is approximately equal, the -interrupt rate will drop as low as 2000 interrupts per second. If the -traffic is mostly transmit or mostly receive, the interrupt rate could -be as high as 8000. - -Setting InterruptThrottleRate to 0 turns off any interrupt moderation -and may improve small packet latency, but is generally not suitable -for bulk throughput traffic. - -NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and - RxAbsIntDelay parameters. In other words, minimizing the receive - and/or transmit absolute delays does not force the controller to - generate more interrupts than what the Interrupt Throttle Rate - allows. - -NOTE: When e1000e is loaded with default settings and multiple adapters - are in use simultaneously, the CPU utilization may increase non- - linearly. In order to limit the CPU utilization without impacting - the overall throughput, we recommend that you load the driver as - follows: - - modprobe e1000e InterruptThrottleRate=3000,3000,3000 - - This sets the InterruptThrottleRate to 3000 interrupts/sec for - the first, second, and third instances of the driver. The range - of 2000 to 3000 interrupts per second works on a majority of - systems and is a good starting point, but the optimal value will - be platform-specific. If CPU utilization is not a concern, use - RX_POLLING (NAPI) and default driver settings. - -RxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 0 - -This value delays the generation of receive interrupts in units of 1.024 -microseconds. Receive interrupt reduction can improve CPU efficiency if -properly tuned for specific network traffic. Increasing this value adds -extra latency to frame reception and can end up decreasing the throughput -of TCP traffic. If the system is reporting dropped receives, this value -may be set too high, causing the driver to run out of available receive -descriptors. - -CAUTION: When setting RxIntDelay to a value other than 0, adapters may - hang (stop transmitting) under certain network conditions. If - this occurs a NETDEV WATCHDOG message is logged in the system - event log. In addition, the controller is automatically reset, - restoring the network connection. To eliminate the potential - for the hang ensure that RxIntDelay is set to 0. - -RxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value, in units of 1.024 microseconds, limits the delay in which a -receive interrupt is generated. Useful only if RxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is received within the set amount of time. Proper tuning, -along with RxIntDelay, may improve traffic throughput in specific network -conditions. - -TxIntDelay ----------- -Valid Range: 0-65535 (0=off) -Default Value: 8 - -This value delays the generation of transmit interrupts in units of -1.024 microseconds. Transmit interrupt reduction can improve CPU -efficiency if properly tuned for specific network traffic. If the -system is reporting dropped transmits, this value may be set too high -causing the driver to run out of available transmit descriptors. - -TxAbsIntDelay -------------- -Valid Range: 0-65535 (0=off) -Default Value: 32 - -This value, in units of 1.024 microseconds, limits the delay in which a -transmit interrupt is generated. Useful only if TxIntDelay is non-zero, -this value ensures that an interrupt is generated after the initial -packet is sent on the wire within the set amount of time. Proper tuning, -along with TxIntDelay, may improve traffic throughput in specific -network conditions. - -Copybreak ---------- -Valid Range: 0-xxxxxxx (0=off) -Default Value: 256 - -Driver copies all packets below or equaling this size to a fresh RX -buffer before handing it up the stack. - -This parameter is different than other parameters, in that it is a -single (not 1,1,1 etc.) parameter applied to all driver instances and -it is also available during runtime at -/sys/module/e1000e/parameters/copybreak - -SmartPowerDownEnable --------------------- -Valid Range: 0-1 -Default Value: 0 (disabled) - -Allows PHY to turn off in lower power states. The user can set this parameter -in supported chipsets. - -KumeranLockLoss ---------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -This workaround skips resetting the PHY at shutdown for the initial -silicon releases of ICH8 systems. - -IntMode -------- -Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X) -Default Value: 2 - -Allows changing the interrupt mode at module load time, without requiring a -recompile. If the driver load fails to enable a specific interrupt mode, the -driver will try other interrupt modes, from least to most compatible. The -interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) -interrupts, only MSI and Legacy will be attempted. - -CrcStripping ------------- -Valid Range: 0-1 -Default Value: 1 (enabled) - -Strip the CRC from received packets before sending up the network stack. If -you have a machine with a BMC enabled but cannot receive IPMI traffic after -loading or enabling the driver, try disabling this feature. - -WriteProtectNVM ---------------- -Valid Range: 0,1 -Default Value: 1 - -If set to 1, configure the hardware to ignore all write/erase cycles to the -GbE region in the ICHx NVM (in order to prevent accidental corruption of the -NVM). This feature can be disabled by setting the parameter to 0 during initial -driver load. -NOTE: The machine must be power cycled (full off/on) when enabling NVM writes -via setting the parameter to zero. Once the NVM has been locked (via the -parameter at 1 when the driver loads) it cannot be unlocked except via power -cycle. - -Additional Configurations -========================= - - Jumbo Frames - ------------ - Jumbo Frames support is enabled by changing the MTU to a value larger than - the default of 1500. Use the ifconfig command to increase the MTU size. - For example: - - ifconfig eth<x> mtu 9000 up - - This setting is not saved across reboots. - - Notes: - - - The maximum MTU setting for Jumbo Frames is 9216. This value coincides - with the maximum Jumbo Frames size of 9234 bytes. - - - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in - poor performance or loss of link. - - - Some adapters limit Jumbo Frames sized packets to a maximum of - 4096 bytes and some adapters do not support Jumbo Frames. - - - Jumbo Frames cannot be configured on an 82579-based Network device, if - MACSec is enabled on the system. - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. We - strongly recommend downloading the latest version of ethtool at: - - https://kernel.org/pub/software/network/ethtool/ - - NOTE: When validating enable/disable tests on some parts (82578, for example) - you need to add a few seconds between tests when working with ethtool. - - Speed and Duplex - ---------------- - Speed and Duplex are configured through the ethtool* utility. For - instructions, refer to the ethtool man page. - - Enabling Wake on LAN* (WoL) - --------------------------- - WoL is configured through the ethtool* utility. For instructions on - enabling WoL with ethtool, refer to the ethtool man page. - - WoL will be enabled on the system during the next shut down or reboot. - For this driver version, in order to enable WoL, the e1000e driver must be - loaded when shutting down or rebooting the system. - - In most cases Wake On LAN is only supported on port A for multiple port - adapters. To verify if a port supports Wake on Lan run ethtool eth<X>. - -Support -======= - -For general information, go to the Intel support website at: - - www.intel.com/support/ - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index e6b4ebb2b243..2196b824e96c 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for: Instruction Addressing mode Description - ld 1, 2, 3, 4, 10 Load word into A + ld 1, 2, 3, 4, 12 Load word into A ldi 4 Load word into A ldh 1, 2 Load half-word into A ldb 1, 2 Load byte into A - ldx 3, 4, 5, 10 Load word into X + ldx 3, 4, 5, 12 Load word into X ldxi 4 Load word into X ldxb 5 Load byte into X @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for: jmp 6 Jump to label ja 6 Jump to label - jeq 7, 8 Jump on A == k - jneq 8 Jump on A != k - jne 8 Jump on A != k - jlt 8 Jump on A < k - jle 8 Jump on A <= k - jgt 7, 8 Jump on A > k - jge 7, 8 Jump on A >= k - jset 7, 8 Jump on A & k + jeq 7, 8, 9, 10 Jump on A == <x> + jneq 9, 10 Jump on A != <x> + jne 9, 10 Jump on A != <x> + jlt 9, 10 Jump on A < <x> + jle 9, 10 Jump on A <= <x> + jgt 7, 8, 9, 10 Jump on A > <x> + jge 7, 8, 9, 10 Jump on A >= <x> + jset 7, 8, 9, 10 Jump on A & <x> add 0, 4 A + <x> sub 0, 4 A - <x> @@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for: tax Copy A into X txa Copy X into A - ret 4, 9 Return + ret 4, 11 Return The next table shows addressing formats from the 2nd column: @@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column: 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet 6 L Jump label L 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf - 8 #k,Lt Jump to Lt if predicate is true - 9 a/%a Accumulator A - 10 extension BPF extension + 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf + 9 #k,Lt Jump to Lt if predicate is true + 10 x/%x,Lt Jump to Lt if predicate is true + 11 a/%a Accumulator A + 12 extension BPF extension The Linux kernel also has a couple of BPF extensions that are used along with the class of load instructions by "overloading" the k argument with @@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows: PTR_TO_STACK Frame pointer. PTR_TO_PACKET skb->data. PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. + PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted. + PTR_TO_SOCKET_OR_NULL + Either a pointer to a socket, or NULL; socket lookup + returns this type, which becomes a PTR_TO_SOCKET when + checked != NULL. PTR_TO_SOCKET is reference-counted, + so programs must release the reference through the + socket release function before the end of the program. + Arithmetic on these pointers is forbidden. However, a pointer may be offset from this base (as a result of pointer arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable offset'. The former is used when an exactly-known value (e.g. an immediate @@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting pointer will have a variable offset known to be 4n+2 for some n, so adding the 2 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through that pointer are safe. +The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common +to all copies of the pointer returned from a socket lookup. This has similar +behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but +it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly +represents a reference to the corresponding 'struct sock'. To ensure that the +reference is not leaked, it is imperative to NULL-check the reference and in +the non-NULL case, and pass the valid reference to the socket release function. Direct packet access -------------------- @@ -1444,6 +1461,55 @@ Error: 8: (7a) *(u64 *)(r0 +0) = 1 R0 invalid mem access 'imm' +Program that performs a socket lookup then sets the pointer to NULL without +checking it: +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (b7) r0 = 0 + 9: (95) exit + Unreleased reference id=1, alloc_insn=7 + +Program that performs a socket lookup but does not NULL-check the returned +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (95) exit + Unreleased reference id=1, alloc_insn=7 + Testing ------- diff --git a/Documentation/networking/fm10k.rst b/Documentation/networking/fm10k.rst new file mode 100644 index 000000000000..bf5e5942f28d --- /dev/null +++ b/Documentation/networking/fm10k.rst @@ -0,0 +1,141 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Multi-host Controller +============================================================== + +August 20, 2018 +Copyright(c) 2015-2018 Intel Corporation. + +Contents +======== +- Identifying Your Adapter +- Additional Configurations +- Performance Tuning +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver in this release is compatible with devices based on the Intel(R) +Ethernet Multi-host Controller. + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Flow Control +------------ +The Intel(R) Ethernet Switch Host Interface Driver does not support Flow +Control. It will not send pause frames. This may result in dropped frames. + + +Virtual Functions (VFs) +----------------------- +Use sysfs to enable VFs. +Valid Range: 0-64 + +For example:: + + echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs //enable VFs + echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag +stripping/insertion will remain enabled. Please remove the old VLAN filter +before the new VLAN filter is added. For example:: + + ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete vlan 100 + ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0 + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides +with the maximum Jumbo Frames size of 15364 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + + +Generic Receive Offload, aka GRO +-------------------------------- +The driver supports the in-kernel software implementation of GRO. GRO has +shown that by coalescing Rx traffic into larger chunks of data, CPU +utilization can be significantly reduced when under large Rx load. GRO is an +evolution of the previously-used LRO interface. GRO is able to coalesce +other protocols besides TCP. It's also safe to use with configurations that +are problematic for LRO, namely bridging and iSCSI. + + + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r + Configures the hash options for the specified network traffic type. + +- udp4: UDP over IPv4 +- udp6: UDP over IPv6 +- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet. +- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet. + + +Known Issues/Troubleshooting +============================ + +Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS under Linux KVM +--------------------------------------------------------------------------------------- +KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This +includes traditional PCIe devices, as well as SR-IOV-capable devices based on +the Intel Ethernet Controller XL710. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/i40e.rst b/Documentation/networking/i40e.rst new file mode 100644 index 000000000000..0cc16c525d10 --- /dev/null +++ b/Documentation/networking/i40e.rst @@ -0,0 +1,770 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series +================================================================== + +Intel 40 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Overview +- Identifying Your Adapter +- Intel(R) Ethernet Flow Director +- Additional Configurations +- Known Issues +- Support + + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller X710 + * Intel(R) Ethernet Controller XL710 + * Intel(R) Ethernet Network Connection X722 + * Intel(R) Ethernet Controller XXV710 + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ and QSFP+ Devices +---------------------- +For information about supported media, refer to this document: +https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf + +NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only +support Intel Ethernet Optics modules. On these adapters, other modules are not +supported and will not function. In all cases Intel recommends using Intel +Ethernet Optics; other modules may function but are not validated by Intel. +Contact Intel for supported media types. + +NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support +is dependent on your system board. Please see your vendor for details. + +NOTE: In systems that do not have adequate airflow to cool the adapter and +optical modules, you must use high temperature optical modules. + +Virtual Functions (VFs) +----------------------- +Use sysfs to enable VFs. For example:: + + #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs #enable VFs + #echo 0 > /sys/class/net/$dev/device/sriov_numvfs #disable VFs + +For example, the following instructions will configure PF eth0 and the first VF +on VLAN 10:: + + $ ip link set dev eth0 vf 0 vlan 10 + +VLAN Tag Packet Steering +------------------------ +Allows you to send all packets with a specific VLAN tag to a particular SR-IOV +virtual function (VF). Further, this feature allows you to designate a +particular VF as trusted, and allows that trusted VF to request selective +promiscuous mode on the Physical Function (PF). + +To set a VF as trusted or untrusted, enter the following command in the +Hypervisor:: + + # ip link set dev eth0 vf 1 trust [on|off] + +Once the VF is designated as trusted, use the following commands in the VM to +set the VF to promiscuous mode. + +:: + + For promiscuous all: + #ip link set eth2 promisc on + Where eth2 is a VF interface in the VM + + For promiscuous Multicast: + #ip link set eth2 allmulticast on + Where eth2 is a VF interface in the VM + +NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to +"off",meaning that promiscuous mode for the VF will be limited. To set the +promiscuous mode for the VF to true promiscuous and allow the VF to see all +ingress traffic, use the following command:: + + #ethtool -set-priv-flags p261p1 vf-true-promisc-support on + +The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather, +it designates which type of promiscuous mode (limited or true) you will get +when you enable promiscuous mode using the ip link commands above. Note that +this is a global setting that affects the entire device. However,the +vf-true-promisc-support priv-flag is only exposed to the first PF of the +device. The PF remains in limited promiscuous mode (unless it is in MFP mode) +regardless of the vf-true-promisc-support setting. + +Now add a VLAN interface on the VF interface:: + + #ip link add link eth2 name eth2.100 type vlan id 100 + +Note that the order in which you set the VF to promiscuous mode and add the +VLAN interface does not matter (you can do either first). The end result in +this example is that the VF will get all traffic that is tagged with VLAN 100. + +Intel(R) Ethernet Flow Director +------------------------------- +The Intel Ethernet Flow Director performs the following tasks: + +- Directs receive packets according to their flows to different queues. +- Enables tight control on routing a flow in the platform. +- Matches flows and CPU cores for flow affinity. +- Supports multiple parameters for flexible flow classification and load + balancing (in SFP mode only). + +NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and +UDPv4. For a given flow type, it supports valid combinations of IP addresses +(source or destination) and UDP/TCP ports (source and destination). For +example, you can supply only a source IP address, a source IP address and a +destination port, or any combination of one or more of these four parameters. + +NOTE: The Linux i40e driver allows you to filter traffic based on a +user-defined flexible two-byte pattern and offset by using the ethtool user-def +and mask fields. Only L3 and L4 flow types are supported for user-defined +flexible filters. For a given flow type, you must clear all Intel Ethernet Flow +Director filters before changing the input set (for that flow type). + +To enable or disable the Intel Ethernet Flow Director:: + + # ethtool -K ethX ntuple <on|off> + +When disabling ntuple filters, all the user programmed filters are flushed from +the driver cache and hardware. All needed filters must be re-added when ntuple +is re-enabled. + +To add a filter that directs packet to queue 2, use -U or -N switch:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] + +To set a filter using only the source and destination IP address:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 action 2 [loc 1] + +To see the list of filters currently present:: + + # ethtool <-u|-n> ethX + +Application Targeted Routing (ATR) Perfect Filters +-------------------------------------------------- +ATR is enabled by default when the kernel is in multiple transmit queue mode. +An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow +starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow +Director rule is added from ethtool (Sideband filter), ATR is turned off by the +driver. To re-enable ATR, the sideband can be disabled with the ethtool -K +option. For example:: + + ethtool –K [adapter] ntuple [off|on] + +If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a +TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is +automatically re-enabled. + +Packets that match the ATR rules are counted in fdir_atr_match stats in +ethtool, which also can be used to verify whether ATR rules still exist. + +Sideband Perfect Filters +------------------------ +Sideband Perfect Filters are used to direct traffic that matches specified +characteristics. They are enabled through ethtool's ntuple interface. To add a +new filter use the following command:: + + ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \ + dst-port <port> action <queue> + +Where: + <device> - the ethernet device to program + <type> - can be ip4, tcp4, udp4, or sctp4 + <ip> - the ip address to match on + <port> - the port number to match on + <queue> - the queue to direct traffic towards (-1 discards matching traffic) + +Use the following command to display all of the active filters:: + + ethtool -u <device> + +Use the following command to delete a filter:: + + ethtool -U <device> delete <N> + +Where <N> is the filter id displayed when printing all the active filters, and +may also have been specified using "loc <N>" when adding the filter. + +The following example matches TCP traffic sent from 192.168.0.1, port 5300, +directed to 192.168.0.5, port 80, and sends it to queue 7:: + + ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \ + src-port 5300 dst-port 80 action 7 + +For each flow-type, the programmed filters must all have the same matching +input set. For example, issuing the following two commands is acceptable:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 + +Issuing the next two commands, however, is not acceptable, since the first +specifies src-ip and the second specifies dst-ip:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 + +The second command will fail with an error. You may program multiple filters +with the same fields, using different values, but, on one device, you may not +program two tcp4 filters with different matching fields. + +Matching on a sub-portion of a field is not supported by the i40e driver, thus +partial mask fields are not supported. + +The driver also supports matching user-defined data within the packet payload. +This flexible data is specified using the "user-def" field of the ethtool +command in the following way: + ++----------------------------+--------------------------+ +| 31 28 24 20 16 | 15 12 8 4 0 | ++----------------------------+--------------------------+ +| offset into packet payload | 2 bytes of flexible data | ++----------------------------+--------------------------+ + +For example, + +:: + + ... user-def 0x4FFFF ... + +tells the filter to look 4 bytes into the payload and match that value against +0xFFFF. The offset is based on the beginning of the payload, and not the +beginning of the packet. Thus + +:: + + flow-type tcp4 ... user-def 0x8BEAF ... + +would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the +TCP/IPv4 payload. + +Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload. +Thus to match the first byte of the payload, you must actually add 4 bytes to +the offset. Also note that ip4 filters match both ICMP frames as well as raw +(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame. + +The maximum offset is 64. The hardware will only read up to 64 bytes of data +from the payload. The offset must be even because the flexible data is 2 bytes +long and must be aligned to byte 0 of the packet payload. + +The user-defined flexible offset is also considered part of the input set and +cannot be programmed separately for multiple filters of the same type. However, +the flexible data is not part of the input set and multiple filters may use the +same offset but match against different data. + +To create filters that direct traffic to a specific Virtual Function, use the +"action" parameter. Specify the action as a 64 bit value, where the lower 32 +bits represents the queue number, while the next 8 bits represent which VF. +Note that 0 is the PF, so the VF identifier is offset by 1. For example:: + + ... action 0x800000002 ... + +specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of +that VF. + +Note that these filters will not break internal routing rules, and will not +route traffic that otherwise would not have been sent to the specified Virtual +Function. + +Setting the link-down-on-close Private Flag +------------------------------------------- +When the link-down-on-close private flag is set to "on", the port's link will +go down when the interface is brought down using the ifconfig ethX down command. + +Use ethtool to view and set link-down-on-close, as follows:: + + ethtool --show-priv-flags ethX + ethtool --set-priv-flags ethX link-down-on-close [on|off] + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file:: + + /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL + /etc/sysconfig/network/<config_file> // for SLES + +NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides +with the maximum Jumbo Frames size of 9728 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Supported ethtool Commands and Options for Filtering +---------------------------------------------------- +-n --show-nfc + Retrieves the receive network flow classification configurations. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 + Retrieves the hash options for the specified network traffic type. + +-N --config-nfc + Configures the receive network flow classification. + +rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... + Configures the hash options for the specified network traffic type. + +udp4 UDP over IPv4 +udp6 UDP over IPv6 + +f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. +n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + +Speed and Duplex Configuration +------------------------------ +In addressing speed and duplex configuration issues, you need to distinguish +between copper-based adapters and fiber-based adapters. + +In the default mode, an Intel(R) Ethernet Network Adapter using copper +connections will attempt to auto-negotiate with its link partner to determine +the best setting. If the adapter cannot establish link with the link partner +using auto-negotiation, you may need to manually configure the adapter and link +partner to identical settings to establish link and pass packets. This should +only be needed when attempting to link with an older switch that does not +support auto-negotiation or one that has been forced to a specific speed or +duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds +and higher cannot be forced. Use the autonegotiation advertising setting to +manually set devices for 1 Gbps and higher. + +NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet +Network Adapter XXV710 based devices. + +Speed, duplex, and autonegotiation advertising are configured through the +ethtool* utility. + +Caution: Only experienced network administrators should force speed and duplex +or change autonegotiation advertising manually. The settings at the switch must +always match the adapter settings. Adapter performance may suffer or your +adapter may not operate if you configure the adapter differently from your +switch. + +An Intel(R) Ethernet Network Adapter using fiber-based connections, however, +will not attempt to auto-negotiate with its link partner since those adapters +operate only in full duplex and only at their native speed. + +NAPI +---- +NAPI (Rx polling mode) is supported in the i40e driver. +For more information on NAPI, see +https://wiki.linuxfoundation.org/networking/napi + +Flow Control +------------ +Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable +receiving and transmitting pause frames for i40e. When transmit is enabled, +pause frames are generated when the receive packet buffer crosses a predefined +threshold. When receive is enabled, the transmit unit will halt for the time +delay specified when a pause frame is received. + +NOTE: You must have a flow control capable link partner. + +Flow Control is on by default. + +Use ethtool to change the flow control settings. + +To enable or disable Rx or Tx Flow Control:: + + ethtool -A eth? rx <on|off> tx <on|off> + +Note: This command only enables or disables Flow Control if auto-negotiation is +disabled. If auto-negotiation is enabled, this command changes the parameters +used for auto-negotiation with the link partner. + +To enable or disable auto-negotiation:: + + ethtool -s eth? autoneg <on|off> + +Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending +on your device, you may not be able to change the auto-negotiation setting. + +RSS Hash Flow +------------- +Allows you to set the hash bytes per flow type and any combination of one or +more options for Receive Side Scaling (RSS) hash byte configuration. + +:: + + # ethtool -N <dev> rx-flow-hash <type> <option> + +Where <type> is: + tcp4 signifying TCP over IPv4 + udp4 signifying UDP over IPv4 + tcp6 signifying TCP over IPv6 + udp6 signifying UDP over IPv6 +And <option> is one or more of: + s Hash on the IP source address of the Rx packet. + d Hash on the IP destination address of the Rx packet. + f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. + n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. +NOTE: This feature can be disabled for a specific Virtual Function (VF):: + + ip link set <pf dev> vf <vf id> spoofchk {off|on} + +IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC) +------------------------------------------------------------ +Precision Time Protocol (PTP) is used to synchronize clocks in a computer +network. PTP support varies among Intel devices that support this driver. Use +"ethtool -T <netdev name>" to get a definitive list of PTP capabilities +supported by the device. + +IEEE 802.1ad (QinQ) Support +--------------------------- +The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN +IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as +"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks +allow L2 tunneling and the ability to segregate traffic within a particular +VLAN ID, among other uses. + +The following are examples of how to configure 802.1ad (QinQ):: + + ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 + ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 + +Where "24" and "371" are example VLAN IDs. + +NOTES: + Receive checksum offloads, cloud filters, and VLAN acceleration are not + supported for 802.1ad (QinQ) packets. + +VXLAN and GENEVE Overlay HW Offloading +-------------------------------------- +Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3 +network, which may be useful in a virtualized or cloud environment. Some +Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from +the operating system. This reduces CPU utilization. + +VXLAN offloading is controlled by the Tx and Rx checksum offload options +provided by ethtool. That is, if Tx checksum offload is enabled, and the +adapter has the capability, VXLAN offloading is also enabled. + +Support for VXLAN and GENEVE HW offloading is dependent on kernel support of +the HW offloading features. + +Multiple Functions per Port +--------------------------- +Some adapters based on the Intel Ethernet Controller X710/XL710 support +multiple functions on a single physical port. Configure these functions through +the System Setup/BIOS. + +Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as +a percentage of the full physical port link speed, that the partition will +receive. The bandwidth the partition is awarded will never fall below the level +you specify. + +The range for the minimum bandwidth values is: +1 to ((100 minus # of partitions on the physical port) plus 1) +For example, if a physical port has 4 partitions, the range would be: +1 to ((100 - 4) + 1 = 97) + +The Maximum Bandwidth percentage represents the maximum transmit bandwidth +allocated to the partition as a percentage of the full physical port link +speed. The accepted range of values is 1-100. The value is used as a limiter, +should you chose that any one particular function not be able to consume 100% +of a port's bandwidth (should it be available). The sum of all the values for +Maximum Bandwidth is not restricted, because no more than 100% of a port's +bandwidth can ever be used. + +NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions +per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says +"add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than +64 virtual functions (VFs). + +Data Center Bridging (DCB) +-------------------------- +DCB is a configuration Quality of Service implementation in hardware. It uses +the VLAN priority tag (802.1p) to filter traffic. That means that there are 8 +different priorities that traffic can be filtered into. It also enables +priority flow control (802.1Qbb) which can limit or eliminate the number of +dropped packets during network stress. Bandwidth can be allocated to each of +these priorities, which is enforced at the hardware level (802.1Qaz). + +Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and +802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only +and can accept settings from a DCBX capable peer. Software configuration of +DCBX parameters via dcbtool/lldptool are not supported. + +NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp. + +The i40e driver implements the DCB netlink interface layer to allow user-space +to communicate with the driver and query DCB configuration for the port. + +NOTE: +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + +Interrupt Rate Limiting +----------------------- +:Valid Range: 0-235 (0=no limit) + +The Intel(R) Ethernet Controller XL710 family supports an interrupt rate +limiting mechanism. The user can control, via ethtool, the number of +microseconds between interrupts. + +Syntax:: + + # ethtool -C ethX rx-usecs-high N + +The range of 0-235 microseconds provides an effective range of 4,310 to 250,000 +interrupts per second. The value of rx-usecs-high can be set independently of +rx-usecs and tx-usecs in the same ethtool command, and is also independent of +the adaptive interrupt moderation algorithm. The underlying hardware supports +granularity in 4-microsecond intervals, so adjacent values may result in the +same interrupt rate. + +One possible use case is the following:: + + # ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \ + 5 tx-usecs 5 + +The above command would disable adaptive interrupt moderation, and allow a +maximum of 5 microseconds before indicating a receive or transmit was complete. +However, instead of resulting in as many as 200,000 interrupts per second, it +limits total interrupts per second to 50,000 via the rx-usecs-high parameter. + +Performance Optimization +======================== +Driver defaults are meant to fit a wide variety of workloads, but if further +optimization is required we recommend experimenting with the following settings. + +NOTE: For better performance when processing small (64B) frame sizes, try +enabling Hyper threading in the BIOS in order to increase the number of logical +cores in the system and subsequently increase the number of queues available to +the adapter. + +Virtualized Environments +------------------------ +1. Disable XPS on both ends by using the included virt_perf_default script +or by running the following command as root:: + + for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`; + do echo 0 > $file; done + +2. Using the appropriate mechanism (vcpupin) in the vm, pin the cpu's to +individual lcpu's, making sure to use a set of cpu's included in the +device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist. + +3. Configure as many Rx/Tx queues in the VM as available. Do not rely on +the default setting of 1. + + +Non-virtualized Environments +---------------------------- +Pin the adapter's IRQs to specific cores by disabling the irqbalance service +and using the included set_irq_affinity script. Please see the script's help +text for further options. + +- The following settings will distribute the IRQs across all the cores evenly:: + + # scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ] + +- The following settings will distribute the IRQs across all the cores that are + local to the adapter (same NUMA node):: + + # scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ] + +For very CPU intensive workloads, we recommend pinning the IRQs to all cores. + +For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per +queue using ethtool. + +- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000 + interrupts per second per queue. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \ + tx-usecs 125 + +For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts +per queue using ethtool. + +- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000 + interrupts per second per queue. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \ + tx-usecs 250 + +For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using +ethtool. + +:: + + # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \ + tx-usecs 0 + +Application Device Queues (ADq) +------------------------------- +Application Device Queues (ADq) allows you to dedicate one or more queues to a +specific application. This can reduce latency for the specified application, +and allow Tx traffic to be rate limited per application. Follow the steps below +to set ADq. + +1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface. +The shaper bw_rlimit parameter is optional. + +Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set +to 1Gbit for tc0 and 3Gbit for tc1. + +:: + + # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 + queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit + max_rate 1Gbit 3Gbit + +map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1 +sets priorities 0-3 to use tc0 and 4-7 to use tc1) + +queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns +16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total +number of queues for all tcs is 64 or number of cores, whichever is lower.) + +hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware +offload mode in mqprio that makes full use of the mqprio options, the +TCs, the queue configurations, and the QoS parameters. + +shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates. +Totals must be equal or less than port speed. + +For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network +monitoring tools such as ifstat or sar –n DEV [interval] [number of samples] + +2. Enable HW TC offload on interface:: + + # ethtool -K <interface> hw-tc-offload on + +3. Apply TCs to ingress (RX) flow of interface:: + + # tc qdisc add dev <interface> ingress + +NOTES: + - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory. + - ADq is not compatible with cloud filters. + - Setting up channels via ethtool (ethtool -L) is not supported when the + TCs are configured using mqprio. + - You must have iproute2 latest version + - NVM version 6.01 or later is required. + - ADq cannot be enabled when any the following features are enabled: Data + Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband + Filters. + - If another driver (for example, DPDK) has set cloud filters, you cannot + enable ADq. + - Tunnel filters are not supported in ADq. If encapsulated packets do + arrive in non-tunnel mode, filtering will be done on the inner headers. + For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified + as a VXLAN encapsulated packet, outer headers are ignored. Therefore, + inner headers are matched. + - If a TC filter on a PF matches traffic over a VF (on the PF), that + traffic will be routed to the appropriate queue of the PF, and will + not be passed on the VF. Such traffic will end up getting dropped higher + up in the TCP/IP stack as it does not match PF address data. + - If traffic matches multiple TC filters that point to different TCs, + that traffic will be duplicated and sent to all matching TC queues. + The hardware switch mirrors the packet to a VSI list when multiple + filters are matched. + + +Known Issues/Troubleshooting +============================ + +NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do +not support the following features: + + * Data Center Bridging (DCB) + * QOS + * VMQ + * SR-IOV + * Task Encapsulation offload (VXLAN, NVGRE) + * Energy Efficient Ethernet (EEE) + * Auto-media detect + +Unexpected Issues when the device driver and DPDK share a device +---------------------------------------------------------------- +Unexpected issues may result when an i40e device is in multi driver mode and +the kernel driver and DPDK driver are sharing the device. This is because +access to the global NIC resources is not synchronized between multiple +drivers. Any change to the global NIC configuration (writing to a global +register, setting global configuration by AQ, or changing switch modes) will +affect all ports and drivers on the device. Loading DPDK with the +"multi-driver" module parameter may mitigate some of the issues. + +TC0 must be enabled when setting up DCB on a switch +--------------------------------------------------- +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt deleted file mode 100644 index c2d6e1824b29..000000000000 --- a/Documentation/networking/i40e.txt +++ /dev/null @@ -1,190 +0,0 @@ -Linux Base Driver for the Intel(R) Ethernet Controller XL710 Family -=================================================================== - -Intel i40e Linux driver. -Copyright(c) 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Performance Tuning -- Known Issues -- Support - - -Identifying Your Adapter -======================== - -The driver in this release is compatible with the Intel Ethernet -Controller XL710 Family. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - - -Enabling the driver -=================== - -The driver is enabled via the standard kernel configuration system, -using the make command: - - make config/oldconfig/menuconfig/etc. - -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Controller XL710 Family - -Additional Configurations -========================= - - Generic Receive Offload (GRO) - ----------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is - an evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool - - - Flow Director n-ntuple traffic filters (FDir) - --------------------------------------------- - The driver utilizes the ethtool interface for configuring ntuple filters, - via "ethtool -N <device> <filter>". - - The sctp4, ip4, udp4, and tcp4 flow types are supported with the standard - fields including src-ip, dst-ip, src-port and dst-port. The driver only - supports fully enabling or fully masking the fields, so use of the mask - fields for partial matches is not supported. - - Additionally, the driver supports using the action to specify filters for a - Virtual Function. You can specify the action as a 64bit value, where the - lower 32 bits represents the queue number, while the next 8 bits represent - which VF. Note that 0 is the PF, so the VF identifier is offset by 1. For - example: - - ... action 0x800000002 ... - - Would indicate to direct traffic for Virtual Function 7 (8 minus 1) on queue - 2 of that VF. - - The driver also supports using the user-defined field to specify 2 bytes of - arbitrary data to match within the packet payload in addition to the regular - fields. The data is specified in the lower 32bits of the user-def field in - the following way: - - +----------------------------+---------------------------+ - | 31 28 24 20 16 | 15 12 8 4 0| - +----------------------------+---------------------------+ - | offset into packet payload | 2 bytes of flexible data | - +----------------------------+---------------------------+ - - As an example, - - ... user-def 0x4FFFF .... - - means to match the value 0xFFFF 4 bytes into the packet payload. Note that - the offset is based on the beginning of the payload, and not the beginning - of the packet. Thus - - flow-type tcp4 ... user-def 0x8BEAF .... - - would match TCP/IPv4 packets which have the value 0xBEAF 8bytes into the - TCP/IPv4 payload. - - For ICMP, the hardware parses the ICMP header as 4 bytes of header and 4 - bytes of payload, so if you want to match an ICMP frames payload you may need - to add 4 to the offset in order to match the data. - - Furthermore, the offset can only be up to a value of 64, as the hardware - will only read up to 64 bytes of data from the payload. It must also be even - as the flexible data is 2 bytes long and must be aligned to byte 0 of the - packet payload. - - When programming filters, the hardware is limited to using a single input - set for each flow type. This means that it is an error to program two - different filters with the same type that don't match on the same fields. - Thus the second of the following two commands will fail: - - ethtool -N <device> flow-type tcp4 src-ip 192.168.0.7 action 5 - ethtool -N <device> flow-type tcp4 dst-ip 192.168.15.18 action 1 - - This is because the first filter will be accepted and reprogram the input - set for TCPv4 filters, but the second filter will be unable to reprogram the - input set until all the conflicting TCPv4 filters are first removed. - - Note that the user-defined flexible offset is also considered part of the - input set and cannot be programmed separately for multiple filters of the - same type. However, the flexible data is not part of the input set and - multiple filters may use the same offset but match against different data. - - Data Center Bridging (DCB) - -------------------------- - DCB configuration is not currently supported. - - FCoE - ---- - The driver supports Fiber Channel over Ethernet (FCoE) and Data Center - Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope - of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project - information and http://www.open-lldp.org/ or email list - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sourceforge.net and copy -netdev@vger.kernel.org. diff --git a/Documentation/networking/i40evf.txt b/Documentation/networking/i40evf.txt deleted file mode 100644 index e9b3035b95d0..000000000000 --- a/Documentation/networking/i40evf.txt +++ /dev/null @@ -1,54 +0,0 @@ -Linux* Base Driver for Intel(R) Network Connection -================================================== - -Intel Ethernet Adaptive Virtual Function Linux driver. -Copyright(c) 2013-2017 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Known Issues/Troubleshooting -- Support - -This file describes the i40evf Linux* Base Driver. - -The i40evf driver supports the below mentioned virtual function -devices and can only be activated on kernels running the i40e or -newer Physical Function (PF) driver compiled with CONFIG_PCI_IOV. -The i40evf driver requires CONFIG_PCI_MSI to be enabled. - -The guest OS loading the i40evf driver must support MSI-X interrupts. - -Supported Hardware -================== -Intel XL710 X710 Virtual Function -Intel Ethernet Adaptive Virtual Function -Intel X722 Virtual Function - -Identifying Your Adapter -======================== - -For more information on how to identify your adapter, go to the -Adapter & Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Known Issues/Troubleshooting -============================ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/iavf.rst b/Documentation/networking/iavf.rst new file mode 100644 index 000000000000..f8b42b64eb28 --- /dev/null +++ b/Documentation/networking/iavf.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function +================================================================== + +Intel Ethernet Adaptive Virtual Function Linux driver. +Copyright(c) 2013-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Additional Configurations +- Known Issues/Troubleshooting +- Support + +This file describes the iavf Linux* Base Driver. This driver was formerly +called i40evf. + +The iavf driver supports the below mentioned virtual function devices and +can only be activated on kernels running the i40e or newer Physical Function +(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires +CONFIG_PCI_MSI to be enabled. + +The guest OS loading the iavf driver must support MSI-X interrupts. + +Identifying Your Adapter +======================== +The driver in this kernel is compatible with devices based on the following: + * Intel(R) XL710 X710 Virtual Function + * Intel(R) X722 Virtual Function + * Intel(R) XXV710 Virtual Function + * Intel(R) Ethernet Adaptive Virtual Function + +For the best performance, make sure the latest NVM/FW is installed on your +device. + +For information on how to identify your adapter, and for the latest NVM/FW +images and Intel network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +Setting VLAN Tag Stripping +-------------------------- +If you have applications that require Virtual Functions (VFs) to receive +packets with VLAN tags, you can disable VLAN tag stripping for the VF. The +Physical Function (PF) processes requests issued from the VF to enable or +disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF, +then requests from that VF to set VLAN tag stripping will be ignored. + +To enable/disable VLAN tag stripping for a VF, issue the following command +from inside the VM in which you are running the VF:: + + ethtool -K <if_name> rxvlan on/off + +or alternatively:: + + ethtool --offload <if_name> rxvlan on/off + +Adaptive Virtual Function +------------------------- +Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to +adapt to changing feature sets of the physical function driver (PF) with which +it is associated. This allows system administrators to update a PF without +having to update all the VFs associated with it. All AVFs have a single common +device ID and branding string. + +AVFs have a minimum set of features known as "base mode," but may provide +additional features depending on what features are available in the PF with +which the AVF is associated. The following are base mode features: + +- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs) + for Tx/Rx. +- i40e descriptors and ring format. +- Descriptor write-back completion. +- 1 control queue, with i40e descriptors, CSRs and ring format. +- 5 MSI-X interrupt vectors and corresponding i40e CSRs. +- 1 Interrupt Throttle Rate (ITR) index. +- 1 Virtual Station Interface (VSI) per VF. +- 1 Traffic Class (TC), TC0 +- Receive Side Scaling (RSS) with 64 entry indirection table and key, + configured through the PF. +- 1 unicast MAC address reserved per VF. +- 16 MAC address filters for each VF. +- Stateless offloads - non-tunneled checksums. +- AVF device ID. +- HW mailbox is used for VF to PF communications (including on Windows). + +IEEE 802.1ad (QinQ) Support +--------------------------- +The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN +IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as +"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks +allow L2 tunneling and the ability to segregate traffic within a particular +VLAN ID, among other uses. + +The following are examples of how to configure 802.1ad (QinQ):: + + ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 + ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 + +Where "24" and "371" are example VLAN IDs. + +NOTES: + Receive checksum offloads, cloud filters, and VLAN acceleration are not + supported for 802.1ad (QinQ) packets. + +Application Device Queues (ADq) +------------------------------- +Application Device Queues (ADq) allows you to dedicate one or more queues to a +specific application. This can reduce latency for the specified application, +and allow Tx traffic to be rate limited per application. Follow the steps below +to set ADq. + +1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface. +The shaper bw_rlimit parameter is optional. + +Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set +to 1Gbit for tc0 and 3Gbit for tc1. + +:: + + # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 + queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit + max_rate 1Gbit 3Gbit + +map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1 +sets priorities 0-3 to use tc0 and 4-7 to use tc1) + +queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns +16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total +number of queues for all tcs is 64 or number of cores, whichever is lower.) + +hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware +offload mode in mqprio that makes full use of the mqprio options, the +TCs, the queue configurations, and the QoS parameters. + +shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates. +Totals must be equal or less than port speed. + +For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network +monitoring tools such as ifstat or sar –n DEV [interval] [number of samples] + +2. Enable HW TC offload on interface:: + + # ethtool -K <interface> hw-tc-offload on + +3. Apply TCs to ingress (RX) flow of interface:: + + # tc qdisc add dev <interface> ingress + +NOTES: + - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory. + - ADq is not compatible with cloud filters. + - Setting up channels via ethtool (ethtool -L) is not supported when the TCs + are configured using mqprio. + - You must have iproute2 latest version + - NVM version 6.01 or later is required. + - ADq cannot be enabled when any the following features are enabled: Data + Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters. + - If another driver (for example, DPDK) has set cloud filters, you cannot + enable ADq. + - Tunnel filters are not supported in ADq. If encapsulated packets do arrive + in non-tunnel mode, filtering will be done on the inner headers. For example, + for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN + encapsulated packet, outer headers are ignored. Therefore, inner headers are + matched. + - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic + will be routed to the appropriate queue of the PF, and will not be passed on + the VF. Such traffic will end up getting dropped higher up in the TCP/IP + stack as it does not match PF address data. + - If traffic matches multiple TC filters that point to different TCs, that + traffic will be duplicated and sent to all matching TC queues. The hardware + switch mirrors the packet to a VSI list when multiple filters are matched. + + +Known Issues/Troubleshooting +============================ + +Traffic Is Not Being Passed Between VM and Client +------------------------------------------------- +You may not be able to pass traffic between a client system and a +Virtual Machine (VM) running on a separate host if the Virtual Function +(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled +on the VF. Note that this situation can occur in any combination of client, +host, and guest operating system. For information on how to set the VF to +trusted mode, refer to the section "VLAN Tag Packet Steering" in this +readme document. For information on setting spoof checking, refer to the +section "MAC and VLAN anti-spoofing feature" in this readme document. + +Do not unload port driver if VF with active VM is bound to it +------------------------------------------------------------- +Do not unload a port's driver if a Virtual Function (VF) with an active Virtual +Machine (VM) is bound to it. Doing so will cause the port to appear to hang. +Once the VM shuts down, or otherwise releases the VF, the command will complete. + +Virtual machine does not get link +--------------------------------- +If the virtual machine has more than one virtual port assigned to it, and those +virtual ports are bound to different physical ports, you may not get link on +all of the virtual ports. The following command may work around the issue:: + + ethtool -r <PF> + +Where <PF> is the PF interface in the host, for example: p5p1. You may need to +run the command more than once to get link on all virtual ports. + +MAC address of Virtual Function changes unexpectedly +---------------------------------------------------- +If a Virtual Function's MAC address is not assigned in the host, then the VF +(virtual function) driver will use a random MAC address. This random MAC +address may change each time the VF driver is reloaded. You can assign a static +MAC address in the host machine. This static MAC address will survive +a VF driver reload. + +Driver Buffer Overflow Fix +-------------------------- +The fix to resolve CVE-2016-8105, referenced in Intel SA-00069 +https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html +is included in this and future versions of the driver. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have one system +on two IP networks in the same Ethernet broadcast domain (non-partitioned +switch) behave as expected. All Ethernet interfaces will respond to IP traffic +for any IP address assigned to the system. This results in unbalanced receive +traffic. + +If you have multiple interfaces in a server, either turn on ARP filtering by +entering:: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + +NOTE: This setting is not saved across reboots. The configuration change can be +made permanent by adding the following line to the file /etc/sysctl.conf:: + + net.ipv4.conf.all.arp_filter = 1 + +Another alternative is to install the interfaces in separate broadcast domains +(either in different switches or in a switch partitioned to VLANs). + +Rx Page Allocation Errors +------------------------- +'Page allocation failure. order:0' errors may occur under stress. +This is caused by the way the Linux kernel reports this stressed condition. + + +Support +======= +For general information, go to the Intel support website at: + +https://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on the supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ice.rst b/Documentation/networking/ice.rst new file mode 100644 index 000000000000..1e4948c9e989 --- /dev/null +++ b/Documentation/networking/ice.rst @@ -0,0 +1,45 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series +=================================================================== + +Intel ice Linux driver. +Copyright(c) 2018 Intel Corporation. + +Contents +======== + +- Enabling the driver +- Support + +The driver in this release supports Intel's E800 Series of products. For +more information, visit Intel's support page at https://support.intel.com. + +Enabling the driver +=================== +The driver is enabled via the standard kernel configuration system, +using the make command:: + + make oldconfig/silentoldconfig/menuconfig/etc. + +The driver is located in the menu structure at: + + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet driver support + -> Intel devices + -> Intel(R) Ethernet Connection E800 Series Support + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ice.txt b/Documentation/networking/ice.txt deleted file mode 100644 index 6261c46378e1..000000000000 --- a/Documentation/networking/ice.txt +++ /dev/null @@ -1,39 +0,0 @@ -Intel(R) Ethernet Connection E800 Series Linux Driver -=================================================================== - -Intel ice Linux driver. -Copyright(c) 2018 Intel Corporation. - -Contents -======== -- Enabling the driver -- Support - -The driver in this release supports Intel's E800 Series of products. For -more information, visit Intel's support page at http://support.intel.com. - -Enabling the driver -=================== - -The driver is enabled via the standard kernel configuration system, -using the make command: - - Make oldconfig/silentoldconfig/menuconfig/etc. - -The driver is located in the menu structure at: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet driver support - -> Intel devices - -> Intel(R) Ethernet Connection E800 Series Support - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -If an issue is identified with the released source code, please email -the maintainer listed in the MAINTAINERS file. diff --git a/Documentation/networking/igb.rst b/Documentation/networking/igb.rst new file mode 100644 index 000000000000..ba16b86d5593 --- /dev/null +++ b/Documentation/networking/igb.rst @@ -0,0 +1,193 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for Intel(R) Ethernet Network Connection +=========================================================== + +Intel Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Support + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Command Line Parameters +======================== +If the driver is built as a module, the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax:: + + modprobe igb [<option>=<VAL1>,<VAL2>,...] + +There needs to be a <VAL#> for each network port in the system supported by +this driver. The values will be applied to each instance, in function order. +For example:: + + modprobe igb max_vfs=2,4 + +In this case, there are two network ports supported by igb in the system. + +NOTE: A descriptor describes a data buffer and attributes related to the data +buffer. This information is accessed by the hardware. + +max_vfs +------- +:Valid Range: 0-7 + +This parameter adds support for SR-IOV. It causes the driver to spawn up to +max_vfs worth of virtual functions. If the value is greater than 0 it will +also force the VMDq parameter to be 1 or more. + +The parameters for the driver are referenced by position. Thus, if you have a +dual port adapter, or more than one adapter in your system, and want N virtual +functions per port, you must specify a number for each port with each parameter +separated by a comma. For example:: + + modprobe igb max_vfs=4 + +This will spawn 4 VFs on the first port. + +:: + + modprobe igb max_vfs=2,4 + +This will spawn 2 VFs on the first port and 4 VFs on the second port. + +NOTE: Caution must be used in loading the driver with these parameters. +Depending on your system configuration, number of slots, etc., it is impossible +to predict in all cases where the positions would be on the command line. + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering +and VLAN tag stripping/insertion will remain enabled. Please remove the old +VLAN filter before the new VLAN filter is added. For example:: + + ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete vlan 100 + ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0 + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level debug messages displayed in the system logs. + + +Additional Features and Configurations +====================================== + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file: + +- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x> +- For SLES: /etc/sysconfig/network/<config_file> + +NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides +with the maximum Jumbo Frames size of 9234 bytes. + +NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in +poor performance or loss of link. + + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + + +Enabling Wake on LAN* (WoL) +--------------------------- +WoL is configured through the ethtool* utility. + +WoL will be enabled on the system during the next shut down or reboot. For +this driver version, in order to enable WoL, the igb driver must be loaded +prior to shutting down or suspending the system. + +NOTE: Wake on LAN is only supported on port A of multi-port devices. Also +Wake On LAN is not supported for the following device: +- Intel(R) Gigabit VT Quad Port Server Adapter + + +Multiqueue +---------- +In this mode, a separate MSI-X vector is allocated for each queue and one for +"other" interrupts such as link status change and errors. All interrupts are +throttled via interrupt moderation. Interrupt moderation must be used to avoid +interrupt storms while the driver is processing one interrupt. The moderation +value should be at least as large as the expected time for the driver to +process an interrupt. Multiqueue is off by default. + +REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found, +the system will fallback to MSI or to Legacy interrupts. This driver supports +receive multiqueue on all kernels that support MSI-X. + +NOTE: On some kernels a reboot is required to switch between single queue mode +and multiqueue mode or vice-versa. + + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. + +An interrupt is sent to the PF driver notifying it of the spoof attempt. When a +spoofed packet is detected, the PF driver will send the following message to +the system log (displayed by the "dmesg" command): +Spoof event(s) detected on VF(n), where n = the VF that attempted to do the +spoofing + + +Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool +------------------------------------------------------------ +You can set a MAC address of a Virtual Function (VF), a default VLAN and the +rate limit using the IProute2 tool. Download the latest version of the +IProute2 tool from Sourceforge if your version does not have all the features +you require. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt deleted file mode 100644 index f90643ef39c9..000000000000 --- a/Documentation/networking/igb.txt +++ /dev/null @@ -1,129 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Support - -Identifying Your Adapter -======================== - -This driver supports all 82575, 82576 and 82580-based Intel (R) gigabit network -connections. - -For specific information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Command Line Parameters -======================= - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -max_vfs -------- -Valid Range: 0-7 -Default Value: 0 - -This parameter adds support for SR-IOV. It causes the driver to spawn up to -max_vfs worth of virtual function. - -Additional Configurations -========================= - - Jumbo Frames - ------------ - Jumbo Frames support is enabled by changing the MTU to a value larger than - the default of 1500. Use the ip command to increase the MTU size. - For example: - - ip link set dev eth<x> mtu 9000 - - This setting is not saved across reboots. - - Notes: - - - The maximum MTU setting for Jumbo Frames is 9216. This value coincides - with the maximum Jumbo Frames size of 9234 bytes. - - - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in - poor performance or loss of link. - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - version of ethtool can be found at: - - https://www.kernel.org/pub/software/network/ethtool/ - - Enabling Wake on LAN* (WoL) - --------------------------- - WoL is configured through the ethtool* utility. - - For instructions on enabling WoL with ethtool, refer to the ethtool man page. - - WoL will be enabled on the system during the next shut down or reboot. - For this driver version, in order to enable WoL, the igb driver must be - loaded when shutting down or rebooting the system. - - Wake On LAN is only supported on port A of multi-port adapters. - - Wake On LAN is not supported for the Intel(R) Gigabit VT Quad Port Server - Adapter. - - Multiqueue - ---------- - In this mode, a separate MSI-X vector is allocated for each queue and one - for "other" interrupts such as link status change and errors. All - interrupts are throttled via interrupt moderation. Interrupt moderation - must be used to avoid interrupt storms while the driver is processing one - interrupt. The moderation value should be at least as large as the expected - time for the driver to process an interrupt. Multiqueue is off by default. - - REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not - found, the system will fallback to MSI or to Legacy interrupts. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF(n) - - Where n=the VF that attempted to do the spoofing. - - Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool - ------------------------------------------------------------ - You can set a MAC address of a Virtual Function (VF), a default VLAN and the - rate limit using the IProute2 tool. Download the latest version of the - iproute2 tool from Sourceforge if your version does not have all the - features you require. - - -Support -======= - -For general information, go to the Intel support website at: - - www.intel.com/support/ - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/igbvf.rst b/Documentation/networking/igbvf.rst new file mode 100644 index 000000000000..a8a9ffa4f8d3 --- /dev/null +++ b/Documentation/networking/igbvf.rst @@ -0,0 +1,64 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet +============================================================ + +Intel Gigabit Virtual Function Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== +- Identifying Your Adapter +- Additional Configurations +- Support + +This driver supports Intel 82576-based virtual function devices-based virtual +function devices that can only be activated on kernels that support SR-IOV. + +SR-IOV requires the correct platform and OS support. + +The guest OS loading this driver must support MSI-X interrupts. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + +Driver information can be obtained using ethtool, lspci, and ifconfig. +Instructions on updating ethtool can be found in the section Additional +Configurations later in this document. + +NOTE: There is a limit of a total of 32 shared VLANs to 1 or more VFs. + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +http://www.intel.com/support + + +Additional Features and Configurations +====================================== + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: + +https://www.kernel.org/pub/software/network/ethtool/ + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/igbvf.txt b/Documentation/networking/igbvf.txt deleted file mode 100644 index bd404735fb46..000000000000 --- a/Documentation/networking/igbvf.txt +++ /dev/null @@ -1,80 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Support - -This file describes the igbvf Linux* Base Driver for Intel Network Connection. - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. SR-IOV requires the correct -platform and OS support. - -The igbvf driver requires the igb driver, version 2.0 or later. The igbvf -driver supports virtual functions generated by the igb driver with a max_vfs -value of 1 or greater. For more information on the max_vfs parameter refer -to the README included with the igb driver. - -The guest OS loading the igbvf driver must support MSI-X interrupts. - -This driver is only supported as a loadable module at this time. Intel is -not supplying patches against the kernel source to allow for static linking -of the driver. For questions related to hardware requirements, refer to the -documentation supplied with your Intel Gigabit adapter. All hardware -requirements listed apply to use with Linux. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - -VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. - -Identifying Your Adapter -======================== - -The igbvf driver supports 82576-based virtual function devices that can only -be activated on kernels that support SR-IOV. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -For the latest Intel network drivers for Linux, refer to the following -website. In the search field, enter your adapter name or type, or use the -networking link on the left to search for your adapter: - - http://downloadcenter.intel.com/scripts-df-external/Support_Intel.aspx - -Additional Configurations -========================= - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 3.0 or later is required for this functionality, although we - strongly recommend downloading the latest version at: - - https://www.kernel.org/pub/software/network/ethtool/ - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index fcd710f2cc7a..bd89dae8d578 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -14,6 +14,16 @@ Contents: dpaa2/index e100 e1000 + e1000e + fm10k + igb + igbvf + ixgb + ixgbe + ixgbevf + i40e + iavf + ice kapi z8530book msg_zerocopy diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 960de8fe3f40..163b5ff1073c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1442,6 +1442,14 @@ max_hbh_length - INTEGER header. Default: INT_MAX (unlimited) +skip_notify_on_dev_down - BOOLEAN + Controls whether an RTM_DELROUTE message is generated for routes + removed when a device is taken down or deleted. IPv4 does not + generate this message; IPv6 does by default. Setting this sysctl + to true skips the message, making IPv4 and IPv6 on par in relying + on userspace caches to track link events and evict routes. + Default: false (generate message) + IPv6 Fragmentation: ip6frag_high_thresh - INTEGER diff --git a/Documentation/networking/ixgb.rst b/Documentation/networking/ixgb.rst new file mode 100644 index 000000000000..8bd80e27843d --- /dev/null +++ b/Documentation/networking/ixgb.rst @@ -0,0 +1,467 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection +===================================================================== + +October 1, 2018 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Command Line Parameters +- Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting +- Support + + + +In This Release +=============== + +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and iproute2 to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. + + +Identifying Your Adapter +======================== + +The following Intel network adapters are compatible with the drivers in this +release: + ++------------+------------------------------+----------------------------------+ +| Controller | Adapter Name | Physical Layer | ++============+==============================+==================================+ +| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) | +| | Server Adapters | - 10G Base-SR (fiber) | +| | | - 10G Base-CX4 (copper) | ++------------+------------------------------+----------------------------------+ + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + https://support.intel.com + + +Command Line Parameters +======================= + +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax:: + + modprobe ixgb [<option>=<VAL1>,<VAL2>,...] + +For example, with two 10GbE PCI adapters, entering:: + + modprobe ixgb TxDescriptors=80,128 + +loads the ixgb driver with 80 TX resources for the first adapter and 128 TX +resources for the second adapter. + +The default value for each parameter is generally the recommended setting, +unless otherwise noted. + +Copybreak +--------- +:Valid Range: 0-XXXX +:Default Value: 256 + + This is the maximum size of packet that is copied to a new buffer on + receive. + +Debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + + This parameter adjusts the level of debug messages displayed in the + system logs. + +FlowControl +----------- +:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) +:Default Value: 1 if no EEPROM, otherwise read from EEPROM + + This parameter controls the automatic generation(Tx) and response(Rx) to + Ethernet PAUSE frames. There are hardware bugs associated with enabling + Tx flow control so beware. + +RxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 1024 + + This value is the number of receive descriptors allocated by the driver. + Increasing this value allows the driver to buffer more incoming packets. + Each descriptor is 16 bytes. A receive buffer is also allocated for + each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, + depending on the MTU setting. When the MTU size is 1500 or less, the + receive buffer size is 2048 bytes. When the MTU is greater than 1500 the + receive buffer size will be either 4056, 8192, or 16384 bytes. The + maximum MTU size is 16114. + +TxDescriptors +------------- +:Valid Range: 64-4096 +:Default Value: 256 + + This value is the number of transmit descriptors allocated by the driver. + Increasing this value allows the driver to queue more transmits. Each + descriptor is 16 bytes. + +RxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 72 + + This value delays the generation of receive interrupts in units of + 0.8192 microseconds. Receive interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame reception and can end up + decreasing the throughput of TCP traffic. If the system is reporting + dropped receives, this value may be set too high, causing the driver to + run out of available receive descriptors. + +TxIntDelay +---------- +:Valid Range: 0-65535 (0=off) +:Default Value: 32 + + This value delays the generation of transmit interrupts in units of + 0.8192 microseconds. Transmit interrupt reduction can improve CPU + efficiency if properly tuned for specific network traffic. Increasing + this value adds extra latency to frame transmission and can end up + decreasing the throughput of TCP traffic. If this value is set too high, + it will cause the driver to run out of available transmit descriptors. + +XsumRX +------ +:Valid Range: 0-1 +:Default Value: 1 + + A value of '1' indicates that the driver should enable IP checksum + offload for received packets (both UDP and TCP) to the adapter hardware. + +RxFCHighThresh +-------------- +:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity) +:Default Value: 196,608 (0x30000) + + Receive Flow control high threshold (when we send a pause frame) + +RxFCLowThresh +------------- +:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity) +:Default Value: 163,840 (0x28000) + + Receive Flow control low threshold (when we send a resume frame) + +FCReqTimeout +------------ +:Valid Range: 1-65535 +:Default Value: 65535 + + Flow control request timeout (how long to pause the link partner's tx) + +IntDelayEnable +-------------- +:Value Range: 0,1 +:Default Value: 1 + + Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it. + + +Improving Performance +===================== + +With the 10 Gigabit server adapters, the default Linux configuration will +very likely limit the total available throughput artificially. There is a set +of configuration changes that, when applied together, will increase the ability +of Linux to transmit and receive data. The following enhancements were +originally acquired from settings published at http://www.spec.org/web99/ for +various submitted results using Linux. + +NOTE: + These changes are only suggestions, and serve as a starting point for + tuning your network performance. + +The changes are made in three major ways, listed in order of greatest effect: + +- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen + parameter. +- Use sysctl to modify /proc parameters (essentially kernel tuning) +- Use setpci to modify the MMRBC field in PCI-X configuration space to increase + transmit burst lengths on the bus. + +NOTE: + setpci modifies the adapter's configuration registers to allow it to read + up to 4k bytes at a time (for transmits). However, for some systems the + behavior after modifying this register may be undefined (possibly errors of + some kind). A power-cycle, hard reset or explicitly setting the e6 register + back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a + stable configuration. + +- COPY these lines and paste them into ixgb_perf.sh: + +:: + + #!/bin/bash + echo "configuring network performance , edit this file to change the interface + or device ID of 10GbE card" + # set mmrbc to 4k reads, modify only Intel 10GbE device IDs + # replace 1a48 with appropriate 10GbE device's ID installed on the system, + # if needed. + setpci -d 8086:1a48 e6.b=2e + # set the MTU (max transmission unit) - it requires your switch and clients + # to change as well. + # set the txqueuelen + # your ixgb adapter should be loaded as eth1 for this to work, change if needed + ip li set dev eth1 mtu 9000 txqueuelen 1000 up + # call the sysctl utility to modify /proc/sys entries + sysctl -p ./sysctl_ixgb.conf + +- COPY these lines and paste them into sysctl_ixgb.conf: + +:: + + # some of the defaults may be different for your kernel + # call this file with sysctl -p <this file> + # these are just suggested values that worked well to increase throughput in + # several network benchmark tests, your mileage may vary + + ### IPV4 specific settings + # turn TCP timestamp support off, default 1, reduces CPU use + net.ipv4.tcp_timestamps = 0 + # turn SACK support off, default on + # on systems with a VERY fast bus -> memory interface this is the big gainer + net.ipv4.tcp_sack = 0 + # set min/default/max TCP read buffer, default 4096 87380 174760 + net.ipv4.tcp_rmem = 10000000 10000000 10000000 + # set min/pressure/max TCP write buffer, default 4096 16384 131072 + net.ipv4.tcp_wmem = 10000000 10000000 10000000 + # set min/pressure/max TCP buffer space, default 31744 32256 32768 + net.ipv4.tcp_mem = 10000000 10000000 10000000 + + ### CORE settings (mostly for socket and UDP effect) + # set maximum receive socket buffer size, default 131071 + net.core.rmem_max = 524287 + # set maximum send socket buffer size, default 131071 + net.core.wmem_max = 524287 + # set default receive socket buffer size, default 65535 + net.core.rmem_default = 524287 + # set default send socket buffer size, default 65535 + net.core.wmem_default = 524287 + # set maximum amount of option memory buffers, default 10240 + net.core.optmem_max = 524287 + # set number of unprocessed input packets before kernel starts dropping them; default 300 + net.core.netdev_max_backlog = 300000 + +Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface +your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's +ID installed on the system. + +NOTE: + Unless these scripts are added to the boot process, these changes will + only last only until the next system reboot. + + +Resolving Slow UDP Traffic +-------------------------- +If your server does not seem to be able to receive UDP traffic as fast as it +can receive TCP traffic, it could be because Linux, by default, does not set +the network stack buffers as large as they need to be to support high UDP +transfer rates. One way to alleviate this problem is to allow more memory to +be used by the IP stack to store incoming data. + +For instance, use the commands:: + + sysctl -w net.core.rmem_max=262143 + +and:: + + sysctl -w net.core.rmem_default=262143 + +to increase the read buffer memory max and default to 262143 (256k - 1) from +defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables +will increase the amount of memory used by the network stack for receives, and +can be increased significantly more if necessary for your application. + + +Additional Configurations +========================= + +Configuring the Driver on Different Distributions +------------------------------------------------- +Configuring a network driver to load properly when the system is started is +distribution dependent. Typically, the configuration process involves adding +an alias line to /etc/modprobe.conf as well as editing other system startup +scripts and/or configuration files. Many popular Linux distributions ship +with tools to make these changes for you. To learn the proper way to +configure a network device for your system, refer to your distribution +documentation. If during this process you are asked for the driver or module +name, the name for the Linux Base Driver for the Intel 10GbE Family of +Adapters is ixgb. + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + dmesg -n 8 + +NOTE: This setting is not saved across reboots. + +Jumbo Frames +------------ +The driver supports Jumbo Frames for all adapters. Jumbo Frames support is +enabled by changing the MTU to a value larger than the default of 1500. +The maximum value for the MTU is 16114. Use the ip command to +increase the MTU size. For example:: + + ip li set dev ethx mtu 9000 + +The maximum MTU setting for Jumbo Frames is 16114. This value coincides +with the maximum Jumbo Frames size of 16128. + +Ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The ethtool +version 1.6 or later is required for this functionality. + +The latest release of ethtool can be found from +https://www.kernel.org/pub/software/network/ethtool/ + +NOTE: + The ethtool version 1.6 only supports a limited set of ethtool options. + Support for a more complete ethtool feature set can be enabled by + upgrading to the latest version. + +NAPI +---- +NAPI (Rx polling mode) is supported in the ixgb driver. + +See https://wiki.linuxfoundation.org/networking/napi for more information on +NAPI. + + +Known Issues/Troubleshooting +============================ + +NOTE: + After installing the driver, if your Intel Network Connection is not + working, verify in the "In This Release" section of the readme that you have + installed the correct driver. + +Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis +---------------------------------------------------------------------------- +Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 +Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits +chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. +The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 +Server adapter or the SmartBits. If this situation occurs using a different +cable assembly may resolve the issue. + +Cable Interoperability Issues with HP Procurve 3400cl Switch Port +----------------------------------------------------------------- +Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server +adapter is connected to an HP Procurve 3400cl switch port using short cables +(1 m or shorter). If this situation occurs, using a longer cable may resolve +the issue. + +Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that +Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC +errors may be received either by the CX4 Server adapter or at the switch. If +this situation occurs, using a different cable assembly may resolve the issue. + +Jumbo Frames System Requirement +------------------------------- +Memory allocation failures have been observed on Linux systems with 64 MB +of RAM or less that are running Jumbo Frames. If you are using Jumbo +Frames, your system may require more than the advertised minimum +requirement of 64 MB of system memory. + +Performance Degradation with Jumbo Frames +----------------------------------------- +Degradation in throughput performance may be observed in some Jumbo frames +environments. If this is observed, increasing the application's socket buffer +size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. +See the specific application manual and /usr/src/linux*/Documentation/ +networking/ip-sysctl.txt for more details. + +Allocating Rx Buffers when Using Jumbo Frames +--------------------------------------------- +Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if +the available memory is heavily fragmented. This issue may be seen with PCI-X +adapters or with packet split disabled. This can be reduced or eliminated +by changing the amount of available memory for receive buffer allocation, by +increasing /proc/sys/vm/min_free_kbytes. + +Multiple Interfaces on Same Ethernet Broadcast Network +------------------------------------------------------ +Due to the default ARP behavior on Linux, it is not possible to have +one system on two IP networks in the same Ethernet broadcast domain +(non-partitioned switch) behave as expected. All Ethernet interfaces +will respond to IP traffic for any IP address assigned to the system. +This results in unbalanced receive traffic. + +If you have multiple interfaces in a server, do either of the following: + + - Turn on ARP filtering by entering:: + + echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter + + - Install the interfaces in separate broadcast domains - either in + different switches or in a switch partitioned to VLANs. + +UDP Stress Test Dropped Packet Issue +-------------------------------------- +Under small packets UDP stress test with 10GbE driver, the Linux system +may drop UDP packets due to the fullness of socket buffers. You may want +to change the driver's Flow Control variables to the minimum value for +controlling packet reception. + +Tx Hangs Possible Under Stress +------------------------------ +Under stress conditions, if TX hangs occur, turning off TSO +"ethtool -K eth0 tso off" may resolve the problem. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt deleted file mode 100644 index 09f71d71920a..000000000000 --- a/Documentation/networking/ixgb.txt +++ /dev/null @@ -1,433 +0,0 @@ -Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection -===================================================================== - -March 14, 2011 - - -Contents -======== - -- In This Release -- Identifying Your Adapter -- Building and Installation -- Command Line Parameters -- Improving Performance -- Additional Configurations -- Known Issues/Troubleshooting -- Support - - - -In This Release -=============== - -This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) -Network Connection. This driver includes support for Itanium(R)2-based -systems. - -For questions related to hardware requirements, refer to the documentation -supplied with your 10 Gigabit adapter. All hardware requirements listed apply -to use with Linux. - -The following features are available in this kernel: - - Native VLANs - - Channel Bonding (teaming) - - SNMP - -Channel Bonding documentation can be found in the Linux kernel source: -/Documentation/networking/bonding.txt - -The driver information previously displayed in the /proc filesystem is not -supported in this release. Alternatively, you can use ethtool (version 1.6 -or later), lspci, and iproute2 to obtain the same information. - -Instructions on updating ethtool can be found in the section "Additional -Configurations" later in this document. - - -Identifying Your Adapter -======================== - -The following Intel network adapters are compatible with the drivers in this -release: - -Controller Adapter Name Physical Layer ----------- ------------ -------------- -82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) - Server Adapters 10G Base-SR (850 nm optical fiber) - 10G Base-CX4(twin-axial copper cabling) - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - - -Building and Installation -========================= - -select m for "Intel(R) PRO/10GbE support" located at: - Location: - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) -1. make modules && make modules_install - -2. Load the module: - -    modprobe ixgb <parameter>=<value> - - The insmod command can be used if the full - path to the driver module is specified. For example: - - insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko - - With 2.6 based kernels also make sure that older ixgb drivers are - removed from the kernel, before loading the new module: - - rmmod ixgb; modprobe ixgb - -3. Assign an IP address to the interface by entering the following, where - x is the interface number: - - ip addr add ethx <IP_address> - -4. Verify that the interface works. Enter the following, where <IP_address> - is the IP address for another machine on the same subnet as the interface - that is being tested: - - ping <IP_address> - - -Command Line Parameters -======================= - -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe command using -this syntax: - - modprobe ixgb [<option>=<VAL1>,<VAL2>,...] - -For example, with two 10GbE PCI adapters, entering: - - modprobe ixgb TxDescriptors=80,128 - -loads the ixgb driver with 80 TX resources for the first adapter and 128 TX -resources for the second adapter. - -The default value for each parameter is generally the recommended setting, -unless otherwise noted. - -FlowControl -Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx) -Default: Read from the EEPROM - If EEPROM is not detected, default is 1 - This parameter controls the automatic generation(Tx) and response(Rx) to - Ethernet PAUSE frames. There are hardware bugs associated with enabling - Tx flow control so beware. - -RxDescriptors -Valid Range: 64-512 -Default Value: 512 - This value is the number of receive descriptors allocated by the driver. - Increasing this value allows the driver to buffer more incoming packets. - Each descriptor is 16 bytes. A receive buffer is also allocated for - each descriptor and can be either 2048, 4056, 8192, or 16384 bytes, - depending on the MTU setting. When the MTU size is 1500 or less, the - receive buffer size is 2048 bytes. When the MTU is greater than 1500 the - receive buffer size will be either 4056, 8192, or 16384 bytes. The - maximum MTU size is 16114. - -RxIntDelay -Valid Range: 0-65535 (0=off) -Default Value: 72 - This value delays the generation of receive interrupts in units of - 0.8192 microseconds. Receive interrupt reduction can improve CPU - efficiency if properly tuned for specific network traffic. Increasing - this value adds extra latency to frame reception and can end up - decreasing the throughput of TCP traffic. If the system is reporting - dropped receives, this value may be set too high, causing the driver to - run out of available receive descriptors. - -TxDescriptors -Valid Range: 64-4096 -Default Value: 256 - This value is the number of transmit descriptors allocated by the driver. - Increasing this value allows the driver to queue more transmits. Each - descriptor is 16 bytes. - -XsumRX -Valid Range: 0-1 -Default Value: 1 - A value of '1' indicates that the driver should enable IP checksum - offload for received packets (both UDP and TCP) to the adapter hardware. - - -Improving Performance -===================== - -With the 10 Gigabit server adapters, the default Linux configuration will -very likely limit the total available throughput artificially. There is a set -of configuration changes that, when applied together, will increase the ability -of Linux to transmit and receive data. The following enhancements were -originally acquired from settings published at http://www.spec.org/web99/ for -various submitted results using Linux. - -NOTE: These changes are only suggestions, and serve as a starting point for - tuning your network performance. - -The changes are made in three major ways, listed in order of greatest effect: -- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen - parameter. -- Use sysctl to modify /proc parameters (essentially kernel tuning) -- Use setpci to modify the MMRBC field in PCI-X configuration space to increase - transmit burst lengths on the bus. - -NOTE: setpci modifies the adapter's configuration registers to allow it to read -up to 4k bytes at a time (for transmits). However, for some systems the -behavior after modifying this register may be undefined (possibly errors of -some kind). A power-cycle, hard reset or explicitly setting the e6 register -back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a -stable configuration. - -- COPY these lines and paste them into ixgb_perf.sh: -#!/bin/bash -echo "configuring network performance , edit this file to change the interface -or device ID of 10GbE card" -# set mmrbc to 4k reads, modify only Intel 10GbE device IDs -# replace 1a48 with appropriate 10GbE device's ID installed on the system, -# if needed. -setpci -d 8086:1a48 e6.b=2e -# set the MTU (max transmission unit) - it requires your switch and clients -# to change as well. -# set the txqueuelen -# your ixgb adapter should be loaded as eth1 for this to work, change if needed -ip li set dev eth1 mtu 9000 txqueuelen 1000 up -# call the sysctl utility to modify /proc/sys entries -sysctl -p ./sysctl_ixgb.conf -- END ixgb_perf.sh - -- COPY these lines and paste them into sysctl_ixgb.conf: -# some of the defaults may be different for your kernel -# call this file with sysctl -p <this file> -# these are just suggested values that worked well to increase throughput in -# several network benchmark tests, your mileage may vary - -### IPV4 specific settings -# turn TCP timestamp support off, default 1, reduces CPU use -net.ipv4.tcp_timestamps = 0 -# turn SACK support off, default on -# on systems with a VERY fast bus -> memory interface this is the big gainer -net.ipv4.tcp_sack = 0 -# set min/default/max TCP read buffer, default 4096 87380 174760 -net.ipv4.tcp_rmem = 10000000 10000000 10000000 -# set min/pressure/max TCP write buffer, default 4096 16384 131072 -net.ipv4.tcp_wmem = 10000000 10000000 10000000 -# set min/pressure/max TCP buffer space, default 31744 32256 32768 -net.ipv4.tcp_mem = 10000000 10000000 10000000 - -### CORE settings (mostly for socket and UDP effect) -# set maximum receive socket buffer size, default 131071 -net.core.rmem_max = 524287 -# set maximum send socket buffer size, default 131071 -net.core.wmem_max = 524287 -# set default receive socket buffer size, default 65535 -net.core.rmem_default = 524287 -# set default send socket buffer size, default 65535 -net.core.wmem_default = 524287 -# set maximum amount of option memory buffers, default 10240 -net.core.optmem_max = 524287 -# set number of unprocessed input packets before kernel starts dropping them; default 300 -net.core.netdev_max_backlog = 300000 -- END sysctl_ixgb.conf - -Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface -your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's -ID installed on the system. - -NOTE: Unless these scripts are added to the boot process, these changes will - only last only until the next system reboot. - - -Resolving Slow UDP Traffic --------------------------- -If your server does not seem to be able to receive UDP traffic as fast as it -can receive TCP traffic, it could be because Linux, by default, does not set -the network stack buffers as large as they need to be to support high UDP -transfer rates. One way to alleviate this problem is to allow more memory to -be used by the IP stack to store incoming data. - -For instance, use the commands: - sysctl -w net.core.rmem_max=262143 -and - sysctl -w net.core.rmem_default=262143 -to increase the read buffer memory max and default to 262143 (256k - 1) from -defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables -will increase the amount of memory used by the network stack for receives, and -can be increased significantly more if necessary for your application. - - -Additional Configurations -========================= - - Configuring the Driver on Different Distributions - ------------------------------------------------- - Configuring a network driver to load properly when the system is started is - distribution dependent. Typically, the configuration process involves adding - an alias line to /etc/modprobe.conf as well as editing other system startup - scripts and/or configuration files. Many popular Linux distributions ship - with tools to make these changes for you. To learn the proper way to - configure a network device for your system, refer to your distribution - documentation. If during this process you are asked for the driver or module - name, the name for the Linux Base Driver for the Intel 10GbE Family of - Adapters is ixgb. - - Viewing Link Messages - --------------------- - Link messages will not be displayed to the console if the distribution is - restricting system messages. In order to see network driver link messages on - your console, set dmesg to eight by entering the following: - - dmesg -n 8 - - NOTE: This setting is not saved across reboots. - - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16114. Use the ip command to - increase the MTU size. For example: - - ip li set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 16114. This value coincides - with the maximum Jumbo Frames size of 16128. - - - ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The ethtool - version 1.6 or later is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - NOTE: The ethtool version 1.6 only supports a limited set of ethtool options. - Support for a more complete ethtool feature set can be enabled by - upgrading to the latest version. - - - NAPI - ---- - - NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled - or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI - - See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. - - -Known Issues/Troubleshooting -============================ - - NOTE: After installing the driver, if your Intel Network Connection is not - working, verify in the "In This Release" section of the readme that you have - installed the correct driver. - - Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with - Fujitsu XENPAK Module in SmartBits Chassis - --------------------------------------------------------------------- - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 - Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits - chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni. - The CRC errors may be received either by the Intel(R) PRO/10GbE CX4 - Server adapter or the SmartBits. If this situation occurs using a different - cable assembly may resolve the issue. - - CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl - Switch Port - ------------------------------------------------------------------------ - Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server - adapter is connected to an HP Procurve 3400cl switch port using short cables - (1 m or shorter). If this situation occurs, using a longer cable may resolve - the issue. - - Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that - Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC - errors may be received either by the CX4 Server adapter or at the switch. If - this situation occurs, using a different cable assembly may resolve the issue. - - - Jumbo Frames System Requirement - ------------------------------- - Memory allocation failures have been observed on Linux systems with 64 MB - of RAM or less that are running Jumbo Frames. If you are using Jumbo - Frames, your system may require more than the advertised minimum - requirement of 64 MB of system memory. - - - Performance Degradation with Jumbo Frames - ----------------------------------------- - Degradation in throughput performance may be observed in some Jumbo frames - environments. If this is observed, increasing the application's socket buffer - size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. - See the specific application manual and /usr/src/linux*/Documentation/ - networking/ip-sysctl.txt for more details. - - - Allocating Rx Buffers when Using Jumbo Frames - --------------------------------------------- - Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if - the available memory is heavily fragmented. This issue may be seen with PCI-X - adapters or with packet split disabled. This can be reduced or eliminated - by changing the amount of available memory for receive buffer allocation, by - increasing /proc/sys/vm/min_free_kbytes. - - - Multiple Interfaces on Same Ethernet Broadcast Network - ------------------------------------------------------ - Due to the default ARP behavior on Linux, it is not possible to have - one system on two IP networks in the same Ethernet broadcast domain - (non-partitioned switch) behave as expected. All Ethernet interfaces - will respond to IP traffic for any IP address assigned to the system. - This results in unbalanced receive traffic. - - If you have multiple interfaces in a server, do either of the following: - - - Turn on ARP filtering by entering: - echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter - - - Install the interfaces in separate broadcast domains - either in - different switches or in a switch partitioned to VLANs. - - - UDP Stress Test Dropped Packet Issue - -------------------------------------- - Under small packets UDP stress test with 10GbE driver, the Linux system - may drop UDP packets due to the fullness of socket buffers. You may want - to change the driver's Flow Control variables to the minimum value for - controlling packet reception. - - - Tx Hangs Possible Under Stress - ------------------------------ - Under stress conditions, if TX hangs occur, turning off TSO - "ethtool -K eth0 tso off" may resolve the problem. - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbe.rst b/Documentation/networking/ixgbe.rst new file mode 100644 index 000000000000..725fc697fd8f --- /dev/null +++ b/Documentation/networking/ixgbe.rst @@ -0,0 +1,527 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters +============================================================================= + +Intel 10 Gigabit Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Command Line Parameters +- Additional Configurations +- Known Issues +- Support + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller 82598 + * Intel(R) Ethernet Controller 82599 + * Intel(R) Ethernet Controller X520 + * Intel(R) Ethernet Controller X540 + * Intel(R) Ethernet Controller x550 + * Intel(R) Ethernet Controller X552 + * Intel(R) Ethernet Controller X553 + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + +SFP+ Devices with Pluggable Optics +---------------------------------- + +82599-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an +Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics +and/or the direct attach cables listed below. +- When 82599-based SFP+ devices are connected back to back, they should be set +to the same Speed setting via ethtool. Results may vary if you mix speed +settings. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| SR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | FTLX8571D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| LR Modules | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | FTLX1471D3BCV-IT | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDZ-IN2 | ++---------------+---------------------------------------+------------------+ +| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDDZ-IN1 | ++---------------+---------------------------------------+------------------+ + +The following is a list of 3rd party SFP+ modules that have received some +testing. Not all modules are applicable to all devices. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL | ++---------------+---------------------------------------+------------------+ +| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ | ++---------------+---------------------------------------+------------------+ +| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ SR (No Bail) | FTLX8571D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ SR (No Bail) | AFBR-703SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | DUAL RATE 1G/10G SFP+ LR (No Bail) | FTLX1471D3QCV-IT | ++---------------+---------------------------------------+------------------+ +| Avago | DUAL RATE 1G/10G SFP+ LR (No Bail) | AFCT-701SDZ-IN1 | ++---------------+---------------------------------------+------------------+ +| Finisar | 1000BASE-T SFP | FCLF8522P2BTL | ++---------------+---------------------------------------+------------------+ +| Avago | 1000BASE-T | ABCU-5710RZ | ++---------------+---------------------------------------+------------------+ +| HP | 1000BASE-SX SFP | 453153-001 | ++---------------+---------------------------------------+------------------+ + +82599-based adapters support all passive and active limiting direct attach +cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. + +Laser turns off for SFP+ when ifconfig ethX down +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters. +"ifconfig ethX up" turns on the laser. +Alternatively, you can use "ip link set [down/up] dev ethX" to turn the +laser off and on. + + +82599-based QSFP+ Adapters +~~~~~~~~~~~~~~~~~~~~~~~~~~ +NOTES: +- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only +supports Intel optics. +- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps +connections are not supported. QSFP+ link partners must be configured for +4x10 Gbps. +- 82599-based QSFP+ adapters do not support automatic link speed detection. +The link speed must be configured to either 10 Gbps or 1 Gbps to match the link +partners speed capabilities. Incorrect speed configurations will result in +failure to link. +- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics +and direct attach cables listed below. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Intel | DUAL RATE 1G/10G QSFP+ SRL (bailed) | E10GQSFPSR | ++---------------+---------------------------------------+------------------+ + +82599-based QSFP+ adapters support all passive and active limiting QSFP+ +direct attach cables that comply with SFF-8436 v4.1 specifications. + +82598-BASED ADAPTERS +~~~~~~~~~~~~~~~~~~~~ +NOTES: +- Intel(r) Ethernet Network Adapters that support removable optical modules +only support their original module type (for example, the Intel(R) 10 Gigabit +SR Dual Port Express Module only supports SR optical modules). If you plug in +a different type of module, the driver will not load. +- Hot Swapping/hot plugging optical modules is not supported. +- Only single speed, 10 gigabit modules are supported. +- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module +types are not supported. Please see your system documentation for details. + +The following is a list of SFP+ modules and direct attach cables that have +received some testing. Not all modules are applicable to all devices. + ++---------------+---------------------------------------+------------------+ +| Supplier | Type | Part Numbers | ++===============+=======================================+==================+ +| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL | ++---------------+---------------------------------------+------------------+ +| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ | ++---------------+---------------------------------------+------------------+ +| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL | ++---------------+---------------------------------------+------------------+ + +82598-based adapters support all passive direct attach cables that comply with +SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables +are not supported. + +Third party optic modules and cables referred to above are listed only for the +purpose of highlighting third party specifications and potential +compatibility, and are not recommendations or endorsements or sponsorship of +any third party's product by Intel. Intel is not endorsing or promoting +products made by any third party and the third party reference is provided +only to share information regarding certain optic modules and cables with the +above specifications. There may be other manufacturers or suppliers, producing +or supplying optic modules and cables with similar or matching descriptions. +Customers must use their own discretion and diligence to purchase optic +modules and cables from any third party of their choice. Customers are solely +responsible for assessing the suitability of the product and/or devices and +for the selection of the vendor for purchasing any product. THE OPTIC MODULES +AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL +ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED +WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR +SELECTION OF VENDOR BY CUSTOMERS. + +Command Line Parameters +======================= + +max_vfs +------- +:Valid Range: 1-63 + +This parameter adds support for SR-IOV. It causes the driver to spawn up to +max_vfs worth of virtual functions. +If the value is greater than 0 it will also force the VMDq parameter to be 1 or +more. + +NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x +and above, use sysfs to enable VFs. Also, for Red Hat distributions, this +parameter is only used on version 6.6 and older. For version 6.7 and newer, use +sysfs. For example:: + + #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs // enable VFs + #echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs + +The parameters for the driver are referenced by position. Thus, if you have a +dual port adapter, or more than one adapter in your system, and want N virtual +functions per port, you must specify a number for each port with each parameter +separated by a comma. For example:: + + modprobe ixgbe max_vfs=4 + +This will spawn 4 VFs on the first port. + +:: + + modprobe ixgbe max_vfs=2,4 + +This will spawn 2 VFs on the first port and 4 VFs on the second port. + +NOTE: Caution must be used in loading the driver with these parameters. +Depending on your system configuration, number of slots, etc., it is impossible +to predict in all cases where the positions would be on the command line. + +NOTE: Neither the device nor the driver control how VFs are mapped into config +space. Bus layout will vary by operating system. On operating systems that +support it, you can check sysfs to find the mapping. + +NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering +and VLAN tag stripping/insertion will remain enabled. Please remove the old +VLAN filter before the new VLAN filter is added. For example, + +:: + + ip link set eth0 vf 0 vlan 100 // set VLAN 100 for VF 0 + ip link set eth0 vf 0 vlan 0 // Delete VLAN 100 + ip link set eth0 vf 0 vlan 200 // set a new VLAN 200 for VF 0 + +With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB +features, subject to the constraints described below. Prior to kernel 3.6, the +driver did not support the simultaneous operation of max_vfs greater than 0 and +the DCB features (multiple traffic classes utilizing Priority Flow Control and +Extended Transmission Selection). + +When DCB is enabled, network traffic is transmitted and received through +multiple traffic classes (packet buffers in the NIC). The traffic is associated +with a specific class based on priority, which has a value of 0 through 7 used +in the VLAN tag. When SR-IOV is not enabled, each traffic class is associated +with a set of receive/transmit descriptor queue pairs. The number of queue +pairs for a given traffic class depends on the hardware configuration. When +SR-IOV is enabled, the descriptor queue pairs are grouped into pools. The +Physical Function (PF) and each Virtual Function (VF) is allocated a pool of +receive/transmit descriptor queue pairs. When multiple traffic classes are +configured (for example, DCB is enabled), each pool contains a queue pair from +each traffic class. When a single traffic class is configured in the hardware, +the pools contain multiple queue pairs from the single traffic class. + +The number of VFs that can be allocated depends on the number of traffic +classes that can be enabled. The configurable number of traffic classes for +each enabled VF is as follows: +0 - 15 VFs = Up to 8 traffic classes, depending on device support +16 - 31 VFs = Up to 4 traffic classes +32 - 63 VFs = 1 traffic class + +When VFs are configured, the PF is allocated one pool as well. The PF supports +the DCB features with the constraint that each traffic class will only use a +single queue pair. When zero VFs are configured, the PF can support multiple +queue pairs per traffic class. + +allow_unsupported_sfp +--------------------- +:Valid Range: 0,1 +:Default Value: 0 (disabled) + +This parameter allows unsupported and untested SFP+ modules on 82599-based +adapters, as long as the type of module is known to the driver. + +debug +----- +:Valid Range: 0-16 (0=none,...,16=all) +:Default Value: 0 + +This parameter adjusts the level of debug messages displayed in the system +logs. + + +Additional Features and Configurations +====================================== + +Flow Control +------------ +Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable +receiving and transmitting pause frames for ixgbe. When transmit is enabled, +pause frames are generated when the receive packet buffer crosses a predefined +threshold. When receive is enabled, the transmit unit will halt for the time +delay specified when a pause frame is received. + +NOTE: You must have a flow control capable link partner. + +Flow Control is enabled by default. + +Use ethtool to change the flow control settings. To enable or disable Rx or +Tx Flow Control:: + + ethtool -A eth? rx <on|off> tx <on|off> + +Note: This command only enables or disables Flow Control if auto-negotiation is +disabled. If auto-negotiation is enabled, this command changes the parameters +used for auto-negotiation with the link partner. + +To enable or disable auto-negotiation:: + + ethtool -s eth? autoneg <on|off> + +Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending +on your device, you may not be able to change the auto-negotiation setting. + +NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default +behavior is changed to off. Flow control in 1 gigabit mode on these devices can +lead to transmit hangs. + +Intel(R) Ethernet Flow Director +------------------------------- +The Intel Ethernet Flow Director performs the following tasks: + +- Directs receive packets according to their flows to different queues. +- Enables tight control on routing a flow in the platform. +- Matches flows and CPU cores for flow affinity. +- Supports multiple parameters for flexible flow classification and load + balancing (in SFP mode only). + +NOTE: Intel Ethernet Flow Director masking works in the opposite manner from +subnet masking. In the following command:: + + #ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \ + 172.21.1.1 m 255.128.0.0 action 31 + +The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0 +as might be expected. Similarly, the dst-ip value written to the filter will be +0.21.1.1, not 172.0.0.0. + +To enable or disable the Intel Ethernet Flow Director:: + + # ethtool -K ethX ntuple <on|off> + +When disabling ntuple filters, all the user programmed filters are flushed from +the driver cache and hardware. All needed filters must be re-added when ntuple +is re-enabled. + +To add a filter that directs packet to queue 2, use -U or -N switch:: + + # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] + +To see the list of filters currently present:: + + # ethtool <-u|-n> ethX + +Sideband Perfect Filters +------------------------ +Sideband Perfect Filters are used to direct traffic that matches specified +characteristics. They are enabled through ethtool's ntuple interface. To add a +new filter use the following command:: + + ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \ + dst-port <port> action <queue> + +Where: + <device> - the ethernet device to program + <type> - can be ip4, tcp4, udp4, or sctp4 + <ip> - the IP address to match on + <port> - the port number to match on + <queue> - the queue to direct traffic towards (-1 discards the matched traffic) + +Use the following command to delete a filter:: + + ethtool -U <device> delete <N> + +Where <N> is the filter id displayed when printing all the active filters, and +may also have been specified using "loc <N>" when adding the filter. + +The following example matches TCP traffic sent from 192.168.0.1, port 5300, +directed to 192.168.0.5, port 80, and sends it to queue 7:: + + ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \ + src-port 5300 dst-port 80 action 7 + +For each flow-type, the programmed filters must all have the same matching +input set. For example, issuing the following two commands is acceptable:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 + +Issuing the next two commands, however, is not acceptable, since the first +specifies src-ip and the second specifies dst-ip:: + + ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 + +The second command will fail with an error. You may program multiple filters +with the same fields, using different values, but, on one device, you may not +program two TCP4 filters with different matching fields. + +Matching on a sub-portion of a field is not supported by the ixgbe driver, thus +partial mask fields are not supported. + +To create filters that direct traffic to a specific Virtual Function, use the +"user-def" parameter. Specify the user-def as a 64 bit value, where the lower 32 +bits represents the queue number, while the next 8 bits represent which VF. +Note that 0 is the PF, so the VF identifier is offset by 1. For example:: + + ... user-def 0x800000002 ... + +specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of +that VF. + +Note that these filters will not break internal routing rules, and will not +route traffic that otherwise would not have been sent to the specified Virtual +Function. + +Jumbo Frames +------------ +Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) +to a value larger than the default value of 1500. + +Use the ifconfig command to increase the MTU size. For example, enter the +following where <x> is the interface number:: + + ifconfig eth<x> mtu 9000 up + +Alternatively, you can use the ip command as follows:: + + ip link set mtu 9000 dev eth<x> + ip link set up dev eth<x> + +This setting is not saved across reboots. The setting change can be made +permanent by adding 'MTU=9000' to the file:: + + /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL + /etc/sysconfig/network/<config_file> // for SLES + +NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides +with the maximum Jumbo Frames size of 9728 bytes. + +NOTE: This driver will attempt to use multiple page sized buffers to receive +each jumbo packet. This should help to avoid buffer starvation issues when +allocating receive packets. + +NOTE: For 82599-based network connections, if you are enabling jumbo frames in +a virtual function (VF), jumbo frames must first be enabled in the physical +function (PF). The VF MTU setting cannot be larger than the PF MTU. + +Generic Receive Offload, aka GRO +-------------------------------- +The driver supports the in-kernel software implementation of GRO. GRO has +shown that by coalescing Rx traffic into larger chunks of data, CPU +utilization can be significantly reduced when under large Rx load. GRO is an +evolution of the previously-used LRO interface. GRO is able to coalesce +other protocols besides TCP. It's also safe to use with configurations that +are problematic for LRO, namely bridging and iSCSI. + +Data Center Bridging (DCB) +-------------------------- +NOTE: +The kernel assumes that TC0 is available, and will disable Priority Flow +Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is +enabled when setting up DCB on your switch. + +DCB is a configuration Quality of Service implementation in hardware. It uses +the VLAN priority tag (802.1p) to filter traffic. That means that there are 8 +different priorities that traffic can be filtered into. It also enables +priority flow control (802.1Qbb) which can limit or eliminate the number of +dropped packets during network stress. Bandwidth can be allocated to each of +these priorities, which is enforced at the hardware level (802.1Qaz). + +Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and +802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only +and can accept settings from a DCBX capable peer. Software configuration of +DCBX parameters via dcbtool/lldptool are not supported. + +The ixgbe driver implements the DCB netlink interface layer to allow user-space +to communicate with the driver and query DCB configuration for the port. + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://www.kernel.org/pub/software/network/ethtool/ + +FCoE +---- +The ixgbe driver supports Fiber Channel over Ethernet (FCoE) and Data Center +Bridging (DCB). This code has no default effect on the regular driver +operation. Configuring DCB and FCoE is outside the scope of this README. Refer +to http://www.open-fcoe.org/ for FCoE project information and contact +ixgbe-eedc@lists.sourceforge.net for DCB information. + +MAC and VLAN anti-spoofing feature +---------------------------------- +When a malicious driver attempts to send a spoofed packet, it is dropped by the +hardware and not transmitted. + +An interrupt is sent to the PF driver notifying it of the spoof attempt. When a +spoofed packet is detected, the PF driver will send the following message to +the system log (displayed by the "dmesg" command):: + + ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected + +where "x" is the PF interface number; and "n" is number of spoofed packets. +NOTE: This feature can be disabled for a specific Virtual Function (VF):: + + ip link set <pf dev> vf <vf id> spoofchk {off|on} + + +Known Issues/Troubleshooting +============================ + +Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS +----------------------------------------------------------------------- +Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. +This includes traditional PCIe devices, as well as SR-IOV-capable devices based +on the Intel Ethernet Controller XL710. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt deleted file mode 100644 index 687835415707..000000000000 --- a/Documentation/networking/ixgbe.txt +++ /dev/null @@ -1,349 +0,0 @@ -Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of -Adapters -============================================================================= - -Intel 10 Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Additional Configurations -- Performance Tuning -- Known Issues -- Support - -Identifying Your Adapter -======================== - -The driver in this release is compatible with 82598, 82599 and X540-based -Intel Network Connections. - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/network/sb/CS-012904.htm - -SFP+ Devices with Pluggable Optics ----------------------------------- - -82599-BASED ADAPTERS - -NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or -is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel -optics and/or the direct attach cables listed below. - -When 82599-based SFP+ devices are connected back to back, they should be set to -the same Speed setting via ethtool. Results may vary if you mix speed settings. -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - -Supplier Type Part Numbers - -SR Modules -Intel DUAL RATE 1G/10G SFP+ SR (bailed) FTLX8571D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDZ-IN2 -LR Modules -Intel DUAL RATE 1G/10G SFP+ LR (bailed) FTLX1471D3BCV-IT -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDDZ-IN1 -Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDZ-IN2 - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -Finisar DUAL RATE 1G/10G SFP+ SR (No Bail) FTLX8571D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ SR (No Bail) AFBR-703SDZ-IN1 -Finisar DUAL RATE 1G/10G SFP+ LR (No Bail) FTLX1471D3QCV-IT -Avago DUAL RATE 1G/10G SFP+ LR (No Bail) AFCT-701SDZ-IN1 -Finistar 1000BASE-T SFP FCLF8522P2BTL -Avago 1000BASE-T SFP ABCU-5710RZ - -82599-based adapters support all passive and active limiting direct attach -cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. - -Laser turns off for SFP+ when device is down -------------------------------------------- -"ip link set down" turns off the laser for 82599-based SFP+ fiber adapters. -"ip link set up" turns on the laser. - - -82598-BASED ADAPTERS - -NOTES for 82598-Based Adapters: -- Intel(R) Network Adapters that support removable optical modules only support - their original module type (i.e., the Intel(R) 10 Gigabit SR Dual Port - Express Module only supports SR optical modules). If you plug in a different - type of module, the driver will not load. -- Hot Swapping/hot plugging optical modules is not supported. -- Only single speed, 10 gigabit modules are supported. -- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module - types are not supported. Please see your system documentation for details. - -The following is a list of 3rd party SFP+ modules and direct attach cables that -have received some testing. Not all modules are applicable to all devices. - -Supplier Type Part Numbers - -Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL -Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ -Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL - -82598-based adapters support all passive direct attach cables that comply -with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach -cables are not supported. - - -Flow Control ------------- -Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable -receiving and transmitting pause frames for ixgbe. When TX is enabled, PAUSE -frames are generated when the receive packet buffer crosses a predefined -threshold. When rx is enabled, the transmit unit will halt for the time delay -specified when a PAUSE frame is received. - -Flow Control is enabled by default. If you want to disable a flow control -capable link partner, use ethtool: - - ethtool -A eth? autoneg off RX off TX off - -NOTE: For 82598 backplane cards entering 1 gig mode, flow control default -behavior is changed to off. Flow control in 1 gig mode on these devices can -lead to Tx hangs. - -Intel(R) Ethernet Flow Director -------------------------------- -Supports advanced filters that direct receive packets by their flows to -different queues. Enables tight control on routing a flow in the platform. -Matches flows and CPU cores for flow affinity. Supports multiple parameters -for flexible flow classification and load balancing. - -Flow director is enabled only if the kernel is multiple TX queue capable. - -An included script (set_irq_affinity.sh) automates setting the IRQ to CPU -affinity. - -You can verify that the driver is using Flow Director by looking at the counter -in ethtool: fdir_miss and fdir_match. - -Other ethtool Commands: -To enable Flow Director - ethtool -K ethX ntuple on -To add a filter - Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23 - action 1 -To see the list of filters currently present: - ethtool -u ethX - -Perfect Filter: Perfect filter is an interface to load the filter table that -funnels all flow into queue_0 unless an alternative queue is specified using -"action". In that case, any flow that matches the filter criteria will be -directed to the appropriate queue. - -If the queue is defined as -1, filter will drop matching packets. - -To account for filter matches and misses, there are two stats in ethtool: -fdir_match and fdir_miss. In addition, rx_queue_N_packets shows the number of -packets processed by the Nth queue. - -NOTE: Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not -compatible with Flow Director. IF Flow Director is enabled, these will be -disabled. - -The following three parameters impact Flow Director. - -FdirMode --------- -Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode) -Default Value: 1 - - Flow Director filtering modes. - -FdirPballoc ------------ -Valid Range: 0-2 (0=64k, 1=128k, 2=256k) -Default Value: 0 - - Flow Director allocated packet buffer size. - -AtrSampleRate --------------- -Valid Range: 1-100 -Default Value: 20 - - Software ATR Tx packet sample rate. For example, when set to 20, every 20th - packet, looks to see if the packet will create a new flow. - -Node ----- -Valid Range: 0-n -Default Value: 1 (off) - - 0 - n: where n is the number of NUMA nodes (i.e. 0 - 3) currently online in - your system - 1: turns this option off - - The Node parameter will allow you to pick which NUMA node you want to have - the adapter allocate memory on. - -max_vfs -------- -Valid Range: 1-63 -Default Value: 0 - - If the value is greater than 0 it will also force the VMDq parameter to be 1 - or more. - - This parameter adds support for SR-IOV. It causes the driver to spawn up to - max_vfs worth of virtual function. - - -Additional Configurations -========================= - - Jumbo Frames - ------------ - The driver supports Jumbo Frames for all adapters. Jumbo Frames support is - enabled by changing the MTU to a value larger than the default of 1500. - The maximum value for the MTU is 16110. Use the ip command to - increase the MTU size. For example: - - ip link set dev ethx mtu 9000 - - The maximum MTU setting for Jumbo Frames is 9710. This value coincides - with the maximum Jumbo Frames size of 9728. - - Generic Receive Offload, aka GRO - -------------------------------- - The driver supports the in-kernel software implementation of GRO. GRO has - shown that by coalescing Rx traffic into larger chunks of data, CPU - utilization can be significantly reduced when under large Rx load. GRO is an - evolution of the previously-used LRO interface. GRO is able to coalesce - other protocols besides TCP. It's also safe to use with configurations that - are problematic for LRO, namely bridging and iSCSI. - - Data Center Bridging, aka DCB - ----------------------------- - DCB is a configuration Quality of Service implementation in hardware. - It uses the VLAN priority tag (802.1p) to filter traffic. That means - that there are 8 different priorities that traffic can be filtered into. - It also enables priority flow control which can limit or eliminate the - number of dropped packets during network stress. Bandwidth can be - allocated to each of these priorities, which is enforced at the hardware - level. - - To enable DCB support in ixgbe, you must enable the DCB netlink layer to - allow the userspace tools (see below) to communicate with the driver. - This can be found in the kernel configuration here: - - -> Networking support - -> Networking options - -> Data Center Bridging support - - Once this is selected, DCB support must be selected for ixgbe. This can - be found here: - - -> Device Drivers - -> Network device support (NETDEVICES [=y]) - -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) - -> Intel(R) 10GbE PCI Express adapters support - -> Data Center Bridging (DCB) Support - - After these options are selected, you must rebuild your kernel and your - modules. - - In order to use DCB, userspace tools must be downloaded and installed. - The dcbd tools can be found at: - - http://e1000.sf.net - - Ethtool - ------- - The driver utilizes the ethtool interface for driver configuration and - diagnostics, as well as displaying statistical information. The latest - ethtool version is required for this functionality. - - The latest release of ethtool can be found from - https://www.kernel.org/pub/software/network/ethtool/ - - FCoE - ---- - This release of the ixgbe driver contains new code to enable users to use - Fiber Channel over Ethernet (FCoE) and Data Center Bridging (DCB) - functionality that is supported by the 82598-based hardware. This code has - no default effect on the regular driver operation, and configuring DCB and - FCoE is outside the scope of this driver README. Refer to - http://www.open-fcoe.org/ for FCoE project information and contact - e1000-eedc@lists.sourceforge.net for DCB information. - - MAC and VLAN anti-spoofing feature - ---------------------------------- - When a malicious driver attempts to send a spoofed packet, it is dropped by - the hardware and not transmitted. An interrupt is sent to the PF driver - notifying it of the spoof attempt. - - When a spoofed packet is detected the PF driver will send the following - message to the system log (displayed by the "dmesg" command): - - Spoof event(s) detected on VF (n) - - Where n=the VF that attempted to do the spoofing. - - -Performance Tuning -================== - -An excellent article on performance tuning can be found at: - -http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf - - -Known Issues -============ - - Enabling SR-IOV in a 32-bit or 64-bit Microsoft* Windows* Server 2008/R2 - Guest OS using Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE - controller under KVM - ------------------------------------------------------------------------ - KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This - includes traditional PCIe devices, as well as SR-IOV-capable devices using - Intel 82576-based and 82599-based controllers. - - While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF) - to a Linux-based VM running 2.6.32 or later kernel works fine, there is a - known issue with Microsoft Windows Server 2008 VM that results in a "yellow - bang" error. This problem is within the KVM VMM itself, not the Intel driver, - or the SR-IOV logic of the VMM, but rather that KVM emulates an older CPU - model for the guests, and this older CPU model does not support MSI-X - interrupts, which is a requirement for Intel SR-IOV. - - If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode - with KVM and a Microsoft Windows Server 2008 guest try the following - workaround. The workaround is to tell KVM to emulate a different model of CPU - when using qemu to create the KVM guest: - - "-cpu qemu64,model=13" - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://e1000.sourceforge.net - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/ixgbevf.rst b/Documentation/networking/ixgbevf.rst new file mode 100644 index 000000000000..56cde6366c2f --- /dev/null +++ b/Documentation/networking/ixgbevf.rst @@ -0,0 +1,66 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet +============================================================= + +Intel 10 Gigabit Virtual Function Linux driver. +Copyright(c) 1999-2018 Intel Corporation. + +Contents +======== + +- Identifying Your Adapter +- Known Issues +- Support + +This driver supports 82599, X540, X550, and X552-based virtual function devices +that can only be activated on kernels that support SR-IOV. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + + +Identifying Your Adapter +======================== +The driver is compatible with devices based on the following: + + * Intel(R) Ethernet Controller 82598 + * Intel(R) Ethernet Controller 82599 + * Intel(R) Ethernet Controller X520 + * Intel(R) Ethernet Controller X540 + * Intel(R) Ethernet Controller x550 + * Intel(R) Ethernet Controller X552 + * Intel(R) Ethernet Controller X553 + +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + +Known Issues/Troubleshooting +============================ + +SR-IOV requires the correct platform and OS support. + +The guest OS loading this driver must support MSI-X interrupts. + +This driver is only supported as a loadable module at this time. Intel is not +supplying patches against the kernel source to allow for static linking of the +drivers. + +VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs. + + +Support +======= +For general information, go to the Intel support website at: + +https://www.intel.com/support/ + +or the Intel Wired Networking project hosted by Sourceforge at: + +https://sourceforge.net/projects/e1000 + +If an issue is identified with the released source code on a supported kernel +with a supported adapter, email the specific information related to the issue +to e1000-devel@lists.sf.net. diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt deleted file mode 100644 index 53d8d2a5a6a3..000000000000 --- a/Documentation/networking/ixgbevf.txt +++ /dev/null @@ -1,52 +0,0 @@ -Linux* Base Driver for Intel(R) Ethernet Network Connection -=========================================================== - -Intel Gigabit Linux driver. -Copyright(c) 1999 - 2013 Intel Corporation. - -Contents -======== - -- Identifying Your Adapter -- Known Issues/Troubleshooting -- Support - -This file describes the ixgbevf Linux* Base Driver for Intel Network -Connection. - -The ixgbevf driver supports 82599-based virtual function devices that can only -be activated on kernels with CONFIG_PCI_IOV enabled. - -The ixgbevf driver supports virtual functions generated by the ixgbe driver -with a max_vfs value of 1 or greater. - -The guest OS loading the ixgbevf driver must support MSI-X interrupts. - -VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. - -Identifying Your Adapter -======================== - -For more information on how to identify your adapter, go to the Adapter & -Driver ID Guide at: - - http://support.intel.com/support/go/network/adapter/idguide.htm - -Known Issues/Troubleshooting -============================ - - -Support -======= - -For general information, go to the Intel support website at: - - http://support.intel.com - -or the Intel Wired Networking project hosted by Sourceforge at: - - http://sourceforge.net/projects/e1000 - -If an issue is identified with the released source code on the supported -kernel with a supported adapter, email the specific information related -to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt index 92f5b31392fa..3bfa635bbbd5 100644 --- a/Documentation/networking/netvsc.txt +++ b/Documentation/networking/netvsc.txt @@ -45,6 +45,15 @@ Features like packets and significantly reduces CPU usage under heavy Rx load. + Large Receive Offload (LRO), or Receive Side Coalescing (RSC) + ------------------------------------------------------------- + The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet + processing overhead by coalescing multiple TCP segments when possible. The + feature is enabled by default on VMs running on Windows Server 2019 and + later. It may be changed by ethtool command: + ethtool -K eth0 lro on + ethtool -K eth0 lro off + SR-IOV support -------------- Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index b5407163d53b..605e00cdd6be 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -1069,6 +1069,31 @@ The kernel interface functions are as follows: This function may transmit a PING ACK. + (*) Get reply timestamp. + + bool rxrpc_kernel_get_reply_time(struct socket *sock, + struct rxrpc_call *call, + ktime_t *_ts) + + This allows the timestamp on the first DATA packet of the reply of a + client call to be queried, provided that it is still in the Rx ring. If + successful, the timestamp will be stored into *_ts and true will be + returned; false will be returned otherwise. + + (*) Get remote client epoch. + + u32 rxrpc_kernel_get_epoch(struct socket *sock, + struct rxrpc_call *call) + + This allows the epoch that's contained in packets of an incoming client + call to be queried. This value is returned. The function always + successful if the call is still in progress. It shouldn't be called once + the call has expired. Note that calling this on a local client call only + returns the local epoch. + + This value can be used to determine if the remote client has been + restarted as it shouldn't change otherwise. + ======================= CONFIGURABLE PARAMETERS diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt deleted file mode 100644 index 9c7139d57e57..000000000000 --- a/Documentation/networking/tcp.txt +++ /dev/null @@ -1,101 +0,0 @@ -TCP protocol -============ - -Last updated: 3 June 2017 - -Contents -======== - -- Congestion control -- How the new TCP output machine [nyi] works - -Congestion control -================== - -The following variables are used in the tcp_sock for congestion control: -snd_cwnd The size of the congestion window -snd_ssthresh Slow start threshold. We are in slow start if - snd_cwnd is less than this. -snd_cwnd_cnt A counter used to slow down the rate of increase - once we exceed slow start threshold. -snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to. -snd_cwnd_stamp Timestamp for when congestion window last validated. -snd_cwnd_used Used as a highwater mark for how much of the - congestion window is in use. It is used to adjust - snd_cwnd down when the link is limited by the - application rather than the network. - -As of 2.6.13, Linux supports pluggable congestion control algorithms. -A congestion control mechanism can be registered through functions in -tcp_cong.c. The functions used by the congestion control mechanism are -registered via passing a tcp_congestion_ops struct to -tcp_register_congestion_control. As a minimum, the congestion control -mechanism must provide a valid name and must implement either ssthresh, -cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook. - -Private data for a congestion control mechanism is stored in tp->ca_priv. -tcp_ca(tp) returns a pointer to this space. This is preallocated space - it -is important to check the size of your private data will fit this space, or -alternatively, space could be allocated elsewhere and a pointer to it could -be stored here. - -There are three kinds of congestion control algorithms currently: The -simplest ones are derived from TCP reno (highspeed, scalable) and just -provide an alternative congestion window calculation. More complex -ones like BIC try to look at other events to provide better -heuristics. There are also round trip time based algorithms like -Vegas and Westwood+. - -Good TCP congestion control is a complex problem because the algorithm -needs to maintain fairness and performance. Please review current -research and RFC's before developing new modules. - -The default congestion control mechanism is chosen based on the -DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default -value then you can set it using sysctl net.ipv4.tcp_congestion_control. The -module will be autoloaded if needed and you will get the expected protocol. If -you ask for an unknown congestion method, then the sysctl attempt will fail. - -If you remove a TCP congestion control module, then you will get the next -available one. Since reno cannot be built as a module, and cannot be -removed, it will always be available. - -How the new TCP output machine [nyi] works. -=========================================== - -Data is kept on a single queue. The skb->users flag tells us if the frame is -one that has been queued already. To add a frame we throw it on the end. Ack -walks down the list from the start. - -We keep a set of control flags - - - sk->tcp_pend_event - - TCP_PEND_ACK Ack needed - TCP_ACK_NOW Needed now - TCP_WINDOW Window update check - TCP_WINZERO Zero probing - - - sk->transmit_queue The transmission frame begin - sk->transmit_new First new frame pointer - sk->transmit_end Where to add frames - - sk->tcp_last_tx_ack Last ack seen - sk->tcp_dup_ack Dup ack count for fast retransmit - - -Frames are queued for output by tcp_write. We do our best to send the frames -off immediately if possible, but otherwise queue and compute the body -checksum in the copy. - -When a write is done we try to clear any pending events and piggy back them. -If the window is full we queue full sized frames. On the first timeout in -zero window we split this. - -On a timer we walk the retransmit list to send any retransmits, update the -backoff timers etc. A change of route table stamp causes a change of header -and recompute. We add any new tcp level headers and refinish the checksum -before sending. - diff --git a/Documentation/networking/xfrm_device.txt b/Documentation/networking/xfrm_device.txt index 50c34ca65efe..267f55b5f54a 100644 --- a/Documentation/networking/xfrm_device.txt +++ b/Documentation/networking/xfrm_device.txt @@ -68,6 +68,10 @@ and an indication of whether it is for Rx or Tx. The driver should - verify the algorithm is supported for offloads - store the SA information (key, salt, target-ip, protocol, etc) - enable the HW offload of the SA + - return status value: + 0 success + -EOPNETSUPP offload not supported, try SW IPsec + other fail the request The driver can also set an offload_handle in the SA, an opaque void pointer that can be used to convey context into the fast-path offload requests. diff --git a/Documentation/parisc/00-INDEX b/Documentation/parisc/00-INDEX deleted file mode 100644 index cbd060961f43..000000000000 --- a/Documentation/parisc/00-INDEX +++ /dev/null @@ -1,6 +0,0 @@ -00-INDEX - - this file. -debugging - - some debugging hints for real-mode code -registers - - current/planned usage of registers diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX deleted file mode 100644 index 7f3c2def2cac..000000000000 --- a/Documentation/power/00-INDEX +++ /dev/null @@ -1,44 +0,0 @@ -00-INDEX - - This file -apm-acpi.txt - - basic info about the APM and ACPI support. -basic-pm-debugging.txt - - Debugging suspend and resume -charger-manager.txt - - Battery charger management. -admin-guide/devices.rst - - How drivers interact with system-wide power management -drivers-testing.txt - - Testing suspend and resume support in device drivers -freezing-of-tasks.txt - - How processes and controlled during suspend -interface.txt - - Power management user interface in /sys/power -opp.txt - - Operating Performance Point library -pci.txt - - How the PCI Subsystem Does Power Management -pm_qos_interface.txt - - info on Linux PM Quality of Service interface -power_supply_class.txt - - Tells userspace about battery, UPS, AC or DC power supply properties -runtime_pm.txt - - Power management framework for I/O devices. -s2ram.txt - - How to get suspend to ram working (and debug it when it isn't) -states.txt - - System power management states -suspend-and-cpuhotplug.txt - - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug -swsusp-and-swap-files.txt - - Using swap files with software suspend (to disk) -swsusp-dmcrypt.txt - - How to use dm-crypt and software suspend (to disk) together -swsusp.txt - - Goals, implementation, and usage of software suspend (ACPI S3) -tricks.txt - - How to trick software suspend (to disk) into working when it isn't -userland-swsusp.txt - - Experimental implementation of software suspend in userspace -video.txt - - Video issues during resume from suspend diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index cc87adf44c0a..236d1fb13640 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt @@ -56,7 +56,7 @@ If you want to limit the suspend image size to N bytes, do echo N > /sys/power/image_size -before suspend (it is limited to 500 MB by default). +before suspend (it is limited to around 2/5 of available RAM by default). . The resume process checks for the presence of the resume device, if found, it then checks the contents for the hibernation image signature. diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX deleted file mode 100644 index 9dc845cf7d88..000000000000 --- a/Documentation/powerpc/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -Index of files in Documentation/powerpc. If you think something about -Linux/PPC needs an entry here, needs correction or you've written one -please mail me. - Cort Dougan (cort@fsmlabs.com) - -00-INDEX - - this file -bootwrapper.txt - - Information on how the powerpc kernel is wrapped for boot on various - different platforms. -cpu_features.txt - - info on how we support a variety of CPUs with minimal compile-time - options. -cxl.txt - - Overview of the CXL driver. -eeh-pci-error-recovery.txt - - info on PCI Bus EEH Error Recovery -firmware-assisted-dump.txt - - Documentation on the firmware assisted dump mechanism "fadump". -hvcs.txt - - IBM "Hypervisor Virtual Console Server" Installation Guide -mpc52xx.txt - - Linux 2.6.x on MPC52xx family -pmu-ebb.txt - - Description of the API for using the PMU with Event Based Branches. -qe_firmware.txt - - describes the layout of firmware binaries for the Freescale QUICC - Engine and the code that parses and uploads the microcode therein. -ptrace.txt - - Information on the ptrace interfaces for hardware debug registers. -transactional_memory.txt - - Overview of the Power8 transactional memory support. -dscr.txt - - Overview DSCR (Data Stream Control Register) support. diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt index c945062be66c..509f5a422d57 100644 --- a/Documentation/preempt-locking.txt +++ b/Documentation/preempt-locking.txt @@ -3,7 +3,6 @@ Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe =========================================================================== :Author: Robert Love <rml@tech9.net> -:Last Updated: 28 Aug 2002 Introduction @@ -92,11 +91,12 @@ any locks or interrupts are disabled, since preemption is implicitly disabled in those cases. But keep in mind that 'irqs disabled' is a fundamentally unsafe way of -disabling preemption - any spin_unlock() decreasing the preemption count -to 0 might trigger a reschedule. A simple printk() might trigger a reschedule. -So use this implicit preemption-disabling property only if you know that the -affected codepath does not do any of this. Best policy is to use this only for -small, atomic code that you wrote and which calls no complex functions. +disabling preemption - any cond_resched() or cond_resched_lock() might trigger +a reschedule if the preempt count is 0. A simple printk() might trigger a +reschedule. So use this implicit preemption-disabling property only if you +know that the affected codepath does not do any of this. Best policy is to use +this only for small, atomic code that you wrote and which calls no complex +functions. Example:: diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst index 51d0349c7809..ae020d84d7c4 100644 --- a/Documentation/process/2.Process.rst +++ b/Documentation/process/2.Process.rst @@ -82,7 +82,7 @@ As an example, here is how the 4.16 development cycle went (all dates in March 11 4.16-rc5 March 18 4.16-rc6 March 25 4.16-rc7 - April 1 4.17 stable release + April 1 4.16 stable release ============== =============================== How do the developers decide when to close the development cycle and create diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst index 0d4f29bc798b..88a7d5c8bb2f 100644 --- a/Documentation/process/adding-syscalls.rst +++ b/Documentation/process/adding-syscalls.rst @@ -232,7 +232,7 @@ normally be optional, so add a ``CONFIG`` option (typically to by the option. - Make the option depend on EXPERT if it should be hidden from normal users. - Make any new source files implementing the function dependent on the CONFIG - option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c``). + option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.o``). - Double check that the kernel still builds with the new CONFIG option turned off. diff --git a/Documentation/process/code-of-conduct-interpretation.rst b/Documentation/process/code-of-conduct-interpretation.rst new file mode 100644 index 000000000000..e899f14a4ba2 --- /dev/null +++ b/Documentation/process/code-of-conduct-interpretation.rst @@ -0,0 +1,156 @@ +.. _code_of_conduct_interpretation: + +Linux Kernel Contributor Covenant Code of Conduct Interpretation +================================================================ + +The :ref:`code_of_conduct` is a general document meant to +provide a set of rules for almost any open source community. Every +open-source community is unique and the Linux kernel is no exception. +Because of this, this document describes how we in the Linux kernel +community will interpret it. We also do not expect this interpretation +to be static over time, and will adjust it as needed. + +The Linux kernel development effort is a very personal process compared +to "traditional" ways of developing software. Your contributions and +ideas behind them will be carefully reviewed, often resulting in +critique and criticism. The review will almost always require +improvements before the material can be included in the +kernel. Know that this happens because everyone involved wants to see +the best possible solution for the overall success of Linux. This +development process has been proven to create the most robust operating +system kernel ever, and we do not want to do anything to cause the +quality of submission and eventual result to ever decrease. + +Maintainers +----------- + +The Code of Conduct uses the term "maintainers" numerous times. In the +kernel community, a "maintainer" is anyone who is responsible for a +subsystem, driver, or file, and is listed in the MAINTAINERS file in the +kernel source tree. + +Responsibilities +---------------- + +The Code of Conduct mentions rights and responsibilities for +maintainers, and this needs some further clarifications. + +First and foremost, it is a reasonable expectation to have maintainers +lead by example. + +That being said, our community is vast and broad, and there is no new +requirement for maintainers to unilaterally handle how other people +behave in the parts of the community where they are active. That +responsibility is upon all of us, and ultimately the Code of Conduct +documents final escalation paths in case of unresolved concerns +regarding conduct issues. + +Maintainers should be willing to help when problems occur, and work with +others in the community when needed. Do not be afraid to reach out to +the Technical Advisory Board (TAB) or other maintainers if you're +uncertain how to handle situations that come up. It will not be +considered a violation report unless you want it to be. If you are +uncertain about approaching the TAB or any other maintainers, please +reach out to our conflict mediator, Mishi Choudhary <mishi@linux.com>. + +In the end, "be kind to each other" is really what the end goal is for +everybody. We know everyone is human and we all fail at times, but the +primary goal for all of us should be to work toward amicable resolutions +of problems. Enforcement of the code of conduct will only be a last +resort option. + +Our goal of creating a robust and technically advanced operating system +and the technical complexity involved naturally require expertise and +decision-making. + +The required expertise varies depending on the area of contribution. It +is determined mainly by context and technical complexity and only +secondary by the expectations of contributors and maintainers. + +Both the expertise expectations and decision-making are subject to +discussion, but at the very end there is a basic necessity to be able to +make decisions in order to make progress. This prerogative is in the +hands of maintainers and project's leadership and is expected to be used +in good faith. + +As a consequence, setting expertise expectations, making decisions and +rejecting unsuitable contributions are not viewed as a violation of the +Code of Conduct. + +While maintainers are in general welcoming to newcomers, their capacity +of helping contributors overcome the entry hurdles is limited, so they +have to set priorities. This, also, is not to be seen as a violation of +the Code of Conduct. The kernel community is aware of that and provides +entry level programs in various forms like kernelnewbies.org. + +Scope +----- + +The Linux kernel community primarily interacts on a set of public email +lists distributed around a number of different servers controlled by a +number of different companies or individuals. All of these lists are +defined in the MAINTAINERS file in the kernel source tree. Any emails +sent to those mailing lists are considered covered by the Code of +Conduct. + +Developers who use the kernel.org bugzilla, and other subsystem bugzilla +or bug tracking tools should follow the guidelines of the Code of +Conduct. The Linux kernel community does not have an "official" project +email address, or "official" social media address. Any activity +performed using a kernel.org email account must follow the Code of +Conduct as published for kernel.org, just as any individual using a +corporate email account must follow the specific rules of that +corporation. + +The Code of Conduct does not prohibit continuing to include names, email +addresses, and associated comments in mailing list messages, kernel +change log messages, or code comments. + +Interaction in other forums is covered by whatever rules apply to said +forums and is in general not covered by the Code of Conduct. Exceptions +may be considered for extreme circumstances. + +Contributions submitted for the kernel should use appropriate language. +Content that already exists predating the Code of Conduct will not be +addressed now as a violation. Inappropriate language can be seen as a +bug, though; such bugs will be fixed more quickly if any interested +parties submit patches to that effect. Expressions that are currently +part of the user/kernel API, or reflect terminology used in published +standards or specifications, are not considered bugs. + +Enforcement +----------- + +The address listed in the Code of Conduct goes to the Code of Conduct +Committee. The exact members receiving these emails at any given time +are listed at https://kernel.org/code-of-conduct.html. Members can not +access reports made before they joined or after they have left the +committee. + +The initial Code of Conduct Committee consists of volunteer members of +the TAB, as well as a professional mediator acting as a neutral third +party. The first task of the committee is to establish documented +processes, which will be made public. + +Any member of the committee, including the mediator, can be contacted +directly if a reporter does not wish to include the full committee in a +complaint or concern. + +The Code of Conduct Committee reviews the cases according to the +processes (see above) and consults with the TAB as needed and +appropriate, for instance to request and receive information about the +kernel community. + +Any decisions by the committee will be brought to the TAB, for +implementation of enforcement with the relevant maintainers if needed. +A decision by the Code of Conduct Committee can be overturned by the TAB +by a two-thirds vote. + +At quarterly intervals, the Code of Conduct Committee and TAB will +provide a report summarizing the anonymised reports that the Code of +Conduct committee has received and their status, as well details of any +overridden decisions including complete and identifiable voting details. + +We expect to establish a different process for Code of Conduct Committee +staffing beyond the bootstrap period. This document will be updated +with that information when this occurs. diff --git a/Documentation/process/code-of-conduct.rst b/Documentation/process/code-of-conduct.rst index ab7c24b5478c..be50294aebd5 100644 --- a/Documentation/process/code-of-conduct.rst +++ b/Documentation/process/code-of-conduct.rst @@ -1,3 +1,5 @@ +.. _code_of_conduct: + Contributor Covenant Code of Conduct ++++++++++++++++++++++++++++++++++++ @@ -63,19 +65,22 @@ Enforcement =========== Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported by contacting the Technical Advisory Board (TAB) at -<tab@lists.linux-foundation.org>. All complaints will be reviewed and -investigated and will result in a response that is deemed necessary and -appropriate to the circumstances. The TAB is obligated to maintain -confidentiality with regard to the reporter of an incident. Further details of -specific enforcement policies may be posted separately. - -Maintainers who do not follow or enforce the Code of Conduct in good faith may -face temporary or permanent repercussions as determined by other members of the -project’s leadership. +reported by contacting the Code of Conduct Committee at +<conduct@kernel.org>. All complaints will be reviewed and investigated +and will result in a response that is deemed necessary and appropriate +to the circumstances. The Code of Conduct Committee is obligated to +maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted +separately. Attribution =========== This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +Interpretation +============== + +See the :ref:`code_of_conduct_interpretation` document for how the Linux +kernel community will be interpreting this document. diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst new file mode 100644 index 000000000000..0ef5a63c06ba --- /dev/null +++ b/Documentation/process/deprecated.rst @@ -0,0 +1,119 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================================================== +Deprecated Interfaces, Language Features, Attributes, and Conventions +===================================================================== + +In a perfect world, it would be possible to convert all instances of +some deprecated API into the new API and entirely remove the old API in +a single development cycle. However, due to the size of the kernel, the +maintainership hierarchy, and timing, it's not always feasible to do these +kinds of conversions at once. This means that new instances may sneak into +the kernel while old ones are being removed, only making the amount of +work to remove the API grow. In order to educate developers about what +has been deprecated and why, this list has been created as a place to +point when uses of deprecated things are proposed for inclusion in the +kernel. + +__deprecated +------------ +While this attribute does visually mark an interface as deprecated, +it `does not produce warnings during builds any more +<https://git.kernel.org/linus/771c035372a036f83353eef46dbb829780330234>`_ +because one of the standing goals of the kernel is to build without +warnings and no one was actually doing anything to remove these deprecated +interfaces. While using `__deprecated` is nice to note an old API in +a header file, it isn't the full solution. Such interfaces must either +be fully removed from the kernel, or added to this file to discourage +others from using them in the future. + +open-coded arithmetic in allocator arguments +-------------------------------------------- +Dynamic size calculations (especially multiplication) should not be +performed in memory allocator (or similar) function arguments due to the +risk of them overflowing. This could lead to values wrapping around and a +smaller allocation being made than the caller was expecting. Using those +allocations could lead to linear overflows of heap memory and other +misbehaviors. (One exception to this is literal values where the compiler +can warn if they might overflow. Though using literals for arguments as +suggested below is also harmless.) + +For example, do not use ``count * size`` as an argument, as in:: + + foo = kmalloc(count * size, GFP_KERNEL); + +Instead, the 2-factor form of the allocator should be used:: + + foo = kmalloc_array(count, size, GFP_KERNEL); + +If no 2-factor form is available, the saturate-on-overflow helpers should +be used:: + + bar = vmalloc(array_size(count, size)); + +Another common case to avoid is calculating the size of a structure with +a trailing array of others structures, as in:: + + header = kzalloc(sizeof(*header) + count * sizeof(*header->item), + GFP_KERNEL); + +Instead, use the helper:: + + header = kzalloc(struct_size(header, item, count), GFP_KERNEL); + +See :c:func:`array_size`, :c:func:`array3_size`, and :c:func:`struct_size`, +for more details as well as the related :c:func:`check_add_overflow` and +:c:func:`check_mul_overflow` family of functions. + +simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull() +---------------------------------------------------------------------- +The :c:func:`simple_strtol`, :c:func:`simple_strtoll`, +:c:func:`simple_strtoul`, and :c:func:`simple_strtoull` functions +explicitly ignore overflows, which may lead to unexpected results +in callers. The respective :c:func:`kstrtol`, :c:func:`kstrtoll`, +:c:func:`kstrtoul`, and :c:func:`kstrtoull` functions tend to be the +correct replacements, though note that those require the string to be +NUL or newline terminated. + +strcpy() +-------- +:c:func:`strcpy` performs no bounds checking on the destination +buffer. This could result in linear overflows beyond the +end of the buffer, leading to all kinds of misbehaviors. While +`CONFIG_FORTIFY_SOURCE=y` and various compiler flags help reduce the +risk of using this function, there is no good reason to add new uses of +this function. The safe replacement is :c:func:`strscpy`. + +strncpy() on NUL-terminated strings +----------------------------------- +Use of :c:func:`strncpy` does not guarantee that the destination buffer +will be NUL terminated. This can lead to various linear read overflows +and other misbehavior due to the missing termination. It also NUL-pads the +destination buffer if the source contents are shorter than the destination +buffer size, which may be a needless performance penalty for callers using +only NUL-terminated strings. The safe replacement is :c:func:`strscpy`. +(Users of :c:func:`strscpy` still needing NUL-padding will need an +explicit :c:func:`memset` added.) + +If a caller is using non-NUL-terminated strings, :c:func:`strncpy()` can +still be used, but destinations should be marked with the `__nonstring +<https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_ +attribute to avoid future compiler warnings. + +strlcpy() +--------- +:c:func:`strlcpy` reads the entire source buffer first, possibly exceeding +the given limit of bytes to copy. This is inefficient and can lead to +linear read overflows if a source string is not NUL-terminated. The +safe replacement is :c:func:`strscpy`. + +Variable Length Arrays (VLAs) +----------------------------- +Using stack VLAs produces much worse machine code than statically +sized stack arrays. While these non-trivial `performance issues +<https://git.kernel.org/linus/02361bc77888>`_ are reason enough to +eliminate VLAs, they are also a security risk. Dynamic growth of a stack +array may exceed the remaining memory in the stack segment. This could +lead to a crash, possible overwriting sensitive contents at the end of the +stack (when built without `CONFIG_THREAD_INFO_IN_TASK=y`), or overwriting +memory adjacent to the stack (when built without `CONFIG_VMAP_STACK=y`) diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 130bf0f48875..dcb25f94188e 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -57,12 +57,13 @@ of doing things. Legal Issues ------------ -The Linux kernel source code is released under the GPL. Please see the -file, COPYING, in the main directory of the source tree, for details on -the license. If you have further questions about the license, please -contact a lawyer, and do not ask on the Linux kernel mailing list. The -people on the mailing lists are not lawyers, and you should not rely on -their statements on legal matters. +The Linux kernel source code is released under the GPL. Please see the file +COPYING in the main directory of the source tree. The Linux kernel licensing +rules and how to use `SPDX <https://spdx.org/>`_ identifiers in source code are +descibed in :ref:`Documentation/process/license-rules.rst <kernel_licensing>`. +If you have further questions about the license, please contact a lawyer, and do +not ask on the Linux kernel mailing list. The people on the mailing lists are +not lawyers, and you should not rely on their statements on legal matters. For common questions and answers about the GPL, please see: diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst index 9ae3e317bddf..757808526d9a 100644 --- a/Documentation/process/index.rst +++ b/Documentation/process/index.rst @@ -19,8 +19,10 @@ Below are the essential guides that every developer should read. .. toctree:: :maxdepth: 1 + license-rules howto code-of-conduct + code-of-conduct-interpretation development-process submitting-patches coding-style @@ -41,6 +43,7 @@ Other guides to the community that are of interest to most developers are: stable-kernel-rules submit-checklist kernel-docs + deprecated These are some overall technical guides that have been put here for now for lack of a better place. diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst index 8ea26325fe3f..2bb8c0fc2238 100644 --- a/Documentation/process/license-rules.rst +++ b/Documentation/process/license-rules.rst @@ -1,5 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 +.. _kernel_licensing: + Linux kernel licensing rules ============================ diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX deleted file mode 100644 index 317f0378ae01..000000000000 --- a/Documentation/s390/00-INDEX +++ /dev/null @@ -1,28 +0,0 @@ -00-INDEX - - this file. -3270.ChangeLog - - ChangeLog for the UTS Global 3270-support patch (outdated). -3270.txt - - how to use the IBM 3270 display system support. -cds.txt - - s390 common device support (common I/O layer). -CommonIO - - common I/O layer command line parameters, procfs and debugfs entries -config3270.sh - - example configuration for 3270 devices. -DASD - - information on the DASD disk device driver. -Debugging390.txt - - hints for debugging on s390 systems. -driver-model.txt - - information on s390 devices and the driver model. -monreader.txt - - information on accessing the z/VM monitor stream from Linux. -qeth.txt - - HiperSockets Bridge Port Support. -s390dbf.txt - - information on using the s390 debug feature. -vfio-ccw.txt - information on the vfio-ccw I/O subchannel driver. -zfcpdump.txt - - information on the s390 SCSI dump tool. diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX deleted file mode 100644 index eccf7ad2e7f9..000000000000 --- a/Documentation/scheduler/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file. -sched-arch.txt - - CPU Scheduler implementation hints for architecture specific code. -sched-bwc.txt - - CFS bandwidth control overview. -sched-design-CFS.txt - - goals, design and implementation of the Completely Fair Scheduler. -sched-domains.txt - - information on scheduling domains. -sched-nice-design.txt - - How and why the scheduler's nice levels are implemented. -sched-rt-group.txt - - real-time group scheduling. -sched-deadline.txt - - deadline scheduling. -sched-stats.txt - - information on schedstats (Linux Scheduler Statistics). diff --git a/Documentation/scheduler/completion.txt b/Documentation/scheduler/completion.txt index 656cf803c006..e5b9df4d8078 100644 --- a/Documentation/scheduler/completion.txt +++ b/Documentation/scheduler/completion.txt @@ -1,146 +1,187 @@ -completions - wait for completion handling -========================================== - -This document was originally written based on 3.18.0 (linux-next) +Completions - "wait for completion" barrier APIs +================================================ Introduction: ------------- -If you have one or more threads of execution that must wait for some process +If you have one or more threads that must wait for some kernel activity to have reached a point or a specific state, completions can provide a race-free solution to this problem. Semantically they are somewhat like a -pthread_barrier and have similar use-cases. +pthread_barrier() and have similar use-cases. Completions are a code synchronization mechanism which is preferable to any -misuse of locks. Any time you think of using yield() or some quirky -msleep(1) loop to allow something else to proceed, you probably want to -look into using one of the wait_for_completion*() calls instead. The -advantage of using completions is clear intent of the code, but also more -efficient code as both threads can continue until the result is actually -needed. - -Completions are built on top of the generic event infrastructure in Linux, -with the event reduced to a simple flag (appropriately called "done") in -struct completion that tells the waiting threads of execution if they -can continue safely. - -As completions are scheduling related, the code is found in +misuse of locks/semaphores and busy-loops. Any time you think of using +yield() or some quirky msleep(1) loop to allow something else to proceed, +you probably want to look into using one of the wait_for_completion*() +calls and complete() instead. + +The advantage of using completions is that they have a well defined, focused +purpose which makes it very easy to see the intent of the code, but they +also result in more efficient code as all threads can continue execution +until the result is actually needed, and both the waiting and the signalling +is highly efficient using low level scheduler sleep/wakeup facilities. + +Completions are built on top of the waitqueue and wakeup infrastructure of +the Linux scheduler. The event the threads on the waitqueue are waiting for +is reduced to a simple flag in 'struct completion', appropriately called "done". + +As completions are scheduling related, the code can be found in kernel/sched/completion.c. Usage: ------ -There are three parts to using completions, the initialization of the -struct completion, the waiting part through a call to one of the variants of -wait_for_completion() and the signaling side through a call to complete() -or complete_all(). Further there are some helper functions for checking the -state of completions. +There are three main parts to using completions: + + - the initialization of the 'struct completion' synchronization object + - the waiting part through a call to one of the variants of wait_for_completion(), + - the signaling side through a call to complete() or complete_all(). + +There are also some helper functions for checking the state of completions. +Note that while initialization must happen first, the waiting and signaling +part can happen in any order. I.e. it's entirely normal for a thread +to have marked a completion as 'done' before another thread checks whether +it has to wait for it. -To use completions one needs to include <linux/completion.h> and -create a variable of type struct completion. The structure used for -handling of completions is: +To use completions you need to #include <linux/completion.h> and +create a static or dynamic variable of type 'struct completion', +which has only two fields: struct completion { unsigned int done; wait_queue_head_t wait; }; -providing the wait queue to place tasks on for waiting and the flag for -indicating the state of affairs. +This provides the ->wait waitqueue to place tasks on for waiting (if any), and +the ->done completion flag for indicating whether it's completed or not. -Completions should be named to convey the intent of the waiter. A good -example is: +Completions should be named to refer to the event that is being synchronized on. +A good example is: wait_for_completion(&early_console_added); complete(&early_console_added); -Good naming (as always) helps code readability. +Good, intuitive naming (as always) helps code readability. Naming a completion +'complete' is not helpful unless the purpose is super obvious... Initializing completions: ------------------------- -Initialization of dynamically allocated completions, often embedded in -other structures, is done with: +Dynamically allocated completion objects should preferably be embedded in data +structures that are assured to be alive for the life-time of the function/driver, +to prevent races with asynchronous complete() calls from occurring. + +Particular care should be taken when using the _timeout() or _killable()/_interruptible() +variants of wait_for_completion(), as it must be assured that memory de-allocation +does not happen until all related activities (complete() or reinit_completion()) +have taken place, even if these wait functions return prematurely due to a timeout +or a signal triggering. + +Initializing of dynamically allocated completion objects is done via a call to +init_completion(): - void init_completion(&done); + init_completion(&dynamic_object->done); -Initialization is accomplished by initializing the wait queue and setting -the default state to "not available", that is, "done" is set to 0. +In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed" +or "not done". The re-initialization function, reinit_completion(), simply resets the -done element to "not available", thus again to 0, without touching the -wait queue. Calling init_completion() twice on the same completion object is +->done field to 0 ("not done"), without touching the waitqueue. +Callers of this function must make sure that there are no racy +wait_for_completion() calls going on in parallel. + +Calling init_completion() on the same completion object twice is most likely a bug as it re-initializes the queue to an empty queue and -enqueued tasks could get "lost" - use reinit_completion() in that case. +enqueued tasks could get "lost" - use reinit_completion() in that case, +but be aware of other races. + +For static declaration and initialization, macros are available. + +For static (or global) declarations in file scope you can use DECLARE_COMPLETION(): -For static declaration and initialization, macros are available. These are: + static DECLARE_COMPLETION(setup_done); + DECLARE_COMPLETION(setup_done); - static DECLARE_COMPLETION(setup_done) +Note that in this case the completion is boot time (or module load time) +initialized to 'not done' and doesn't require an init_completion() call. -used for static declarations in file scope. Within functions the static -initialization should always use: +When a completion is declared as a local variable within a function, +then the initialization should always use DECLARE_COMPLETION_ONSTACK() +explicitly, not just to make lockdep happy, but also to make it clear +that limited scope had been considered and is intentional: DECLARE_COMPLETION_ONSTACK(setup_done) -suitable for automatic/local variables on the stack and will make lockdep -happy. Note also that one needs to make *sure* the completion passed to -work threads remains in-scope, and no references remain to on-stack data -when the initiating function returns. +Note that when using completion objects as local variables you must be +acutely aware of the short life time of the function stack: the function +must not return to a calling context until all activities (such as waiting +threads) have ceased and the completion object is completely unused. -Using on-stack completions for code that calls any of the _timeout or -_interruptible/_killable variants is not advisable as they will require -additional synchronization to prevent the on-stack completion object in -the timeout/signal cases from going out of scope. Consider using dynamically -allocated completions when intending to use the _interruptible/_killable -or _timeout variants of wait_for_completion(). +To emphasise this again: in particular when using some of the waiting API variants +with more complex outcomes, such as the timeout or signalling (_timeout(), +_killable() and _interruptible()) variants, the wait might complete +prematurely while the object might still be in use by another thread - and a return +from the wait_on_completion*() caller function will deallocate the function +stack and cause subtle data corruption if a complete() is done in some +other thread. Simple testing might not trigger these kinds of races. +If unsure, use dynamically allocated completion objects, preferably embedded +in some other long lived object that has a boringly long life time which +exceeds the life time of any helper threads using the completion object, +or has a lock or other synchronization mechanism to make sure complete() +is not called on a freed object. + +A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning. Waiting for completions: ------------------------ -For a thread of execution to wait for some concurrent work to finish, it -calls wait_for_completion() on the initialized completion structure. +For a thread to wait for some concurrent activity to finish, it +calls wait_for_completion() on the initialized completion structure: + + void wait_for_completion(struct completion *done) + A typical usage scenario is: + CPU#1 CPU#2 + struct completion setup_done; + init_completion(&setup_done); - initialize_work(...,&setup_done,...) + initialize_work(...,&setup_done,...); - /* run non-dependent code */ /* do setup */ + /* run non-dependent code */ /* do setup */ - wait_for_completion(&setup_done); complete(setup_done) + wait_for_completion(&setup_done); complete(setup_done); -This is not implying any temporal order on wait_for_completion() and the -call to complete() - if the call to complete() happened before the call +This is not implying any particular order between wait_for_completion() and +the call to complete() - if the call to complete() happened before the call to wait_for_completion() then the waiting side simply will continue -immediately as all dependencies are satisfied if not it will block until +immediately as all dependencies are satisfied; if not, it will block until completion is signaled by complete(). Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(), so it can only be called safely when you know that interrupts are enabled. -Calling it from hard-irq or irqs-off atomic contexts will result in -hard-to-detect spurious enabling of interrupts. - -wait_for_completion(): - - void wait_for_completion(struct completion *done): +Calling it from IRQs-off atomic contexts will result in hard-to-detect +spurious enabling of interrupts. The default behavior is to wait without a timeout and to mark the task as uninterruptible. wait_for_completion() and its variants are only safe in process context (as they can sleep) but not in atomic context, -interrupt context, with disabled irqs. or preemption is disabled - see also +interrupt context, with disabled IRQs, or preemption is disabled - see also try_wait_for_completion() below for handling completion in atomic/interrupt context. As all variants of wait_for_completion() can (obviously) block for a long -time, you probably don't want to call this with held mutexes. +time depending on the nature of the activity they are waiting for, so in +most cases you probably don't want to call this with held mutexes. -Variants available: -------------------- +wait_for_completion*() variants available: +------------------------------------------ The below variants all return status and this status should be checked in most(/all) cases - in cases where the status is deliberately not checked you @@ -148,51 +189,53 @@ probably want to make a note explaining this (e.g. see arch/arm/kernel/smp.c:__cpu_up()). A common problem that occurs is to have unclean assignment of return types, -so care should be taken with assigning return-values to variables of proper -type. Checking for the specific meaning of return values also has been found -to be quite inaccurate e.g. constructs like -if (!wait_for_completion_interruptible_timeout(...)) would execute the same -code path for successful completion and for the interrupted case - which is -probably not what you want. +so take care to assign return-values to variables of the proper type. + +Checking for the specific meaning of return values also has been found +to be quite inaccurate, e.g. constructs like: + + if (!wait_for_completion_interruptible_timeout(...)) + +... would execute the same code path for successful completion and for the +interrupted case - which is probably not what you want. int wait_for_completion_interruptible(struct completion *done) -This function marks the task TASK_INTERRUPTIBLE. If a signal was received -while waiting it will return -ERESTARTSYS; 0 otherwise. +This function marks the task TASK_INTERRUPTIBLE while it is waiting. +If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise. - unsigned long wait_for_completion_timeout(struct completion *done, - unsigned long timeout) + unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout) The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout' -(in jiffies). If timeout occurs it returns 0 else the remaining time in -jiffies (but at least 1). Timeouts are preferably calculated with -msecs_to_jiffies() or usecs_to_jiffies(). If the returned timeout value is -deliberately ignored a comment should probably explain why (e.g. see -drivers/mfd/wm8350-core.c wm8350_read_auxadc()) +jiffies. If a timeout occurs it returns 0, else the remaining time in +jiffies (but at least 1). + +Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(), +to make the code largely HZ-invariant. + +If the returned timeout value is deliberately ignored a comment should probably explain +why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()). - long wait_for_completion_interruptible_timeout( - struct completion *done, unsigned long timeout) + long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout) This function passes a timeout in jiffies and marks the task as TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS; -otherwise it returns 0 if the completion timed out or the remaining time in +otherwise it returns 0 if the completion timed out, or the remaining time in jiffies if completion occurred. Further variants include _killable which uses TASK_KILLABLE as the -designated tasks state and will return -ERESTARTSYS if it is interrupted or -else 0 if completion was achieved. There is a _timeout variant as well: +designated tasks state and will return -ERESTARTSYS if it is interrupted, +or 0 if completion was achieved. There is a _timeout variant as well: long wait_for_completion_killable(struct completion *done) - long wait_for_completion_killable_timeout(struct completion *done, - unsigned long timeout) + long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout) The _io variants wait_for_completion_io() behave the same as the non-_io -variants, except for accounting waiting time as waiting on IO, which has -an impact on how the task is accounted in scheduling stats. +variants, except for accounting waiting time as 'waiting on IO', which has +an impact on how the task is accounted in scheduling/IO stats: void wait_for_completion_io(struct completion *done) - unsigned long wait_for_completion_io_timeout(struct completion *done - unsigned long timeout) + unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout) Signaling completions: @@ -200,31 +243,32 @@ Signaling completions: A thread that wants to signal that the conditions for continuation have been achieved calls complete() to signal exactly one of the waiters that it can -continue. +continue: void complete(struct completion *done) -or calls complete_all() to signal all current and future waiters. +... or calls complete_all() to signal all current and future waiters: void complete_all(struct completion *done) The signaling will work as expected even if completions are signaled before a thread starts waiting. This is achieved by the waiter "consuming" -(decrementing) the done element of struct completion. Waiting threads +(decrementing) the done field of 'struct completion'. Waiting threads wakeup order is the same in which they were enqueued (FIFO order). If complete() is called multiple times then this will allow for that number of waiters to continue - each call to complete() will simply increment the -done element. Calling complete_all() multiple times is a bug though. Both -complete() and complete_all() can be called in hard-irq/atomic context safely. +done field. Calling complete_all() multiple times is a bug though. Both +complete() and complete_all() can be called in IRQ/atomic context safely. -There only can be one thread calling complete() or complete_all() on a -particular struct completion at any time - serialized through the wait +There can only be one thread calling complete() or complete_all() on a +particular 'struct completion' at any time - serialized through the wait queue spinlock. Any such concurrent calls to complete() or complete_all() probably are a design bug. -Signaling completion from hard-irq context is fine as it will appropriately -lock with spin_lock_irqsave/spin_unlock_irqrestore and it will never sleep. +Signaling completion from IRQ context is fine as it will appropriately +lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never +sleep. try_wait_for_completion()/completion_done(): @@ -236,7 +280,7 @@ else it consumes one posted completion and returns true. bool try_wait_for_completion(struct completion *done) -Finally, to check the state of a completion without changing it in any way, +Finally, to check the state of a completion without changing it in any way, call completion_done(), which returns false if there are no posted completions that were not yet consumed by waiters (implying that there are waiters) and true otherwise; @@ -244,4 +288,4 @@ waiters) and true otherwise; bool completion_done(struct completion *done) Both try_wait_for_completion() and completion_done() are safe to be called in -hard-irq or atomic context. +IRQ or atomic context. diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX deleted file mode 100644 index bb4a76f823e1..000000000000 --- a/Documentation/scsi/00-INDEX +++ /dev/null @@ -1,108 +0,0 @@ -00-INDEX - - this file -53c700.txt - - info on driver for 53c700 based adapters -BusLogic.txt - - info on driver for adapters with BusLogic chips -ChangeLog.1992-1997 - - Changes to scsi files, if not listed elsewhere -ChangeLog.arcmsr - - Changes to driver for ARECA's SATA RAID controller cards -ChangeLog.ips - - IBM ServeRAID driver Changelog -ChangeLog.lpfc - - Changes to lpfc driver -ChangeLog.megaraid - - Changes to LSI megaraid controller. -ChangeLog.megaraid_sas - - Changes to serial attached scsi version of LSI megaraid controller. -ChangeLog.ncr53c8xx - - Changes to ncr53c8xx driver -ChangeLog.sym53c8xx - - Changes to sym53c8xx driver -ChangeLog.sym53c8xx_2 - - Changes to second generation of sym53c8xx driver -FlashPoint.txt - - info on driver for BusLogic FlashPoint adapters -LICENSE.FlashPoint - - Licence of the Flashpoint driver -LICENSE.qla2xxx - - License for QLogic Linux Fibre Channel HBA Driver firmware. -LICENSE.qla4xxx - - License for QLogic Linux iSCSI HBA Driver. -Mylex.txt - - info on driver for Mylex adapters -NinjaSCSI.txt - - info on WorkBiT NinjaSCSI-32/32Bi driver -aacraid.txt - - Driver supporting Adaptec RAID controllers -advansys.txt - - List of Advansys Host Adapters -aha152x.txt - - info on driver for Adaptec AHA152x based adapters -aic79xx.txt - - Adaptec Ultra320 SCSI host adapters -aic7xxx.txt - - info on driver for Adaptec controllers -arcmsr_spec.txt - - ARECA FIRMWARE SPEC (for IOP331 adapter) -bfa.txt - - Brocade FC/FCOE adapter driver. -bnx2fc.txt - - FCoE hardware offload for Broadcom network interfaces. -cxgb3i.txt - - Chelsio iSCSI Linux Driver -dc395x.txt - - README file for the dc395x SCSI driver -dpti.txt - - info on driver for DPT SmartRAID and Adaptec I2O RAID based adapters -dtc3x80.txt - - info on driver for DTC 2x80 based adapters -g_NCR5380.txt - - info on driver for NCR5380 and NCR53c400 based adapters -hpsa.txt - - HP Smart Array Controller SCSI driver. -hptiop.txt - - HIGHPOINT ROCKETRAID 3xxx RAID DRIVER -libsas.txt - - Serial Attached SCSI management layer. -link_power_management_policy.txt - - Link power management options. -lpfc.txt - - LPFC driver release notes -megaraid.txt - - Common Management Module, shared code handling ioctls for LSI drivers -ncr53c8xx.txt - - info on driver for NCR53c8xx based adapters -osd.txt - Object-Based Storage Device, command set introduction. -osst.txt - - info on driver for OnStream SC-x0 SCSI tape -ppa.txt - - info on driver for IOmega zip drive -qlogicfas.txt - - info on driver for QLogic FASxxx based adapters -scsi-changer.txt - - README for the SCSI media changer driver -scsi-generic.txt - - info on the sg driver for generic (non-disk/CD/tape) SCSI devices. -scsi-parameters.txt - - List of SCSI-parameters to pass to the kernel at module load-time. -scsi.txt - - short blurb on using SCSI support as a module. -scsi_mid_low_api.txt - - info on API between SCSI layer and low level drivers -scsi_eh.txt - - info on SCSI midlayer error handling infrastructure -scsi_fc_transport.txt - - SCSI Fiber Channel Tansport -st.txt - - info on scsi tape driver -sym53c500_cs.txt - - info on PCMCIA driver for Symbios Logic 53c500 based adapters -sym53c8xx_2.txt - - info on second generation driver for sym53c8xx based adapters -tmscsim.txt - - info on driver for AM53c974 based adapters -ufs.txt - - info on Universal Flash Storage(UFS) and UFS host controller driver. diff --git a/Documentation/scsi/ufs.txt b/Documentation/scsi/ufs.txt index 41a6164592aa..520b5b033256 100644 --- a/Documentation/scsi/ufs.txt +++ b/Documentation/scsi/ufs.txt @@ -128,6 +128,26 @@ The current UFSHCD implementation supports following functionality, In this version of UFSHCD Query requests and power management functionality are not implemented. +4. BSG Support +------------------ + +This transport driver supports exchanging UFS protocol information units +(UPIUs) with a UFS device. Typically, user space will allocate +struct ufs_bsg_request and struct ufs_bsg_reply (see ufs_bsg.h) as +request_upiu and reply_upiu respectively. Filling those UPIUs should +be done in accordance with JEDEC spec UFS2.1 paragraph 10.7. +*Caveat emptor*: The driver makes no further input validations and sends the +UPIU to the device as it is. Open the bsg device in /dev/ufs-bsg and +send SG_IO with the applicable sg_io_v4: + + io_hdr_v4.guard = 'Q'; + io_hdr_v4.protocol = BSG_PROTOCOL_SCSI; + io_hdr_v4.subprotocol = BSG_SUB_PROTOCOL_SCSI_TRANSPORT; + io_hdr_v4.response = (__u64)reply_upiu; + io_hdr_v4.max_response_len = reply_len; + io_hdr_v4.request_len = request_len; + io_hdr_v4.request = (__u64)request_upiu; + UFS Specifications can be found at, UFS - http://www.jedec.org/sites/default/files/docs/JESD220.pdf UFSHCI - http://www.jedec.org/sites/default/files/docs/JESD223.pdf diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst index 98522e0e1ee2..8b9ee597e9d0 100644 --- a/Documentation/security/LSM.rst +++ b/Documentation/security/LSM.rst @@ -5,7 +5,7 @@ Linux Security Module Development Based on https://lkml.org/lkml/2007/10/26/215, a new LSM is accepted into the kernel when its intent (a description of what it tries to protect against and in what cases one would expect to -use it) has been appropriately documented in ``Documentation/security/LSM.rst``. +use it) has been appropriately documented in ``Documentation/admin-guide/LSM/``. This allows an LSM's code to be easily compared to its goals, and so that end users and distros can make a more informed decision about which LSMs suit their requirements. diff --git a/Documentation/security/keys/ecryptfs.rst b/Documentation/security/keys/ecryptfs.rst index 4920f3a8ea75..0e2be0a6bb6a 100644 --- a/Documentation/security/keys/ecryptfs.rst +++ b/Documentation/security/keys/ecryptfs.rst @@ -5,10 +5,10 @@ Encrypted keys for the eCryptfs filesystem ECryptfs is a stacked filesystem which transparently encrypts and decrypts each file using a randomly generated File Encryption Key (FEK). -Each FEK is in turn encrypted with a File Encryption Key Encryption Key (FEFEK) +Each FEK is in turn encrypted with a File Encryption Key Encryption Key (FEKEK) either in kernel space or in user space with a daemon called 'ecryptfsd'. In the former case the operation is performed directly by the kernel CryptoAPI -using a key, the FEFEK, derived from a user prompted passphrase; in the latter +using a key, the FEKEK, derived from a user prompted passphrase; in the latter the FEK is encrypted by 'ecryptfsd' with the help of external libraries in order to support other mechanisms like public key cryptography, PKCS#11 and TPM based operations. @@ -22,12 +22,12 @@ by the userspace utility 'mount.ecryptfs' shipped with the package The 'encrypted' key type has been extended with the introduction of the new format 'ecryptfs' in order to be used in conjunction with the eCryptfs filesystem. Encrypted keys of the newly introduced format store an -authentication token in its payload with a FEFEK randomly generated by the +authentication token in its payload with a FEKEK randomly generated by the kernel and protected by the parent master key. In order to avoid known-plaintext attacks, the datablob obtained through commands 'keyctl print' or 'keyctl pipe' does not contain the overall -authentication token, which content is well known, but only the FEFEK in +authentication token, which content is well known, but only the FEKEK in encrypted form. The eCryptfs filesystem may really benefit from using encrypted keys in that the diff --git a/Documentation/serial/00-INDEX b/Documentation/serial/00-INDEX deleted file mode 100644 index 8021a9f29fc5..000000000000 --- a/Documentation/serial/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README.cycladesZ - - info on Cyclades-Z firmware loading. -driver - - intro to the low level serial driver. -moxa-smartio - - file with info on installing/using Moxa multiport serial driver. -n_gsm.txt - - GSM 0710 tty multiplexer howto. -rocket.txt - - info on the Comtrol RocketPort multiport serial driver. -serial-rs485.txt - - info about RS485 structures and support in the kernel. -tty.txt - - guide to the locking policies of the tty layer. diff --git a/Documentation/sphinx-static/theme_overrides.css b/Documentation/sphinx-static/theme_overrides.css index 522b6d4c49d4..e21e36cd6761 100644 --- a/Documentation/sphinx-static/theme_overrides.css +++ b/Documentation/sphinx-static/theme_overrides.css @@ -4,6 +4,44 @@ * */ +/* Improve contrast and increase size for easier reading. */ + +body { + font-family: serif; + color: black; + font-size: 100%; +} + +h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend { + font-family: sans-serif; +} + +.wy-menu-vertical li.current a { + color: #505050; +} + +.wy-menu-vertical li.on a, .wy-menu-vertical li.current > a { + color: #303030; +} + +div[class^="highlight"] pre { + font-family: monospace; + color: black; + font-size: 100%; +} + +.wy-menu-vertical { + font-family: sans-serif; +} + +.c { + font-style: normal; +} + +p { + font-size: 100%; +} + /* Interim: Code-blocks with line nos - lines and line numbers don't line up. * see: https://github.com/rtfd/sphinx_rtd_theme/issues/419 */ diff --git a/Documentation/spi/00-INDEX b/Documentation/spi/00-INDEX deleted file mode 100644 index 8e4bb17d70eb..000000000000 --- a/Documentation/spi/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -butterfly - - AVR Butterfly SPI driver overview and pin configuration. -ep93xx_spi - - Basic EP93xx SPI driver configuration. -pxa2xx - - PXA2xx SPI master controller build by spi_message fifo wq -spidev - - Intro to the userspace API for spi devices -spi-lm70llp - - Connecting an LM70-LLP sensor to the kernel via the SPI subsys. -spi-sc18is602 - - NXP SC18IS602/603 I2C-bus to SPI bridge -spi-summary - - (Linux) SPI overview. If unsure about SPI or SPI in Linux, start here. diff --git a/Documentation/switchtec.txt b/Documentation/switchtec.txt index f788264921ff..30d6a64e53f7 100644 --- a/Documentation/switchtec.txt +++ b/Documentation/switchtec.txt @@ -23,7 +23,7 @@ The primary means of communicating with the Switchtec management firmware is through the Memory-mapped Remote Procedure Call (MRPC) interface. Commands are submitted to the interface with a 4-byte command identifier and up to 1KB of command specific data. The firmware will -respond with a 4 bytes return code and up to 1KB of command specific +respond with a 4-byte return code and up to 1KB of command-specific data. The interface only processes a single command at a time. @@ -36,8 +36,8 @@ device: /dev/switchtec#, one for each management endpoint in the system. The char device has the following semantics: * A write must consist of at least 4 bytes and no more than 1028 bytes. - The first four bytes will be interpreted as the command to run and - the remainder will be used as the input data. A write will send the + The first 4 bytes will be interpreted as the Command ID and the + remainder will be used as the input data. A write will send the command to the firmware to begin processing. * Each write must be followed by exactly one read. Any double write will @@ -45,9 +45,9 @@ The char device has the following semantics: produce an error. * A read will block until the firmware completes the command and return - the four bytes of status plus up to 1024 bytes of output data. (The - length will be specified by the size parameter of the read call -- - reading less than 4 bytes will produce an error. + the 4-byte Command Return Value plus up to 1024 bytes of output + data. (The length will be specified by the size parameter of the read + call -- reading less than 4 bytes will produce an error.) * The poll call will also be supported for userspace applications that need to do other things while waiting for the command to complete. @@ -83,10 +83,20 @@ The following IOCTLs are also supported by the device: Non-Transparent Bridge (NTB) Driver =================================== -An NTB driver is provided for the switchtec hardware in switchtec_ntb. -Currently, it only supports switches configured with exactly 2 -partitions. It also requires the following configuration settings: +An NTB hardware driver is provided for the Switchtec hardware in +ntb_hw_switchtec. Currently, it only supports switches configured with +exactly 2 NT partitions and zero or more non-NT partitions. It also requires +the following configuration settings: -* Both partitions must be able to access each other's GAS spaces. +* Both NT partitions must be able to access each other's GAS spaces. Thus, the bits in the GAS Access Vector under Management Settings must be set to support this. +* Kernel configuration MUST include support for NTB (CONFIG_NTB needs + to be set) + +NT EP BAR 2 will be dynamically configured as a Direct Window, and +the configuration file does not need to configure it explicitly. + +Please refer to Documentation/ntb.txt in Linux source tree for an overall +understanding of the Linux NTB stack. ntb_hw_switchtec works as an NTB +Hardware Driver in this stack. diff --git a/Documentation/sysctl/00-INDEX b/Documentation/sysctl/00-INDEX deleted file mode 100644 index 8cf5d493fd03..000000000000 --- a/Documentation/sysctl/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README - - general information about /proc/sys/ sysctl files. -abi.txt - - documentation for /proc/sys/abi/*. -fs.txt - - documentation for /proc/sys/fs/*. -kernel.txt - - documentation for /proc/sys/kernel/*. -net.txt - - documentation for /proc/sys/net/*. -sunrpc.txt - - documentation for /proc/sys/sunrpc/*. -vm.txt - - documentation for /proc/sys/vm/*. diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX deleted file mode 100644 index 3be05fe0f1f9..000000000000 --- a/Documentation/timers/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file -highres.txt - - High resolution timers and dynamic ticks design notes -hpet.txt - - High Precision Event Timer Driver for Linux -hrtimers.txt - - subsystem for high-resolution kernel timers -NO_HZ.txt - - Summary of the different methods for the scheduler clock-interrupts management. -timekeeping.txt - - Clock sources, clock events, sched_clock() and delay timer notes -timers-howto.txt - - how to insert delays in the kernel the right (tm) way. -timer_stats.txt - - timer usage statistics diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 7ea16a0ceffc..f82434f2795e 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -2987,6 +2987,9 @@ The following commands are supported: command, it only prints out the contents of the ring buffer for the CPU that executed the function that triggered the dump. +- stacktrace: + When the function is hit, a stack trace is recorded. + trace_pipe ---------- diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 5ac724baea7d..7dda76503127 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated:: # echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> event2/trigger -In the first line above, the event's timetamp is saved into the +In the first line above, the event's timestamp is saved into the variable ts0. In the next line, ts0 is subtracted from the second event's timestamp to produce the latency, which is then assigned into yet another variable, 'wakeup_lat'. The hist trigger below in turn @@ -1811,7 +1811,7 @@ the command that defined it with a '!':: /sys/kernel/debug/tracing/synthetic_events At this point, there isn't yet an actual 'wakeup_latency' event -instantiated in the event subsytem - for this to happen, a 'hist +instantiated in the event subsystem - for this to happen, a 'hist trigger action' needs to be instantiated and bound to actual fields and variables defined on other events (see Section 2.2.3 below on how that is done using hist trigger 'onmatch' action). Once that is @@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' file. A hist trigger 'action' is a function that's executed whenever a histogram entry is added or updated. -The default 'action' if no special function is explicity specified is +The default 'action' if no special function is explicitly specified is as it always has been, to simply update the set of values associated with an entry. Some applications, however, may want to perform additional actions at that point, such as generate another event, or diff --git a/Documentation/virtual/00-INDEX b/Documentation/virtual/00-INDEX deleted file mode 100644 index af0d23968ee7..000000000000 --- a/Documentation/virtual/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -Virtualization support in the Linux kernel. - -00-INDEX - - this file. - -paravirt_ops.txt - - Describes the Linux kernel pv_ops to support different hypervisors -kvm/ - - Kernel Virtual Machine. See also http://linux-kvm.org -uml/ - - User Mode Linux, builds/runs Linux kernel as a userspace program. diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX deleted file mode 100644 index 3492458a4ae8..000000000000 --- a/Documentation/virtual/kvm/00-INDEX +++ /dev/null @@ -1,35 +0,0 @@ -00-INDEX - - this file. -amd-memory-encryption.rst - - notes on AMD Secure Encrypted Virtualization feature and SEV firmware - command description -api.txt - - KVM userspace API. -arm - - internal ABI between the kernel and HYP (for arm/arm64) -cpuid.txt - - KVM-specific cpuid leaves (x86). -devices/ - - KVM_CAP_DEVICE_CTRL userspace API. -halt-polling.txt - - notes on halt-polling -hypercalls.txt - - KVM hypercalls. -locking.txt - - notes on KVM locks. -mmu.txt - - the x86 kvm shadow mmu. -msr.txt - - KVM-specific MSRs (x86). -nested-vmx.txt - - notes on nested virtualization for Intel x86 processors. -ppc-pv.txt - - the paravirtualization interface on PowerPC. -review-checklist.txt - - review checklist for KVM patches. -s390-diag.txt - - Diagnose hypercall description (for IBM S/390) -timekeeping.txt - - timekeeping virtualization for x86-based architectures. -vcpu-requests.rst - - internal VCPU request API diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX deleted file mode 100644 index f4a4f3e884cf..000000000000 --- a/Documentation/vm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file. -active_mm.rst - - An explanation from Linus about tsk->active_mm vs tsk->mm. -balance.rst - - various information on memory balancing. -cleancache.rst - - Intro to cleancache and page-granularity victim cache. -frontswap.rst - - Outline frontswap, part of the transcendent memory frontend. -highmem.rst - - Outline of highmem and common issues. -hmm.rst - - Documentation of heterogeneous memory management -hugetlbfs_reserv.rst - - A brief overview of hugetlbfs reservation design/implementation. -hwpoison.rst - - explains what hwpoison is -ksm.rst - - how to use the Kernel Samepage Merging feature. -mmu_notifier.rst - - a note about clearing pte/pmd and mmu notifications -numa.rst - - information about NUMA specific code in the Linux vm. -overcommit-accounting.rst - - description of the Linux kernels overcommit handling modes. -page_frags.rst - - description of page fragments allocator -page_migration.rst - - description of page migration in NUMA systems. -page_owner.rst - - tracking about who allocated each page -remap_file_pages.rst - - a note about remap_file_pages() system call -slub.rst - - a short users guide for SLUB. -split_page_table_lock.rst - - Separate per-table lock to improve scalability of the old page_table_lock. -swap_numa.rst - - automatic binding of swap device to numa node -transhuge.rst - - Transparent Hugepage Support, alternative way of using hugepages. -unevictable-lru.rst - - Unevictable LRU infrastructure -z3fold.txt - - outline of z3fold allocator for storing compressed pages -zsmalloc.rst - - outline of zsmalloc allocator for storing compressed pages -zswap.rst - - Intro to compressed cache for swap pages diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index cdf3911582c8..44205f0b671f 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -194,13 +194,13 @@ use either:: unsigned long start, unsigned long end, hmm_pfn_t *pfns); - int hmm_vma_fault(struct vm_area_struct *vma, - struct hmm_range *range, - unsigned long start, - unsigned long end, - hmm_pfn_t *pfns, - bool write, - bool block); + int hmm_vma_fault(struct vm_area_struct *vma, + struct hmm_range *range, + unsigned long start, + unsigned long end, + hmm_pfn_t *pfns, + bool write, + bool block); The first one (hmm_vma_get_pfns()) will only fetch present CPU page table entries and will not trigger a page fault on missing or non-present entries. diff --git a/Documentation/w1/00-INDEX b/Documentation/w1/00-INDEX deleted file mode 100644 index cb49802745dc..000000000000 --- a/Documentation/w1/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - This file -slaves/ - - Drivers that provide support for specific family codes. -masters/ - - Individual chips providing 1-wire busses. -w1.generic - - The 1-wire (w1) bus -w1.netlink - - Userspace communication protocol over connector [1]. diff --git a/Documentation/w1/masters/00-INDEX b/Documentation/w1/masters/00-INDEX deleted file mode 100644 index 8330cf9325f0..000000000000 --- a/Documentation/w1/masters/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - This file -ds2482 - - The Maxim/Dallas Semiconductor DS2482 provides 1-wire busses. -ds2490 - - The Maxim/Dallas Semiconductor DS2490 builds USB <-> W1 bridges. -mxc-w1 - - W1 master controller driver found on Freescale MX2/MX3 SoCs -omap-hdq - - HDQ/1-wire module of TI OMAP 2430/3430. -w1-gpio - - GPIO 1-wire bus master driver. diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX deleted file mode 100644 index 68946f83e579..000000000000 --- a/Documentation/w1/slaves/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - This file -w1_therm - - The Maxim/Dallas Semiconductor ds18*20 temperature sensor. -w1_ds2413 - - The Maxim/Dallas Semiconductor ds2413 dual channel addressable switch. -w1_ds2423 - - The Maxim/Dallas Semiconductor ds2423 counter device. -w1_ds2438 - - The Maxim/Dallas Semiconductor ds2438 smart battery monitor. -w1_ds28e04 - - The Maxim/Dallas Semiconductor ds28e04 eeprom. -w1_ds28e17 - - The Maxim/Dallas Semiconductor ds28e17 1-Wire-to-I2C Master Bridge. diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX deleted file mode 100644 index 3bb2ee3edcd1..000000000000 --- a/Documentation/x86/00-INDEX +++ /dev/null @@ -1,20 +0,0 @@ -00-INDEX - - this file -boot.txt - - List of boot protocol versions -earlyprintk.txt - - Using earlyprintk with a USB2 debug port key. -entry_64.txt - - Describe (some of the) kernel entry points for x86. -exception-tables.txt - - why and how Linux kernel uses exception tables on x86 -microcode.txt - - How to load microcode from an initrd-CPIO archive early to fix CPU issues. -mtrr.txt - - how to use x86 Memory Type Range Registers to increase performance -pat.txt - - Page Attribute Table intro and API -usb-legacy-support.txt - - how to fix/avoid quirks when using emulated PS/2 mouse/keyboard. -zero-page.txt - - layout of the first page of memory. diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index 5e9b826b5f62..7727db8f94bc 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt @@ -61,6 +61,18 @@ Protocol 2.12: (Kernel 3.8) Added the xloadflags field and extension fields to struct boot_params for loading bzImage and ramdisk above 4G in 64bit. +Protocol 2.13: (Kernel 3.14) Support 32- and 64-bit flags being set in + xloadflags to support booting a 64-bit kernel from 32-bit + EFI + +Protocol 2.14: (Kernel 4.20) Added acpi_rsdp_addr holding the physical + address of the ACPI RSDP table. + The bootloader updates version with: + 0x8000 | min(kernel-version, bootloader-version) + kernel-version being the protocol version supported by + the kernel and bootloader-version the protocol version + supported by the bootloader. + **** MEMORY LAYOUT The traditional memory map for the kernel loader, used for Image or @@ -197,6 +209,7 @@ Offset Proto Name Meaning 0258/8 2.10+ pref_address Preferred loading address 0260/4 2.10+ init_size Linear memory required during initialization 0264/4 2.11+ handover_offset Offset of handover entry point +0268/8 2.14+ acpi_rsdp_addr Physical address of RSDP table (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4. @@ -309,7 +322,7 @@ Protocol: 2.00+ Contains the magic number "HdrS" (0x53726448). Field name: version -Type: read +Type: modify Offset/size: 0x206/2 Protocol: 2.00+ @@ -317,6 +330,12 @@ Protocol: 2.00+ e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version 10.17. + Up to protocol version 2.13 this information is only read by the + bootloader. From protocol version 2.14 onwards the bootloader will + write the used protocol version or-ed with 0x8000 to the field. The + used protocol version will be the minimum of the supported protocol + versions of the bootloader and the kernel. + Field name: realmode_swtch Type: modify (optional) Offset/size: 0x208/4 @@ -744,6 +763,17 @@ Offset/size: 0x264/4 See EFI HANDOVER PROTOCOL below for more details. +Field name: acpi_rsdp_addr +Type: write +Offset/size: 0x268/8 +Protocol: 2.14+ + + This field can be set by the boot loader to tell the kernel the + physical address of the ACPI RSDP table. + + A value of 0 indicates the kernel should fall back to the standard + methods to locate the RSDP. + **** THE IMAGE CHECKSUM diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index f662d3c530e5..52b10945ff75 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -520,18 +520,24 @@ the pseudo-locked region: 2) Cache hit and miss measurements using model specific precision counters if available. Depending on the levels of cache on the system the pseudo_lock_l2 and pseudo_lock_l3 tracepoints are available. - WARNING: triggering this measurement uses from two (for just L2 - measurements) to four (for L2 and L3 measurements) precision counters on - the system, if any other measurements are in progress the counters and - their corresponding event registers will be clobbered. When a pseudo-locked region is created a new debugfs directory is created for it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single write-only file, pseudo_lock_measure, is present in this directory. The -measurement on the pseudo-locked region depends on the number, 1 or 2, -written to this debugfs file. Since the measurements are recorded with the -tracing infrastructure the relevant tracepoints need to be enabled before the -measurement is triggered. +measurement of the pseudo-locked region depends on the number written to this +debugfs file: +1 - writing "1" to the pseudo_lock_measure file will trigger the latency + measurement captured in the pseudo_lock_mem_latency tracepoint. See + example below. +2 - writing "2" to the pseudo_lock_measure file will trigger the L2 cache + residency (cache hits and misses) measurement captured in the + pseudo_lock_l2 tracepoint. See example below. +3 - writing "3" to the pseudo_lock_measure file will trigger the L3 cache + residency (cache hits and misses) measurement captured in the + pseudo_lock_l3 tracepoint. + +All measurements are recorded with the tracing infrastructure. This requires +the relevant tracepoints to be enabled before the measurement is triggered. Example of latency debugging interface: In this example a pseudo-locked region named "newlock" was created. Here is diff --git a/Documentation/x86/x86_64/00-INDEX b/Documentation/x86/x86_64/00-INDEX deleted file mode 100644 index 92fc20ab5f0e..000000000000 --- a/Documentation/x86/x86_64/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -boot-options.txt - - AMD64-specific boot options. -cpu-hotplug-spec - - Firmware support for CPU hotplug under Linux/x86-64 -fake-numa-for-cpusets - - Using numa=fake and CPUSets for Resource Management -kernel-stacks - - Context-specific per-processor interrupt stacks. -machinecheck - - Configurable sysfs parameters for the x86-64 machine check code. -mm.txt - - Memory layout of x86-64 (4 level page tables, 46 bits physical). -uefi.txt - - Booting Linux via Unified Extensible Firmware Interface. diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 5432a96d31ff..702898633b00 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -1,55 +1,124 @@ +==================================================== +Complete virtual memory map with 4-level page tables +==================================================== -Virtual memory map with 4 level page tables: - -0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm -hole caused by [47:63] sign extension -ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor -ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory -ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole -ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space -ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole -ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) -... unused hole ... -ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) -... unused hole ... - vaddr_end for KASLR -fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping -fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI -ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks -... unused hole ... -ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space -... unused hole ... -ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space -[fixmap start] - ffffffffff5fffff kernel-internal fixmap range -ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI -ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole - -Virtual memory map with 5 level page tables: - -0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm -hole caused by [56:63] sign extension -ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor -ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory -ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI -ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB) -ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole -ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) -... unused hole ... -ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) -... unused hole ... - vaddr_end for KASLR -fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping -... unused hole ... -ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks -... unused hole ... -ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space -... unused hole ... -ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space -[fixmap start] - ffffffffff5fffff kernel-internal fixmap range -ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI -ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole +Notes: + + - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down + from the top of the 64-bit address space. It's easier to understand the layout + when seen both in absolute addresses and in distance-from-top notation. + + For example 0xffffe90000000000 == -23 TB, it's 23 TB lower than the top of the + 64-bit address space (ffffffffffffffff). + + Note that as we get closer to the top of the address space, the notation changes + from TB to GB and then MB/KB. + + - "16M TB" might look weird at first sight, but it's an easier to visualize size + notation than "16 EB", which few will recognize at first sight as 16 exabytes. + It also shows it nicely how incredibly large 64-bit address space is. + +======================================================================================================================== + Start addr | Offset | End addr | Size | VM area description +======================================================================================================================== + | | | | + 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm +__________________|____________|__________________|_________|___________________________________________________________ + | | | | + 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -128 TB + | | | | starting offset of kernel mappings. +__________________|____________|__________________|_________|___________________________________________________________ + | + | Kernel-space virtual memory, shared between all processes: +____________________________________________________________|___________________________________________________________ + | | | | + ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor + ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base) + ffffc80000000000 | -56 TB | ffffc8ffffffffff | 1 TB | ... unused hole + ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base) + ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole + ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base) + ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole + ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole + | | | | vaddr_end for KASLR + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | LDT remap for PTI + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks +__________________|____________|__________________|_________|____________________________________________________________ + | + | Identical layout to the 47-bit one from here on: +____________________________________________________________|____________________________________________________________ + | | | | + ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole + ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole + ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0 + ffffffff80000000 |-2048 MB | | | + ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space + ffffffffff000000 | -16 MB | | | + FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset + ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI + ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole +__________________|____________|__________________|_________|___________________________________________________________ + + +==================================================== +Complete virtual memory map with 5-level page tables +==================================================== + +Notes: + + - With 56-bit addresses, user-space memory gets expanded by a factor of 512x, + from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PT starting + offset and many of the regions expand to support the much larger physical + memory supported. + +======================================================================================================================== + Start addr | Offset | End addr | Size | VM area description +======================================================================================================================== + | | | | + 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm +__________________|____________|__________________|_________|___________________________________________________________ + | | | | + 0000800000000000 | +64 PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -128 TB + | | | | starting offset of kernel mappings. +__________________|____________|__________________|_________|___________________________________________________________ + | + | Kernel-space virtual memory, shared between all processes: +____________________________________________________________|___________________________________________________________ + | | | | + ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor + ff10000000000000 | -60 PB | ff8fffffffffffff | 32 PB | direct mapping of all physical memory (page_offset_base) + ff90000000000000 | -28 PB | ff9fffffffffffff | 4 PB | LDT remap for PTI + ffa0000000000000 | -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base) + ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole + ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base) + ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole + ffdf000000000000 | -8.25 PB | fffffdffffffffff | ~8 PB | KASAN shadow memory + fffffc0000000000 | -4 TB | fffffdffffffffff | 2 TB | ... unused hole + | | | | vaddr_end for KASLR + fffffe0000000000 | -2 TB | fffffe7fffffffff | 0.5 TB | cpu_entry_area mapping + fffffe8000000000 | -1.5 TB | fffffeffffffffff | 0.5 TB | ... unused hole + ffffff0000000000 | -1 TB | ffffff7fffffffff | 0.5 TB | %esp fixup stacks +__________________|____________|__________________|_________|____________________________________________________________ + | + | Identical layout to the 47-bit one from here on: +____________________________________________________________|____________________________________________________________ + | | | | + ffffff8000000000 | -512 GB | ffffffeeffffffff | 444 GB | ... unused hole + ffffffef00000000 | -68 GB | fffffffeffffffff | 64 GB | EFI region mapping space + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | ... unused hole + ffffffff80000000 | -2 GB | ffffffff9fffffff | 512 MB | kernel text mapping, mapped to physical address 0 + ffffffff80000000 |-2048 MB | | | + ffffffffa0000000 |-1536 MB | fffffffffeffffff | 1520 MB | module mapping space + ffffffffff000000 | -16 MB | | | + FIXADDR_START | ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset + ffffffffff600000 | -10 MB | ffffffffff600fff | 4 kB | legacy vsyscall ABI + ffffffffffe00000 | -2 MB | ffffffffffffffff | 2 MB | ... unused hole +__________________|____________|__________________|_________|___________________________________________________________ Architecture defines a 64-bit virtual address. Implementations can support less. Currently supported are 48- and 57-bit virtual addresses. Bits 63 |