diff options
Diffstat (limited to 'Documentation')
38 files changed, 2993 insertions, 83 deletions
diff --git a/Documentation/ABI/testing/configfs-usb-gadget-mass-storage b/Documentation/ABI/testing/configfs-usb-gadget-mass-storage new file mode 100644 index 000000000000..ad72a37ee9ff --- /dev/null +++ b/Documentation/ABI/testing/configfs-usb-gadget-mass-storage @@ -0,0 +1,31 @@ +What: /config/usb-gadget/gadget/functions/mass_storage.name +Date: Oct 2013 +KenelVersion: 3.13 +Description: + The attributes: + + stall - Set to permit function to halt bulk endpoints. + Disabled on some USB devices known not to work + correctly. You should set it to true. + num_buffers - Number of pipeline buffers. Valid numbers + are 2..4. Available only if + CONFIG_USB_GADGET_DEBUG_FILES is set. + +What: /config/usb-gadget/gadget/functions/mass_storage.name/lun.name +Date: Oct 2013 +KenelVersion: 3.13 +Description: + The attributes: + + file - The path to the backing file for the LUN. + Required if LUN is not marked as removable. + ro - Flag specifying access to the LUN shall be + read-only. This is implied if CD-ROM emulation + is enabled as well as when it was impossible + to open "filename" in R/W mode. + removable - Flag specifying that LUN shall be indicated as + being removable. + cdrom - Flag specifying that LUN shall be reported as + being a CD-ROM. + nofua - Flag specifying that FUA flag + in SCSI WRITE(10,12) diff --git a/Documentation/ABI/testing/sysfs-class-mic.txt b/Documentation/ABI/testing/sysfs-class-mic.txt new file mode 100644 index 000000000000..13f48afc534f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-mic.txt @@ -0,0 +1,157 @@ +What: /sys/class/mic/ +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + The mic class directory belongs to Intel MIC devices and + provides information per MIC device. An Intel MIC device is a + PCIe form factor add-in Coprocessor card based on the Intel Many + Integrated Core (MIC) architecture that runs a Linux OS. + +What: /sys/class/mic/mic(x) +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + The directories /sys/class/mic/mic0, /sys/class/mic/mic1 etc., + represent MIC devices (0,1,..etc). Each directory has + information specific to that MIC device. + +What: /sys/class/mic/mic(x)/family +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + Provides information about the Coprocessor family for an Intel + MIC device. For example - "x100" + +What: /sys/class/mic/mic(x)/stepping +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + Provides information about the silicon stepping for an Intel + MIC device. For example - "A0" or "B0" + +What: /sys/class/mic/mic(x)/state +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + When read, this entry provides the current state of an Intel + MIC device in the context of the card OS. Possible values that + will be read are: + "offline" - The MIC device is ready to boot the card OS. On + reading this entry after an OSPM resume, a "boot" has to be + written to this entry if the card was previously shutdown + during OSPM suspend. + "online" - The MIC device has initiated booting a card OS. + "shutting_down" - The card OS is shutting down. + "reset_failed" - The MIC device has failed to reset. + "suspending" - The MIC device is currently being prepared for + suspend. On reading this entry, a "suspend" has to be written + to the state sysfs entry to ensure the card is shutdown during + OSPM suspend. + "suspended" - The MIC device has been suspended. + + When written, this sysfs entry triggers different state change + operations depending upon the current state of the card OS. + Acceptable values are: + "boot" - Boot the card OS image specified by the combination + of firmware, ramdisk, cmdline and bootmode + sysfs entries. + "reset" - Initiates device reset. + "shutdown" - Initiates card OS shutdown. + "suspend" - Initiates card OS shutdown and also marks the card + as suspended. + +What: /sys/class/mic/mic(x)/shutdown_status +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + An Intel MIC device runs a Linux OS during its operation. This + OS can shutdown because of various reasons. When read, this + entry provides the status on why the card OS was shutdown. + Possible values are: + "nop" - shutdown status is not applicable, when the card OS is + "online" + "crashed" - Shutdown because of a HW or SW crash. + "halted" - Shutdown because of a halt command. + "poweroff" - Shutdown because of a poweroff command. + "restart" - Shutdown because of a restart command. + +What: /sys/class/mic/mic(x)/cmdline +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + An Intel MIC device runs a Linux OS during its operation. Before + booting this card OS, it is possible to pass kernel command line + options to configure various features in it, similar to + self-bootable machines. When read, this entry provides + information about the current kernel command line options set to + boot the card OS. This entry can be written to change the + existing kernel command line options. Typically, the user would + want to read the current command line options, append new ones + or modify existing ones and then write the whole kernel command + line back to this entry. + +What: /sys/class/mic/mic(x)/firmware +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + When read, this sysfs entry provides the path name under + /lib/firmware/ where the firmware image to be booted on the + card can be found. The entry can be written to change the + firmware image location under /lib/firmware/. + +What: /sys/class/mic/mic(x)/ramdisk +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + When read, this sysfs entry provides the path name under + /lib/firmware/ where the ramdisk image to be used during card + OS boot can be found. The entry can be written to change + the ramdisk image location under /lib/firmware/. + +What: /sys/class/mic/mic(x)/bootmode +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + When read, this sysfs entry provides the current bootmode for + the card. This sysfs entry can be written with the following + valid strings: + a) linux - Boot a Linux image. + b) elf - Boot an elf image for flash updates. + +What: /sys/class/mic/mic(x)/log_buf_addr +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + An Intel MIC device runs a Linux OS during its operation. For + debugging purpose and early kernel boot messages, the user can + access the card OS log buffer via debugfs. When read, this entry + provides the kernel virtual address of the buffer where the card + OS log buffer can be read. This entry is written by the host + configuration daemon to set the log buffer address. The correct + log buffer address to be written can be found in the System.map + file of the card OS. + +What: /sys/class/mic/mic(x)/log_buf_len +Date: October 2013 +KernelVersion: 3.13 +Contact: Sudeep Dutt <sudeep.dutt@intel.com> +Description: + An Intel MIC device runs a Linux OS during its operation. For + debugging purpose and early kernel boot messages, the user can + access the card OS log buffer via debugfs. When read, this entry + provides the kernel virtual address where the card OS log buffer + length can be read. This entry is written by host configuration + daemon to set the log buffer length address. The correct log + buffer length address to be written can be found in the + System.map file of the card OS. diff --git a/Documentation/ABI/testing/sysfs-driver-sunxi-sid b/Documentation/ABI/testing/sysfs-driver-sunxi-sid new file mode 100644 index 000000000000..ffb9536f6ecc --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-sunxi-sid @@ -0,0 +1,22 @@ +What: /sys/devices/*/<our-device>/eeprom +Date: August 2013 +Contact: Oliver Schinagl <oliver@schinagl.nl> +Description: read-only access to the SID (Security-ID) on current + A-series SoC's from Allwinner. Currently supports A10, A10s, A13 + and A20 CPU's. The earlier A1x series of SoCs exports 16 bytes, + whereas the newer A20 SoC exposes 512 bytes split into sections. + Besides the 16 bytes of SID, there's also an SJTAG area, + HDMI-HDCP key and some custom keys. Below a quick overview, for + details see the user manual: + 0x000 128 bit root-key (sun[457]i) + 0x010 128 bit boot-key (sun7i) + 0x020 64 bit security-jtag-key (sun7i) + 0x028 16 bit key configuration (sun7i) + 0x02b 16 bit custom-vendor-key (sun7i) + 0x02c 320 bit low general key (sun7i) + 0x040 32 bit read-control access (sun7i) + 0x064 224 bit low general key (sun7i) + 0x080 2304 bit HDCP-key (sun7i) + 0x1a0 768 bit high general key (sun7i) +Users: any user space application which wants to read the SID on + Allwinner's A-series of CPU's. diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl index 25b58efd955d..4f676838da06 100644 --- a/Documentation/DocBook/filesystems.tmpl +++ b/Documentation/DocBook/filesystems.tmpl @@ -91,7 +91,6 @@ <title>The Filesystem for Exporting Kernel Objects</title> !Efs/sysfs/file.c !Efs/sysfs/symlink.c -!Efs/sysfs/bin.c </chapter> <chapter id="debugfs"> diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c index 4848db8c71ff..8a4da64e02a8 100644 --- a/Documentation/connector/ucon.c +++ b/Documentation/connector/ucon.c @@ -71,7 +71,7 @@ static int netlink_send(int s, struct cn_msg *msg) nlh->nlmsg_seq = seq++; nlh->nlmsg_pid = getpid(); nlh->nlmsg_type = NLMSG_DONE; - nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh)); + nlh->nlmsg_len = size; nlh->nlmsg_flags = 0; m = NLMSG_DATA(nlh); diff --git a/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt b/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt new file mode 100644 index 000000000000..68ba37295565 --- /dev/null +++ b/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt @@ -0,0 +1,17 @@ +Allwinner sunxi-sid + +Required properties: +- compatible: "allwinner,sun4i-sid" or "allwinner,sun7i-a20-sid". +- reg: Should contain registers location and length + +Example for sun4i: + sid@01c23800 { + compatible = "allwinner,sun4i-sid"; + reg = <0x01c23800 0x10> + }; + +Example for sun7i: + sid@01c23800 { + compatible = "allwinner,sun7i-a20-sid"; + reg = <0x01c23800 0x200> + }; diff --git a/Documentation/devicetree/bindings/misc/ti,dac7512.txt b/Documentation/devicetree/bindings/misc/ti,dac7512.txt new file mode 100644 index 000000000000..1db45939dac9 --- /dev/null +++ b/Documentation/devicetree/bindings/misc/ti,dac7512.txt @@ -0,0 +1,20 @@ +TI DAC7512 DEVICETREE BINDINGS + +Required properties: + + - "compatible" Must be set to "ti,dac7512" + +Property rules described in Documentation/devicetree/bindings/spi/spi-bus.txt +apply. In particular, "reg" and "spi-max-frequency" properties must be given. + + +Example: + + spi_master { + dac7512: dac7512@0 { + compatible = "ti,dac7512"; + reg = <0>; /* CS0 */ + spi-max-frequency = <1000000>; + }; + }; + diff --git a/Documentation/devicetree/bindings/phy/phy-bindings.txt b/Documentation/devicetree/bindings/phy/phy-bindings.txt new file mode 100644 index 000000000000..8ae844fc0c60 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-bindings.txt @@ -0,0 +1,66 @@ +This document explains only the device tree data binding. For general +information about PHY subsystem refer to Documentation/phy.txt + +PHY device node +=============== + +Required Properties: +#phy-cells: Number of cells in a PHY specifier; The meaning of all those + cells is defined by the binding for the phy node. The PHY + provider can use the values in cells to find the appropriate + PHY. + +For example: + +phys: phy { + compatible = "xxx"; + reg = <...>; + . + . + #phy-cells = <1>; + . + . +}; + +That node describes an IP block (PHY provider) that implements 2 different PHYs. +In order to differentiate between these 2 PHYs, an additonal specifier should be +given while trying to get a reference to it. + +PHY user node +============= + +Required Properties: +phys : the phandle for the PHY device (used by the PHY subsystem) +phy-names : the names of the PHY corresponding to the PHYs present in the + *phys* phandle + +Example 1: +usb1: usb_otg_ss@xxx { + compatible = "xxx"; + reg = <xxx>; + . + . + phys = <&usb2_phy>, <&usb3_phy>; + phy-names = "usb2phy", "usb3phy"; + . + . +}; + +This node represents a controller that uses two PHYs, one for usb2 and one for +usb3. + +Example 2: +usb2: usb_otg_ss@xxx { + compatible = "xxx"; + reg = <xxx>; + . + . + phys = <&phys 1>; + phy-names = "usbphy"; + . + . +}; + +This node represents a controller that uses one of the PHYs of the PHY provider +device defined previously. Note that the phy handle has an additional specifier +"1" to differentiate between the two PHYs. diff --git a/Documentation/devicetree/bindings/phy/samsung-phy.txt b/Documentation/devicetree/bindings/phy/samsung-phy.txt new file mode 100644 index 000000000000..c0fccaa1671e --- /dev/null +++ b/Documentation/devicetree/bindings/phy/samsung-phy.txt @@ -0,0 +1,22 @@ +Samsung S5P/EXYNOS SoC series MIPI CSIS/DSIM DPHY +------------------------------------------------- + +Required properties: +- compatible : should be "samsung,s5pv210-mipi-video-phy"; +- reg : offset and length of the MIPI DPHY register set; +- #phy-cells : from the generic phy bindings, must be 1; + +For "samsung,s5pv210-mipi-video-phy" compatible PHYs the second cell in +the PHY specifier identifies the PHY and its meaning is as follows: + 0 - MIPI CSIS 0, + 1 - MIPI DSIM 0, + 2 - MIPI CSIS 1, + 3 - MIPI DSIM 1. + +Samsung EXYNOS SoC series Display Port PHY +------------------------------------------------- + +Required properties: +- compatible : should be "samsung,exynos5250-dp-video-phy"; +- reg : offset and length of the Display Port PHY register set; +- #phy-cells : from the generic PHY bindings, must be 0; diff --git a/Documentation/devicetree/bindings/usb/msm-hsusb.txt b/Documentation/devicetree/bindings/usb/msm-hsusb.txt new file mode 100644 index 000000000000..5ea26c631e3a --- /dev/null +++ b/Documentation/devicetree/bindings/usb/msm-hsusb.txt @@ -0,0 +1,17 @@ +MSM SoC HSUSB controllers + +EHCI + +Required properties: +- compatible: Should contain "qcom,ehci-host" +- regs: offset and length of the register set in the memory map +- usb-phy: phandle for the PHY device + +Example EHCI controller device node: + + ehci: ehci@f9a55000 { + compatible = "qcom,ehci-host"; + reg = <0xf9a55000 0x400>; + usb-phy = <&usb_otg>; + }; + diff --git a/Documentation/devicetree/bindings/usb/omap-usb.txt b/Documentation/devicetree/bindings/usb/omap-usb.txt index 9088ab09e200..090e5e22bd2b 100644 --- a/Documentation/devicetree/bindings/usb/omap-usb.txt +++ b/Documentation/devicetree/bindings/usb/omap-usb.txt @@ -3,9 +3,6 @@ OMAP GLUE AND OTHER OMAP SPECIFIC COMPONENTS OMAP MUSB GLUE - compatible : Should be "ti,omap4-musb" or "ti,omap3-musb" - ti,hwmods : must be "usb_otg_hs" - - ti,has-mailbox : to specify that omap uses an external mailbox - (in control module) to communicate with the musb core during device connect - and disconnect. - multipoint : Should be "1" indicating the musb controller supports multipoint. This is a MUSB configuration-specific setting. - num-eps : Specifies the number of endpoints. This is also a @@ -19,6 +16,9 @@ OMAP MUSB GLUE - power : Should be "50". This signifies the controller can supply up to 100mA when operating in host mode. - usb-phy : the phandle for the PHY device + - phys : the phandle for the PHY device (used by generic PHY framework) + - phy-names : the names of the PHY corresponding to the PHYs present in the + *phy* phandle. Optional properties: - ctrl-module : phandle of the control module this glue uses to write to @@ -28,11 +28,12 @@ SOC specific device node entry usb_otg_hs: usb_otg_hs@4a0ab000 { compatible = "ti,omap4-musb"; ti,hwmods = "usb_otg_hs"; - ti,has-mailbox; multipoint = <1>; num-eps = <16>; ram-bits = <12>; ctrl-module = <&omap_control_usb>; + phys = <&usb2_phy>; + phy-names = "usb2-phy"; }; Board specific device node entry @@ -78,22 +79,22 @@ omap_dwc3 { OMAP CONTROL USB Required properties: - - compatible: Should be "ti,omap-control-usb" + - compatible: Should be one of + "ti,control-phy-otghs" - if it has otghs_control mailbox register as on OMAP4. + "ti,control-phy-usb2" - if it has Power down bit in control_dev_conf register + e.g. USB2_PHY on OMAP5. + "ti,control-phy-pipe3" - if it has DPLL and individual Rx & Tx power control + e.g. USB3 PHY and SATA PHY on OMAP5. + "ti,control-phy-dra7usb2" - if it has power down register like USB2 PHY on + DRA7 platform. - reg : Address and length of the register set for the device. It contains - the address of "control_dev_conf" and "otghs_control" or "phy_power_usb" - depending upon omap4 or omap5. - - reg-names: The names of the register addresses corresponding to the registers - filled in "reg". - - ti,type: This is used to differentiate whether the control module has - usb mailbox or usb3 phy power. omap4 has usb mailbox in control module to - notify events to the musb core and omap5 has usb3 phy power register to - power on usb3 phy. Should be "1" if it has mailbox and "2" if it has usb3 - phy power. + the address of "otghs_control" for control-phy-otghs or "power" register + for other types. + - reg-names: should be "otghs_control" control-phy-otghs and "power" for + other types. omap_control_usb: omap-control-usb@4a002300 { - compatible = "ti,omap-control-usb"; - reg = <0x4a002300 0x4>, - <0x4a00233c 0x4>; - reg-names = "control_dev_conf", "otghs_control"; - ti,type = <1>; + compatible = "ti,control-phy-otghs"; + reg = <0x4a00233c 0x4>; + reg-names = "otghs_control"; }; diff --git a/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt b/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt index d7e272671c7e..1bd37faba05b 100644 --- a/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt +++ b/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt @@ -15,7 +15,7 @@ Optional properties: - vcc-supply: phandle to the regulator that provides RESET to the PHY. -- reset-supply: phandle to the regulator that provides power to the PHY. +- reset-gpios: Should specify the GPIO for reset. Example: @@ -25,10 +25,9 @@ Example: clocks = <&osc 0>; clock-names = "main_clk"; vcc-supply = <&hsusb1_vcc_regulator>; - reset-supply = <&hsusb1_reset_regulator>; + reset-gpios = <&gpio1 7 GPIO_ACTIVE_LOW>; }; hsusb1_phy is a NOP USB PHY device that gets its clock from an oscillator and expects that clock to be configured to 19.2MHz by the NOP PHY driver. -hsusb1_vcc_regulator provides power to the PHY and hsusb1_reset_regulator -controls RESET. +hsusb1_vcc_regulator provides power to the PHY and GPIO 7 controls RESET. diff --git a/Documentation/devicetree/bindings/usb/usb-phy.txt b/Documentation/devicetree/bindings/usb/usb-phy.txt index 61496f5cb095..c0245c888982 100644 --- a/Documentation/devicetree/bindings/usb/usb-phy.txt +++ b/Documentation/devicetree/bindings/usb/usb-phy.txt @@ -5,6 +5,8 @@ OMAP USB2 PHY Required properties: - compatible: Should be "ti,omap-usb2" - reg : Address and length of the register set for the device. + - #phy-cells: determine the number of cells that should be given in the + phandle while referencing this phy. Optional properties: - ctrl-module : phandle of the control module used by PHY driver to power on @@ -16,6 +18,7 @@ usb2phy@4a0ad080 { compatible = "ti,omap-usb2"; reg = <0x4a0ad080 0x58>; ctrl-module = <&omap_control_usb>; + #phy-cells = <0>; }; OMAP USB3 PHY @@ -25,6 +28,8 @@ Required properties: - reg : Address and length of the register set for the device. - reg-names: The names of the register addresses corresponding to the registers filled in "reg". + - #phy-cells: determine the number of cells that should be given in the + phandle while referencing this phy. Optional properties: - ctrl-module : phandle of the control module used by PHY driver to power on @@ -39,4 +44,5 @@ usb3phy@4a084400 { <0x4a084c00 0x40>; reg-names = "phy_rx", "phy_tx", "pll_ctrl"; ctrl-module = <&omap_control_usb>; + #phy-cells = <0>; }; diff --git a/Documentation/devicetree/bindings/video/exynos_dp.txt b/Documentation/devicetree/bindings/video/exynos_dp.txt index 84f10c16cb38..3289d76a21d0 100644 --- a/Documentation/devicetree/bindings/video/exynos_dp.txt +++ b/Documentation/devicetree/bindings/video/exynos_dp.txt @@ -6,10 +6,10 @@ We use two nodes: -dptx-phy node(defined inside dp-controller node) For the DP-PHY initialization, we use the dptx-phy node. -Required properties for dptx-phy: - -reg: +Required properties for dptx-phy: deprecated, use phys and phy-names + -reg: deprecated Base address of DP PHY register. - -samsung,enable-mask: + -samsung,enable-mask: deprecated The bit-mask used to enable/disable DP PHY. For the Panel initialization, we read data from dp-controller node. @@ -27,6 +27,10 @@ Required properties for dp-controller: from common clock binding: Shall be "dp". -interrupt-parent: phandle to Interrupt combiner node. + -phys: + from general PHY binding: the phandle for the PHY device. + -phy-names: + from general PHY binding: Should be "dp". -samsung,color-space: input video data format. COLOR_RGB = 0, COLOR_YCBCR422 = 1, COLOR_YCBCR444 = 2 @@ -68,11 +72,8 @@ SOC specific portion: clocks = <&clock 342>; clock-names = "dp"; - dptx-phy { - reg = <0x10040720>; - samsung,enable-mask = <1>; - }; - + phys = <&dp_phy>; + phy-names = "dp"; }; Board Specific portion: diff --git a/Documentation/extcon/porting-android-switch-class b/Documentation/extcon/porting-android-switch-class index eb0fa5f4fe88..5377f6317961 100644 --- a/Documentation/extcon/porting-android-switch-class +++ b/Documentation/extcon/porting-android-switch-class @@ -25,8 +25,10 @@ MyungJoo Ham <myungjoo.ham@samsung.com> @print_state: no change but type change (switch_dev->extcon_dev) - switch_dev_register(sdev, dev) - => extcon_dev_register(edev, dev) - : no change but type change (sdev->edev) + => extcon_dev_register(edev) + : type change (sdev->edev) + : remove second param('dev'). if edev has parent device, should store + 'dev' to 'edev.dev.parent' before registering extcon device - switch_dev_unregister(sdev) => extcon_dev_unregister(edev) : no change but type change (sdev->edev) diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt new file mode 100644 index 000000000000..b41929224804 --- /dev/null +++ b/Documentation/mic/mic_overview.txt @@ -0,0 +1,51 @@ +An Intel MIC X100 device is a PCIe form factor add-in coprocessor +card based on the Intel Many Integrated Core (MIC) architecture +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore +implements the three required standard address spaces i.e. configuration, +memory and I/O. The host OS loads a device driver as is typical for +PCIe devices. The card itself runs a bootstrap after reset that +transfers control to the card OS downloaded from the host driver. The +host driver supports OSPM suspend and resume operations. It shuts down +the card during suspend and reboots the card OS during resume. +The card OS as shipped by Intel is a Linux kernel with modifications +for the X100 devices. + +Since it is a PCIe card, it does not have the ability to host hardware +devices for networking, storage and console. We provide these devices +on X100 coprocessors thus enabling a self-bootable equivalent environment +for applications. A key benefit of our solution is that it leverages +the standard virtio framework for network, disk and console devices, +though in our case the virtio framework is used across a PCIe bus. + +Here is a block diagram of the various components described above. The +virtio backends are situated on the host rather than the card given better +single threaded performance for the host compared to MIC, the ability of +the host to initiate DMA's to/from the card using the MIC DMA engine and +the fact that the virtio block storage backend can only be on the host. + + | + +----------+ | +----------+ + | Card OS | | | Host OS | + +----------+ | +----------+ + | ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+ +| Virtio| |Virtio | |Virtio| | |Virtio | |Virtio | |Virtio | +| Net | |Console | |Block | | |Net | |Console | |Block | +| Driver| |Driver | |Driver| | |backend | |backend | |backend | ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+ + | | | | | | | + | | | |User | | | + | | | |------|------------|---------|------- + +-------------------+ |Kernel +--------------------------+ + | | | Virtio over PCIe IOCTLs | + | | +--------------------------+ + +--------------+ | | + |Intel MIC | | +---------------+ + |Card Driver | | |Intel MIC | + +--------------+ | |Host Driver | + | | +---------------+ + | | | + +-------------------------------------------------------------+ + | | + | PCIe Bus | + +-------------------------------------------------------------+ diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore new file mode 100644 index 000000000000..8b7c72f07c92 --- /dev/null +++ b/Documentation/mic/mpssd/.gitignore @@ -0,0 +1 @@ +mpssd diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile new file mode 100644 index 000000000000..eb860a7d152e --- /dev/null +++ b/Documentation/mic/mpssd/Makefile @@ -0,0 +1,19 @@ +# +# Makefile - Intel MIC User Space Tools. +# Copyright(c) 2013, Intel Corporation. +# +ifdef DEBUG +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG) +else +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall +endif + +mpssd: mpssd.o sysfs.o + $(CC) $(CFLAGS) -o $@ $^ -lpthread + +install: + install mpssd /usr/sbin/mpssd + install micctrl /usr/sbin/micctrl + +clean: + rm -f mpssd *.o diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl new file mode 100755 index 000000000000..8f2629b41c5f --- /dev/null +++ b/Documentation/mic/mpssd/micctrl @@ -0,0 +1,173 @@ +#!/bin/bash +# Intel MIC Platform Software Stack (MPSS) +# +# Copyright(c) 2013 Intel Corporation. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License, version 2, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# The full GNU General Public License is included in this distribution in +# the file called "COPYING". +# +# Intel MIC User Space Tools. +# +# micctrl - Controls MIC boot/start/stop. +# +# chkconfig: 2345 95 05 +# description: start MPSS stack processing. +# +### BEGIN INIT INFO +# Provides: micctrl +### END INIT INFO + +# Source function library. +. /etc/init.d/functions + +sysfs="/sys/class/mic" + +_status() +{ + f=$sysfs/$1 + echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`" +} + +status() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + _status $1 + return $? + fi + for f in $sysfs/* + do + _status `basename $f` + RETVAL=$? + [ $RETVAL -ne 0 ] && return $RETVAL + done + return 0 +} + +_reset() +{ + f=$sysfs/$1 + echo reset > $f/state +} + +reset() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + _reset $1 + return $? + fi + for f in $sysfs/* + do + _reset `basename $f` + RETVAL=$? + [ $RETVAL -ne 0 ] && return $RETVAL + done + return 0 +} + +_boot() +{ + f=$sysfs/$1 + echo "linux" > $f/bootmode + echo "mic/uos.img" > $f/firmware + echo "mic/$1.image" > $f/ramdisk + echo "boot" > $f/state +} + +boot() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + _boot $1 + return $? + fi + for f in $sysfs/* + do + _boot `basename $f` + RETVAL=$? + [ $RETVAL -ne 0 ] && return $RETVAL + done + return 0 +} + +_shutdown() +{ + f=$sysfs/$1 + echo shutdown > $f/state +} + +shutdown() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + _shutdown $1 + return $? + fi + for f in $sysfs/* + do + _shutdown `basename $f` + RETVAL=$? + [ $RETVAL -ne 0 ] && return $RETVAL + done + return 0 +} + +_wait() +{ + f=$sysfs/$1 + while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ] + do + sleep 1 + echo -e "Waiting for $1 to go offline" + done +} + +wait() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + _wait $1 + return $? + fi + # Wait for the cards to go offline + for f in $sysfs/* + do + _wait `basename $f` + RETVAL=$? + [ $RETVAL -ne 0 ] && return $RETVAL + done + return 0 +} + +if [ ! -d "$sysfs" ]; then + echo -e $"Module unloaded " + exit 3 +fi + +case $1 in + -s) + status $2 + ;; + -r) + reset $2 + ;; + -b) + boot $2 + ;; + -S) + shutdown $2 + ;; + -w) + wait $2 + ;; + *) + echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}" + exit 2 +esac + +exit $? diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss new file mode 100755 index 000000000000..3136c68dad0b --- /dev/null +++ b/Documentation/mic/mpssd/mpss @@ -0,0 +1,202 @@ +#!/bin/bash +# Intel MIC Platform Software Stack (MPSS) +# +# Copyright(c) 2013 Intel Corporation. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License, version 2, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# The full GNU General Public License is included in this distribution in +# the file called "COPYING". +# +# Intel MIC User Space Tools. +# +# mpss Start mpssd. +# +# chkconfig: 2345 95 05 +# description: start MPSS stack processing. +# +### BEGIN INIT INFO +# Provides: mpss +# Required-Start: +# Required-Stop: +# Short-Description: MPSS stack control +# Description: MPSS stack control +### END INIT INFO + +# Source function library. +. /etc/init.d/functions + +exec=/usr/sbin/mpssd +sysfs="/sys/class/mic" + +start() +{ + [ -x $exec ] || exit 5 + + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then + echo -e $"MPSSD already running! " + success + echo + return 0 + fi + + echo -e $"Starting MPSS Stack" + echo -e $"Loading MIC_HOST Module" + + # Ensure the driver is loaded + if [ ! -d "$sysfs" ]; then + modprobe mic_host + RETVAL=$? + if [ $RETVAL -ne 0 ]; then + failure + echo + return $RETVAL + fi + fi + + # Start the daemon + echo -n $"Starting MPSSD " + $exec + RETVAL=$? + if [ $RETVAL -ne 0 ]; then + failure + echo + return $RETVAL + fi + success + echo + + sleep 5 + + # Boot the cards + micctrl -b + + # Wait till ping works + for f in $sysfs/* + do + count=100 + ipaddr=`cat $f/cmdline` + ipaddr=${ipaddr#*address,} + ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1` + while [ $count -ge 0 ] + do + echo -e "Pinging "`basename $f`" " + ping -c 1 $ipaddr &> /dev/null + RETVAL=$? + if [ $RETVAL -eq 0 ]; then + success + break + fi + sleep 1 + count=`expr $count - 1` + done + [ $RETVAL -ne 0 ] && failure || success + echo + done + return $RETVAL +} + +stop() +{ + echo -e $"Shutting down MPSS Stack: " + + # Bail out if module is unloaded + if [ ! -d "$sysfs" ]; then + echo -n $"Module unloaded " + success + echo + return 0 + fi + + # Shut down the cards. + micctrl -S + + # Wait for the cards to go offline + for f in $sysfs/* + do + while [ "`cat $f/state`" != "offline" ] + do + sleep 1 + echo -e "Waiting for "`basename $f`" to go offline" + done + done + + # Display the status of the cards + micctrl -s + + # Kill MPSSD now + echo -n $"Killing MPSSD" + killall -9 mpssd 2>/dev/null + RETVAL=$? + [ $RETVAL -ne 0 ] && failure || success + echo + return $RETVAL +} + +restart() +{ + stop + sleep 5 + start +} + +status() +{ + micctrl -s + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then + echo "mpssd is running" + else + echo "mpssd is stopped" + fi + return 0 +} + +unload() +{ + if [ ! -d "$sysfs" ]; then + echo -n $"No MIC_HOST Module: " + success + echo + return + fi + + stop + + sleep 5 + echo -n $"Removing MIC_HOST Module: " + modprobe -r mic_host + RETVAL=$? + [ $RETVAL -ne 0 ] && failure || success + echo + return $RETVAL +} + +case $1 in + start) + start + ;; + stop) + stop + ;; + restart) + restart + ;; + status) + status + ;; + unload) + unload + ;; + *) + echo $"Usage: $0 {start|stop|restart|status|unload}" + exit 2 +esac + +exit $? diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c new file mode 100644 index 000000000000..0c980ad40b17 --- /dev/null +++ b/Documentation/mic/mpssd/mpssd.c @@ -0,0 +1,1721 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. + */ + +#define _GNU_SOURCE + +#include <stdlib.h> +#include <fcntl.h> +#include <getopt.h> +#include <assert.h> +#include <unistd.h> +#include <stdbool.h> +#include <signal.h> +#include <poll.h> +#include <features.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/mman.h> +#include <sys/socket.h> +#include <linux/virtio_ring.h> +#include <linux/virtio_net.h> +#include <linux/virtio_console.h> +#include <linux/virtio_blk.h> +#include <linux/version.h> +#include "mpssd.h" +#include <linux/mic_ioctl.h> +#include <linux/mic_common.h> + +static void init_mic(struct mic_info *mic); + +static FILE *logfp; +static struct mic_info mic_list; + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) + +#define min_t(type, x, y) ({ \ + type __min1 = (x); \ + type __min2 = (y); \ + __min1 < __min2 ? __min1 : __min2; }) + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_DOWN(addr, size) ((addr)&(~((size)-1))) +#define _ALIGN_UP(addr, size) _ALIGN_DOWN(addr + size - 1, size) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr, size) _ALIGN_UP(addr, size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) + +#define GSO_ENABLED 1 +#define MAX_GSO_SIZE (64 * 1024) +#define ETH_H_LEN 14 +#define MAX_NET_PKT_SIZE (_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64)) +#define MIC_DEVICE_PAGE_END 0x1000 + +#ifndef VIRTIO_NET_HDR_F_DATA_VALID +#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */ +#endif + +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[2]; + __u32 host_features, guest_acknowledgements; + struct virtio_console_config cons_config; +} virtcons_dev_page = { + .dd = { + .type = VIRTIO_ID_CONSOLE, + .num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig), + .feature_len = sizeof(virtcons_dev_page.host_features), + .config_len = sizeof(virtcons_dev_page.cons_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .vqconfig[1] = { + .num = htole16(MIC_VRING_ENTRIES), + }, +}; + +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[2]; + __u32 host_features, guest_acknowledgements; + struct virtio_net_config net_config; +} virtnet_dev_page = { + .dd = { + .type = VIRTIO_ID_NET, + .num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig), + .feature_len = sizeof(virtnet_dev_page.host_features), + .config_len = sizeof(virtnet_dev_page.net_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .vqconfig[1] = { + .num = htole16(MIC_VRING_ENTRIES), + }, +#if GSO_ENABLED + .host_features = htole32( + 1 << VIRTIO_NET_F_CSUM | + 1 << VIRTIO_NET_F_GSO | + 1 << VIRTIO_NET_F_GUEST_TSO4 | + 1 << VIRTIO_NET_F_GUEST_TSO6 | + 1 << VIRTIO_NET_F_GUEST_ECN | + 1 << VIRTIO_NET_F_GUEST_UFO), +#else + .host_features = 0, +#endif +}; + +static const char *mic_config_dir = "/etc/sysconfig/mic"; +static const char *virtblk_backend = "VIRTBLK_BACKEND"; +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[1]; + __u32 host_features, guest_acknowledgements; + struct virtio_blk_config blk_config; +} virtblk_dev_page = { + .dd = { + .type = VIRTIO_ID_BLOCK, + .num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig), + .feature_len = sizeof(virtblk_dev_page.host_features), + .config_len = sizeof(virtblk_dev_page.blk_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .host_features = + htole32(1<<VIRTIO_BLK_F_SEG_MAX), + .blk_config = { + .seg_max = htole32(MIC_VRING_ENTRIES - 2), + .capacity = htole64(0), + } +}; + +static char *myname; + +static int +tap_configure(struct mic_info *mic, char *dev) +{ + pid_t pid; + char *ifargv[7]; + char ipaddr[IFNAMSIZ]; + int ret = 0; + + pid = fork(); + if (pid == 0) { + ifargv[0] = "ip"; + ifargv[1] = "link"; + ifargv[2] = "set"; + ifargv[3] = dev; + ifargv[4] = "up"; + ifargv[5] = NULL; + mpsslog("Configuring %s\n", dev); + ret = execvp("ip", ifargv); + if (ret < 0) { + mpsslog("%s execvp failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + } + if (pid < 0) { + mpsslog("%s fork failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + ret = waitpid(pid, NULL, 0); + if (ret < 0) { + mpsslog("%s waitpid failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id); + + pid = fork(); + if (pid == 0) { + ifargv[0] = "ip"; + ifargv[1] = "addr"; + ifargv[2] = "add"; + ifargv[3] = ipaddr; + ifargv[4] = "dev"; + ifargv[5] = dev; + ifargv[6] = NULL; + mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr); + ret = execvp("ip", ifargv); + if (ret < 0) { + mpsslog("%s execvp failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + } + if (pid < 0) { + mpsslog("%s fork failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + ret = waitpid(pid, NULL, 0); + if (ret < 0) { + mpsslog("%s waitpid failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + mpsslog("MIC name %s %s %d DONE!\n", + mic->name, __func__, __LINE__); + return 0; +} + +static int tun_alloc(struct mic_info *mic, char *dev) +{ + struct ifreq ifr; + int fd, err; +#if GSO_ENABLED + unsigned offload; +#endif + fd = open("/dev/net/tun", O_RDWR); + if (fd < 0) { + mpsslog("Could not open /dev/net/tun %s\n", strerror(errno)); + goto done; + } + + memset(&ifr, 0, sizeof(ifr)); + + ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; + if (*dev) + strncpy(ifr.ifr_name, dev, IFNAMSIZ); + + err = ioctl(fd, TUNSETIFF, (void *)&ifr); + if (err < 0) { + mpsslog("%s %s %d TUNSETIFF failed %s\n", + mic->name, __func__, __LINE__, strerror(errno)); + close(fd); + return err; + } +#if GSO_ENABLED + offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | + TUN_F_TSO_ECN | TUN_F_UFO; + + err = ioctl(fd, TUNSETOFFLOAD, offload); + if (err < 0) { + mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n", + mic->name, __func__, __LINE__, strerror(errno)); + close(fd); + return err; + } +#endif + strcpy(dev, ifr.ifr_name); + mpsslog("Created TAP %s\n", dev); +done: + return fd; +} + +#define NET_FD_VIRTIO_NET 0 +#define NET_FD_TUN 1 +#define MAX_NET_FD 2 + +static void set_dp(struct mic_info *mic, int type, void *dp) +{ + switch (type) { + case VIRTIO_ID_CONSOLE: + mic->mic_console.console_dp = dp; + return; + case VIRTIO_ID_NET: + mic->mic_net.net_dp = dp; + return; + case VIRTIO_ID_BLOCK: + mic->mic_virtblk.block_dp = dp; + return; + } + mpsslog("%s %s %d not found\n", mic->name, __func__, type); + assert(0); +} + +static void *get_dp(struct mic_info *mic, int type) +{ + switch (type) { + case VIRTIO_ID_CONSOLE: + return mic->mic_console.console_dp; + case VIRTIO_ID_NET: + return mic->mic_net.net_dp; + case VIRTIO_ID_BLOCK: + return mic->mic_virtblk.block_dp; + } + mpsslog("%s %s %d not found\n", mic->name, __func__, type); + assert(0); + return NULL; +} + +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type) +{ + struct mic_device_desc *d; + int i; + void *dp = get_dp(mic, type); + + for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE; + i += mic_total_desc_size(d)) { + d = dp + i; + + /* End of list */ + if (d->type == 0) + break; + + if (d->type == -1) + continue; + + mpsslog("%s %s d-> type %d d %p\n", + mic->name, __func__, d->type, d); + + if (d->type == (__u8)type) + return d; + } + mpsslog("%s %s %d not found\n", mic->name, __func__, type); + assert(0); + return NULL; +} + +/* See comments in vhost.c for explanation of next_desc() */ +static unsigned next_desc(struct vring_desc *desc) +{ + unsigned int next; + + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) + return -1U; + next = le16toh(desc->next); + return next; +} + +/* Sum up all the IOVEC length */ +static ssize_t +sum_iovec_len(struct mic_copy_desc *copy) +{ + ssize_t sum = 0; + int i; + + for (i = 0; i < copy->iovcnt; i++) + sum += copy->iov[i].iov_len; + return sum; +} + +static inline void verify_out_len(struct mic_info *mic, + struct mic_copy_desc *copy) +{ + if (copy->out_len != sum_iovec_len(copy)) { + mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%zx\n", + mic->name, __func__, __LINE__, + copy->out_len, sum_iovec_len(copy)); + assert(copy->out_len == sum_iovec_len(copy)); + } +} + +/* Display an iovec */ +static void +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy, + const char *s, int line) +{ + int i; + + for (i = 0; i < copy->iovcnt; i++) + mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%zx\n", + mic->name, s, line, i, + copy->iov[i].iov_base, copy->iov[i].iov_len); +} + +static inline __u16 read_avail_idx(struct mic_vring *vr) +{ + return ACCESS_ONCE(vr->info->avail_idx); +} + +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr, + struct mic_copy_desc *copy, ssize_t len) +{ + copy->vr_idx = tx ? 0 : 1; + copy->update_used = true; + if (type == VIRTIO_ID_NET) + copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr); + else + copy->iov[0].iov_len = len; +} + +/* Central API which triggers the copies */ +static int +mic_virtio_copy(struct mic_info *mic, int fd, + struct mic_vring *vr, struct mic_copy_desc *copy) +{ + int ret; + + ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy); + if (ret) { + mpsslog("%s %s %d errno %s ret %d\n", + mic->name, __func__, __LINE__, + strerror(errno), ret); + } + return ret; +} + +/* + * This initialization routine requires at least one + * vring i.e. vr0. vr1 is optional. + */ +static void * +init_vr(struct mic_info *mic, int fd, int type, + struct mic_vring *vr0, struct mic_vring *vr1, int num_vq) +{ + int vr_size; + char *va; + + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info)); + va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq, + PROT_READ, MAP_SHARED, fd, 0); + if (MAP_FAILED == va) { + mpsslog("%s %s %d mmap failed errno %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + goto done; + } + set_dp(mic, type, va); + vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END]; + vr0->info = vr0->va + + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN); + vring_init(&vr0->vr, + MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN); + mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ", + __func__, mic->name, vr0->va, vr0->info, vr_size, + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN)); + mpsslog("magic 0x%x expected 0x%x\n", + vr0->info->magic, MIC_MAGIC + type); + assert(vr0->info->magic == MIC_MAGIC + type); + if (vr1) { + vr1->va = (struct mic_vring *) + &va[MIC_DEVICE_PAGE_END + vr_size]; + vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN); + vring_init(&vr1->vr, + MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN); + mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ", + __func__, mic->name, vr1->va, vr1->info, vr_size, + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN)); + mpsslog("magic 0x%x expected 0x%x\n", + vr1->info->magic, MIC_MAGIC + type + 1); + assert(vr1->info->magic == MIC_MAGIC + type + 1); + } +done: + return va; +} + +static void +wait_for_card_driver(struct mic_info *mic, int fd, int type) +{ + struct pollfd pollfd; + int err; + struct mic_device_desc *desc = get_device_desc(mic, type); + + pollfd.fd = fd; + mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n", + mic->name, __func__, type, desc->status); + while (1) { + pollfd.events = POLLIN; + pollfd.revents = 0; + err = poll(&pollfd, 1, -1); + if (err < 0) { + mpsslog("%s %s poll failed %s\n", + mic->name, __func__, strerror(errno)); + continue; + } + + if (pollfd.revents) { + mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n", + mic->name, __func__, type, desc->status); + if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) { + mpsslog("%s %s poll.revents %d\n", + mic->name, __func__, pollfd.revents); + mpsslog("%s %s desc-> type %d status 0x%x\n", + mic->name, __func__, type, + desc->status); + break; + } + } + } +} + +/* Spin till we have some descriptors */ +static void +spin_for_descriptors(struct mic_info *mic, struct mic_vring *vr) +{ + __u16 avail_idx = read_avail_idx(vr); + + while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) { +#ifdef DEBUG + mpsslog("%s %s waiting for desc avail %d info_avail %d\n", + mic->name, __func__, + le16toh(vr->vr.avail->idx), vr->info->avail_idx); +#endif + sched_yield(); + } +} + +static void * +virtio_net(void *arg) +{ + static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)]; + static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64); + struct iovec vnet_iov[2][2] = { + { { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) }, + { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } }, + { { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) }, + { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } }, + }; + struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1]; + struct mic_info *mic = (struct mic_info *)arg; + char if_name[IFNAMSIZ]; + struct pollfd net_poll[MAX_NET_FD]; + struct mic_vring tx_vr, rx_vr; + struct mic_copy_desc copy; + struct mic_device_desc *desc; + int err; + + snprintf(if_name, IFNAMSIZ, "mic%d", mic->id); + mic->mic_net.tap_fd = tun_alloc(mic, if_name); + if (mic->mic_net.tap_fd < 0) + goto done; + + if (tap_configure(mic, if_name)) + goto done; + mpsslog("MIC name %s id %d\n", mic->name, mic->id); + + net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd; + net_poll[NET_FD_VIRTIO_NET].events = POLLIN; + net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd; + net_poll[NET_FD_TUN].events = POLLIN; + + if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd, + VIRTIO_ID_NET, &tx_vr, &rx_vr, + virtnet_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + goto done; + } + + copy.iovcnt = 2; + desc = get_device_desc(mic, VIRTIO_ID_NET); + + while (1) { + ssize_t len; + + net_poll[NET_FD_VIRTIO_NET].revents = 0; + net_poll[NET_FD_TUN].revents = 0; + + /* Start polling for data from tap and virtio net */ + err = poll(net_poll, 2, -1); + if (err < 0) { + mpsslog("%s poll failed %s\n", + __func__, strerror(errno)); + continue; + } + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK)) + wait_for_card_driver(mic, mic->mic_net.virtio_net_fd, + VIRTIO_ID_NET); + /* + * Check if there is data to be read from TUN and write to + * virtio net fd if there is. + */ + if (net_poll[NET_FD_TUN].revents & POLLIN) { + copy.iov = iov0; + len = readv(net_poll[NET_FD_TUN].fd, + copy.iov, copy.iovcnt); + if (len > 0) { + struct virtio_net_hdr *hdr + = (struct virtio_net_hdr *)vnet_hdr[0]; + + /* Disable checksums on the card since we are on + a reliable PCIe link */ + hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID; +#ifdef DEBUG + mpsslog("%s %s %d hdr->flags 0x%x ", mic->name, + __func__, __LINE__, hdr->flags); + mpsslog("copy.out_len %d hdr->gso_type 0x%x\n", + copy.out_len, hdr->gso_type); +#endif +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read from tap 0x%lx\n", + mic->name, __func__, __LINE__, + len); +#endif + spin_for_descriptors(mic, &tx_vr); + txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, ©, + len); + + err = mic_virtio_copy(mic, + mic->mic_net.virtio_net_fd, &tx_vr, + ©); + if (err < 0) { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + } + if (!err) + verify_out_len(mic, ©); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d wrote to net 0x%lx\n", + mic->name, __func__, __LINE__, + sum_iovec_len(©)); +#endif + /* Reinitialize IOV for next run */ + iov0[1].iov_len = MAX_NET_PKT_SIZE; + } else if (len < 0) { + disp_iovec(mic, ©, __func__, __LINE__); + mpsslog("%s %s %d read failed %s ", mic->name, + __func__, __LINE__, strerror(errno)); + mpsslog("cnt %d sum %zd\n", + copy.iovcnt, sum_iovec_len(©)); + } + } + + /* + * Check if there is data to be read from virtio net and + * write to TUN if there is. + */ + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) { + while (rx_vr.info->avail_idx != + le16toh(rx_vr.vr.avail->idx)) { + copy.iov = iov1; + txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, ©, + MAX_NET_PKT_SIZE + + sizeof(struct virtio_net_hdr)); + + err = mic_virtio_copy(mic, + mic->mic_net.virtio_net_fd, &rx_vr, + ©); + if (!err) { +#ifdef DEBUG + struct virtio_net_hdr *hdr + = (struct virtio_net_hdr *) + vnet_hdr[1]; + + mpsslog("%s %s %d hdr->flags 0x%x, ", + mic->name, __func__, __LINE__, + hdr->flags); + mpsslog("out_len %d gso_type 0x%x\n", + copy.out_len, + hdr->gso_type); +#endif + /* Set the correct output iov_len */ + iov1[1].iov_len = copy.out_len - + sizeof(struct virtio_net_hdr); + verify_out_len(mic, ©); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, __LINE__); + mpsslog("read from net 0x%lx\n", + sum_iovec_len(copy)); +#endif + len = writev(net_poll[NET_FD_TUN].fd, + copy.iov, copy.iovcnt); + if (len != sum_iovec_len(©)) { + mpsslog("Tun write failed %s ", + strerror(errno)); + mpsslog("len 0x%zx ", len); + mpsslog("read_len 0x%zx\n", + sum_iovec_len(©)); + } else { +#ifdef DEBUG + disp_iovec(mic, ©, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, + __LINE__); + mpsslog("wrote to tap 0x%lx\n", + len); +#endif + } + } else { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + break; + } + } + } + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) + mpsslog("%s: %s: POLLERR\n", __func__, mic->name); + } +done: + pthread_exit(NULL); +} + +/* virtio_console */ +#define VIRTIO_CONSOLE_FD 0 +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1) +#define MAX_CONSOLE_FD (MONITOR_FD + 1) /* must be the last one + 1 */ +#define MAX_BUFFER_SIZE PAGE_SIZE + +static void * +virtio_console(void *arg) +{ + static __u8 vcons_buf[2][PAGE_SIZE]; + struct iovec vcons_iov[2] = { + { .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) }, + { .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) }, + }; + struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1]; + struct mic_info *mic = (struct mic_info *)arg; + int err; + struct pollfd console_poll[MAX_CONSOLE_FD]; + int pty_fd; + char *pts_name; + ssize_t len; + struct mic_vring tx_vr, rx_vr; + struct mic_copy_desc copy; + struct mic_device_desc *desc; + + pty_fd = posix_openpt(O_RDWR); + if (pty_fd < 0) { + mpsslog("can't open a pseudoterminal master device: %s\n", + strerror(errno)); + goto _return; + } + pts_name = ptsname(pty_fd); + if (pts_name == NULL) { + mpsslog("can't get pts name\n"); + goto _close_pty; + } + printf("%s console message goes to %s\n", mic->name, pts_name); + mpsslog("%s console message goes to %s\n", mic->name, pts_name); + err = grantpt(pty_fd); + if (err < 0) { + mpsslog("can't grant access: %s %s\n", + pts_name, strerror(errno)); + goto _close_pty; + } + err = unlockpt(pty_fd); + if (err < 0) { + mpsslog("can't unlock a pseudoterminal: %s %s\n", + pts_name, strerror(errno)); + goto _close_pty; + } + console_poll[MONITOR_FD].fd = pty_fd; + console_poll[MONITOR_FD].events = POLLIN; + + console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd; + console_poll[VIRTIO_CONSOLE_FD].events = POLLIN; + + if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd, + VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr, + virtcons_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + goto _close_pty; + } + + copy.iovcnt = 1; + desc = get_device_desc(mic, VIRTIO_ID_CONSOLE); + + for (;;) { + console_poll[MONITOR_FD].revents = 0; + console_poll[VIRTIO_CONSOLE_FD].revents = 0; + err = poll(console_poll, MAX_CONSOLE_FD, -1); + if (err < 0) { + mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__, + strerror(errno)); + continue; + } + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK)) + wait_for_card_driver(mic, + mic->mic_console.virtio_console_fd, + VIRTIO_ID_CONSOLE); + + if (console_poll[MONITOR_FD].revents & POLLIN) { + copy.iov = iov0; + len = readv(pty_fd, copy.iov, copy.iovcnt); + if (len > 0) { +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read from tap 0x%lx\n", + mic->name, __func__, __LINE__, + len); +#endif + spin_for_descriptors(mic, &tx_vr); + txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr, + ©, len); + + err = mic_virtio_copy(mic, + mic->mic_console.virtio_console_fd, + &tx_vr, ©); + if (err < 0) { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + } + if (!err) + verify_out_len(mic, ©); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d wrote to net 0x%lx\n", + mic->name, __func__, __LINE__, + sum_iovec_len(copy)); +#endif + /* Reinitialize IOV for next run */ + iov0->iov_len = PAGE_SIZE; + } else if (len < 0) { + disp_iovec(mic, ©, __func__, __LINE__); + mpsslog("%s %s %d read failed %s ", + mic->name, __func__, __LINE__, + strerror(errno)); + mpsslog("cnt %d sum %zd\n", + copy.iovcnt, sum_iovec_len(©)); + } + } + + if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) { + while (rx_vr.info->avail_idx != + le16toh(rx_vr.vr.avail->idx)) { + copy.iov = iov1; + txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr, + ©, PAGE_SIZE); + + err = mic_virtio_copy(mic, + mic->mic_console.virtio_console_fd, + &rx_vr, ©); + if (!err) { + /* Set the correct output iov_len */ + iov1->iov_len = copy.out_len; + verify_out_len(mic, ©); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, __LINE__); + mpsslog("read from net 0x%lx\n", + sum_iovec_len(copy)); +#endif + len = writev(pty_fd, + copy.iov, copy.iovcnt); + if (len != sum_iovec_len(©)) { + mpsslog("Tun write failed %s ", + strerror(errno)); + mpsslog("len 0x%zx ", len); + mpsslog("read_len 0x%zx\n", + sum_iovec_len(©)); + } else { +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, + __LINE__); + mpsslog("wrote to tap 0x%lx\n", + len); +#endif + } + } else { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + break; + } + } + } + if (console_poll[NET_FD_VIRTIO_NET].revents & POLLERR) + mpsslog("%s: %s: POLLERR\n", __func__, mic->name); + } +_close_pty: + close(pty_fd); +_return: + pthread_exit(NULL); +} + +static void +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd) +{ + char path[PATH_MAX]; + int fd, err; + + snprintf(path, PATH_MAX, "/dev/mic%d", mic->id); + fd = open(path, O_RDWR); + if (fd < 0) { + mpsslog("Could not open %s %s\n", path, strerror(errno)); + return; + } + + err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd); + if (err < 0) { + mpsslog("Could not add %d %s\n", dd->type, strerror(errno)); + close(fd); + return; + } + switch (dd->type) { + case VIRTIO_ID_NET: + mic->mic_net.virtio_net_fd = fd; + mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name); + break; + case VIRTIO_ID_CONSOLE: + mic->mic_console.virtio_console_fd = fd; + mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name); + break; + case VIRTIO_ID_BLOCK: + mic->mic_virtblk.virtio_block_fd = fd; + mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name); + break; + } +} + +static bool +set_backend_file(struct mic_info *mic) +{ + FILE *config; + char buff[PATH_MAX], *line, *evv, *p; + + snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id); + config = fopen(buff, "r"); + if (config == NULL) + return false; + do { /* look for "virtblk_backend=XXXX" */ + line = fgets(buff, PATH_MAX, config); + if (line == NULL) + break; + if (*line == '#') + continue; + p = strchr(line, '\n'); + if (p) + *p = '\0'; + } while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0); + fclose(config); + if (line == NULL) + return false; + evv = strchr(line, '='); + if (evv == NULL) + return false; + mic->mic_virtblk.backend_file = malloc(strlen(evv) + 1); + if (mic->mic_virtblk.backend_file == NULL) { + mpsslog("%s %d can't allocate memory\n", mic->name, mic->id); + return false; + } + strcpy(mic->mic_virtblk.backend_file, evv + 1); + return true; +} + +#define SECTOR_SIZE 512 +static bool +set_backend_size(struct mic_info *mic) +{ + mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0, + SEEK_END); + if (mic->mic_virtblk.backend_size < 0) { + mpsslog("%s: can't seek: %s\n", + mic->name, mic->mic_virtblk.backend_file); + return false; + } + virtblk_dev_page.blk_config.capacity = + mic->mic_virtblk.backend_size / SECTOR_SIZE; + if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0) + virtblk_dev_page.blk_config.capacity++; + + virtblk_dev_page.blk_config.capacity = + htole64(virtblk_dev_page.blk_config.capacity); + + return true; +} + +static bool +open_backend(struct mic_info *mic) +{ + if (!set_backend_file(mic)) + goto _error_exit; + mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR); + if (mic->mic_virtblk.backend < 0) { + mpsslog("%s: can't open: %s\n", mic->name, + mic->mic_virtblk.backend_file); + goto _error_free; + } + if (!set_backend_size(mic)) + goto _error_close; + mic->mic_virtblk.backend_addr = mmap(NULL, + mic->mic_virtblk.backend_size, + PROT_READ|PROT_WRITE, MAP_SHARED, + mic->mic_virtblk.backend, 0L); + if (mic->mic_virtblk.backend_addr == MAP_FAILED) { + mpsslog("%s: can't map: %s %s\n", + mic->name, mic->mic_virtblk.backend_file, + strerror(errno)); + goto _error_close; + } + return true; + + _error_close: + close(mic->mic_virtblk.backend); + _error_free: + free(mic->mic_virtblk.backend_file); + _error_exit: + return false; +} + +static void +close_backend(struct mic_info *mic) +{ + munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size); + close(mic->mic_virtblk.backend); + free(mic->mic_virtblk.backend_file); +} + +static bool +start_virtblk(struct mic_info *mic, struct mic_vring *vring) +{ + if (((unsigned long)&virtblk_dev_page.blk_config % 8) != 0) { + mpsslog("%s: blk_config is not 8 byte aligned.\n", + mic->name); + return false; + } + add_virtio_device(mic, &virtblk_dev_page.dd); + if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd, + VIRTIO_ID_BLOCK, vring, NULL, + virtblk_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + return false; + } + return true; +} + +static void +stop_virtblk(struct mic_info *mic) +{ + int vr_size, ret; + + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info)); + ret = munmap(mic->mic_virtblk.block_dp, + MIC_DEVICE_PAGE_END + vr_size * virtblk_dev_page.dd.num_vq); + if (ret < 0) + mpsslog("%s munmap errno %d\n", mic->name, errno); + close(mic->mic_virtblk.virtio_block_fd); +} + +static __u8 +header_error_check(struct vring_desc *desc) +{ + if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) { + mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n", + __func__, __LINE__); + return -EIO; + } + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) { + mpsslog("%s() %d: alone\n", + __func__, __LINE__); + return -EIO; + } + if (le16toh(desc->flags) & VRING_DESC_F_WRITE) { + mpsslog("%s() %d: not read\n", + __func__, __LINE__); + return -EIO; + } + return 0; +} + +static int +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx) +{ + struct iovec iovec; + struct mic_copy_desc copy; + + iovec.iov_len = sizeof(*hdr); + iovec.iov_base = hdr; + copy.iov = &iovec; + copy.iovcnt = 1; + copy.vr_idx = 0; /* only one vring on virtio_block */ + copy.update_used = false; /* do not update used index */ + return ioctl(fd, MIC_VIRTIO_COPY_DESC, ©); +} + +static int +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt) +{ + struct mic_copy_desc copy; + + copy.iov = iovec; + copy.iovcnt = iovcnt; + copy.vr_idx = 0; /* only one vring on virtio_block */ + copy.update_used = false; /* do not update used index */ + return ioctl(fd, MIC_VIRTIO_COPY_DESC, ©); +} + +static __u8 +status_error_check(struct vring_desc *desc) +{ + if (le32toh(desc->len) != sizeof(__u8)) { + mpsslog("%s() %d: length is not sizeof(status)\n", + __func__, __LINE__); + return -EIO; + } + return 0; +} + +static int +write_status(int fd, __u8 *status) +{ + struct iovec iovec; + struct mic_copy_desc copy; + + iovec.iov_base = status; + iovec.iov_len = sizeof(*status); + copy.iov = &iovec; + copy.iovcnt = 1; + copy.vr_idx = 0; /* only one vring on virtio_block */ + copy.update_used = true; /* Update used index */ + return ioctl(fd, MIC_VIRTIO_COPY_DESC, ©); +} + +static void * +virtio_block(void *arg) +{ + struct mic_info *mic = (struct mic_info *)arg; + int ret; + struct pollfd block_poll; + struct mic_vring vring; + __u16 avail_idx; + __u32 desc_idx; + struct vring_desc *desc; + struct iovec *iovec, *piov; + __u8 status; + __u32 buffer_desc_idx; + struct virtio_blk_outhdr hdr; + void *fos; + + for (;;) { /* forever */ + if (!open_backend(mic)) { /* No virtblk */ + for (mic->mic_virtblk.signaled = 0; + !mic->mic_virtblk.signaled;) + sleep(1); + continue; + } + + /* backend file is specified. */ + if (!start_virtblk(mic, &vring)) + goto _close_backend; + iovec = malloc(sizeof(*iovec) * + le32toh(virtblk_dev_page.blk_config.seg_max)); + if (!iovec) { + mpsslog("%s: can't alloc iovec: %s\n", + mic->name, strerror(ENOMEM)); + goto _stop_virtblk; + } + + block_poll.fd = mic->mic_virtblk.virtio_block_fd; + block_poll.events = POLLIN; + for (mic->mic_virtblk.signaled = 0; + !mic->mic_virtblk.signaled;) { + block_poll.revents = 0; + /* timeout in 1 sec to see signaled */ + ret = poll(&block_poll, 1, 1000); + if (ret < 0) { + mpsslog("%s %d: poll failed: %s\n", + __func__, __LINE__, + strerror(errno)); + continue; + } + + if (!(block_poll.revents & POLLIN)) { +#ifdef DEBUG + mpsslog("%s %d: block_poll.revents=0x%x\n", + __func__, __LINE__, block_poll.revents); +#endif + continue; + } + + /* POLLIN */ + while (vring.info->avail_idx != + le16toh(vring.vr.avail->idx)) { + /* read header element */ + avail_idx = + vring.info->avail_idx & + (vring.vr.num - 1); + desc_idx = le16toh( + vring.vr.avail->ring[avail_idx]); + desc = &vring.vr.desc[desc_idx]; +#ifdef DEBUG + mpsslog("%s() %d: avail_idx=%d ", + __func__, __LINE__, + vring.info->avail_idx); + mpsslog("vring.vr.num=%d desc=%p\n", + vring.vr.num, desc); +#endif + status = header_error_check(desc); + ret = read_header( + mic->mic_virtblk.virtio_block_fd, + &hdr, desc_idx); + if (ret < 0) { + mpsslog("%s() %d %s: ret=%d %s\n", + __func__, __LINE__, + mic->name, ret, + strerror(errno)); + break; + } + /* buffer element */ + piov = iovec; + status = 0; + fos = mic->mic_virtblk.backend_addr + + (hdr.sector * SECTOR_SIZE); + buffer_desc_idx = next_desc(desc); + desc_idx = buffer_desc_idx; + for (desc = &vring.vr.desc[buffer_desc_idx]; + desc->flags & VRING_DESC_F_NEXT; + desc_idx = next_desc(desc), + desc = &vring.vr.desc[desc_idx]) { + piov->iov_len = desc->len; + piov->iov_base = fos; + piov++; + fos += desc->len; + } + /* Returning NULLs for VIRTIO_BLK_T_GET_ID. */ + if (hdr.type & ~(VIRTIO_BLK_T_OUT | + VIRTIO_BLK_T_GET_ID)) { + /* + VIRTIO_BLK_T_IN - does not do + anything. Probably for documenting. + VIRTIO_BLK_T_SCSI_CMD - for + virtio_scsi. + VIRTIO_BLK_T_FLUSH - turned off in + config space. + VIRTIO_BLK_T_BARRIER - defined but not + used in anywhere. + */ + mpsslog("%s() %d: type %x ", + __func__, __LINE__, + hdr.type); + mpsslog("is not supported\n"); + status = -ENOTSUP; + + } else { + ret = transfer_blocks( + mic->mic_virtblk.virtio_block_fd, + iovec, + piov - iovec); + if (ret < 0 && + status != 0) + status = ret; + } + /* write status and update used pointer */ + if (status != 0) + status = status_error_check(desc); + ret = write_status( + mic->mic_virtblk.virtio_block_fd, + &status); +#ifdef DEBUG + mpsslog("%s() %d: write status=%d on desc=%p\n", + __func__, __LINE__, + status, desc); +#endif + } + } + free(iovec); +_stop_virtblk: + stop_virtblk(mic); +_close_backend: + close_backend(mic); + } /* forever */ + + pthread_exit(NULL); +} + +static void +reset(struct mic_info *mic) +{ +#define RESET_TIMEOUT 120 + int i = RESET_TIMEOUT; + setsysfs(mic->name, "state", "reset"); + while (i) { + char *state; + state = readsysfs(mic->name, "state"); + if (!state) + goto retry; + mpsslog("%s: %s %d state %s\n", + mic->name, __func__, __LINE__, state); + + /* + * If the shutdown was initiated by OSPM, the state stays + * in "suspended" which is also a valid condition for reset. + */ + if ((!strcmp(state, "offline")) || + (!strcmp(state, "suspended"))) { + free(state); + break; + } + free(state); +retry: + sleep(1); + i--; + } +} + +static int +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status) +{ + if (!strcmp(shutdown_status, "nop")) + return MIC_NOP; + if (!strcmp(shutdown_status, "crashed")) + return MIC_CRASHED; + if (!strcmp(shutdown_status, "halted")) + return MIC_HALTED; + if (!strcmp(shutdown_status, "poweroff")) + return MIC_POWER_OFF; + if (!strcmp(shutdown_status, "restart")) + return MIC_RESTART; + mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status); + /* Invalid state */ + assert(0); +}; + +static int get_mic_state(struct mic_info *mic, char *state) +{ + if (!strcmp(state, "offline")) + return MIC_OFFLINE; + if (!strcmp(state, "online")) + return MIC_ONLINE; + if (!strcmp(state, "shutting_down")) + return MIC_SHUTTING_DOWN; + if (!strcmp(state, "reset_failed")) + return MIC_RESET_FAILED; + if (!strcmp(state, "suspending")) + return MIC_SUSPENDING; + if (!strcmp(state, "suspended")) + return MIC_SUSPENDED; + mpsslog("%s: BUG invalid state %s\n", mic->name, state); + /* Invalid state */ + assert(0); +}; + +static void mic_handle_shutdown(struct mic_info *mic) +{ +#define SHUTDOWN_TIMEOUT 60 + int i = SHUTDOWN_TIMEOUT, ret, stat = 0; + char *shutdown_status; + while (i) { + shutdown_status = readsysfs(mic->name, "shutdown_status"); + if (!shutdown_status) + continue; + mpsslog("%s: %s %d shutdown_status %s\n", + mic->name, __func__, __LINE__, shutdown_status); + switch (get_mic_shutdown_status(mic, shutdown_status)) { + case MIC_RESTART: + mic->restart = 1; + case MIC_HALTED: + case MIC_POWER_OFF: + case MIC_CRASHED: + free(shutdown_status); + goto reset; + default: + break; + } + free(shutdown_status); + sleep(1); + i--; + } +reset: + ret = kill(mic->pid, SIGTERM); + mpsslog("%s: %s %d kill pid %d ret %d\n", + mic->name, __func__, __LINE__, + mic->pid, ret); + if (!ret) { + ret = waitpid(mic->pid, &stat, + WIFSIGNALED(stat)); + mpsslog("%s: %s %d waitpid ret %d pid %d\n", + mic->name, __func__, __LINE__, + ret, mic->pid); + } + if (ret == mic->pid) + reset(mic); +} + +static void * +mic_config(void *arg) +{ + struct mic_info *mic = (struct mic_info *)arg; + char *state = NULL; + char pathname[PATH_MAX]; + int fd, ret; + struct pollfd ufds[1]; + char value[4096]; + + snprintf(pathname, PATH_MAX - 1, "%s/%s/%s", + MICSYSFSDIR, mic->name, "state"); + + fd = open(pathname, O_RDONLY); + if (fd < 0) { + mpsslog("%s: opening file %s failed %s\n", + mic->name, pathname, strerror(errno)); + goto error; + } + + do { + ret = read(fd, value, sizeof(value)); + if (ret < 0) { + mpsslog("%s: Failed to read sysfs entry '%s': %s\n", + mic->name, pathname, strerror(errno)); + goto close_error1; + } +retry: + state = readsysfs(mic->name, "state"); + if (!state) + goto retry; + mpsslog("%s: %s %d state %s\n", + mic->name, __func__, __LINE__, state); + switch (get_mic_state(mic, state)) { + case MIC_SHUTTING_DOWN: + mic_handle_shutdown(mic); + goto close_error; + case MIC_SUSPENDING: + mic->boot_on_resume = 1; + setsysfs(mic->name, "state", "suspend"); + mic_handle_shutdown(mic); + goto close_error; + case MIC_OFFLINE: + if (mic->boot_on_resume) { + setsysfs(mic->name, "state", "boot"); + mic->boot_on_resume = 0; + } + break; + default: + break; + } + free(state); + + ufds[0].fd = fd; + ufds[0].events = POLLERR | POLLPRI; + ret = poll(ufds, 1, -1); + if (ret < 0) { + mpsslog("%s: poll failed %s\n", + mic->name, strerror(errno)); + goto close_error1; + } + } while (1); +close_error: + free(state); +close_error1: + close(fd); +error: + init_mic(mic); + pthread_exit(NULL); +} + +static void +set_cmdline(struct mic_info *mic) +{ + char buffer[PATH_MAX]; + int len; + + len = snprintf(buffer, PATH_MAX, + "clocksource=tsc highres=off nohz=off "); + len += snprintf(buffer + len, PATH_MAX, + "cpufreq_on;corec6_off;pc3_off;pc6_off "); + len += snprintf(buffer + len, PATH_MAX, + "ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0", + mic->id); + + setsysfs(mic->name, "cmdline", buffer); + mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer); + snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id); + mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer); +} + +static void +set_log_buf_info(struct mic_info *mic) +{ + int fd; + off_t len; + char system_map[] = "/lib/firmware/mic/System.map"; + char *map, *temp, log_buf[17] = {'\0'}; + + fd = open(system_map, O_RDONLY); + if (fd < 0) { + mpsslog("%s: Opening System.map failed: %d\n", + mic->name, errno); + return; + } + len = lseek(fd, 0, SEEK_END); + if (len < 0) { + mpsslog("%s: Reading System.map size failed: %d\n", + mic->name, errno); + close(fd); + return; + } + map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0); + if (map == MAP_FAILED) { + mpsslog("%s: mmap of System.map failed: %d\n", + mic->name, errno); + close(fd); + return; + } + temp = strstr(map, "__log_buf"); + if (!temp) { + mpsslog("%s: __log_buf not found: %d\n", mic->name, errno); + munmap(map, len); + close(fd); + return; + } + strncpy(log_buf, temp - 19, 16); + setsysfs(mic->name, "log_buf_addr", log_buf); + mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf); + temp = strstr(map, "log_buf_len"); + if (!temp) { + mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno); + munmap(map, len); + close(fd); + return; + } + strncpy(log_buf, temp - 19, 16); + setsysfs(mic->name, "log_buf_len", log_buf); + mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf); + munmap(map, len); + close(fd); +} + +static void init_mic(struct mic_info *mic); + +static void +change_virtblk_backend(int x, siginfo_t *siginfo, void *p) +{ + struct mic_info *mic; + + for (mic = mic_list.next; mic != NULL; mic = mic->next) + mic->mic_virtblk.signaled = 1/* true */; +} + +static void +init_mic(struct mic_info *mic) +{ + struct sigaction ignore = { + .sa_flags = 0, + .sa_handler = SIG_IGN + }; + struct sigaction act = { + .sa_flags = SA_SIGINFO, + .sa_sigaction = change_virtblk_backend, + }; + char buffer[PATH_MAX]; + int err; + + /* + * Currently, one virtio block device is supported for each MIC card + * at a time. Any user (or test) can send a SIGUSR1 to the MIC daemon. + * The signal informs the virtio block backend about a change in the + * configuration file which specifies the virtio backend file name on + * the host. Virtio block backend then re-reads the configuration file + * and switches to the new block device. This signalling mechanism may + * not be required once multiple virtio block devices are supported by + * the MIC daemon. + */ + sigaction(SIGUSR1, &ignore, NULL); + + mic->pid = fork(); + switch (mic->pid) { + case 0: + set_log_buf_info(mic); + set_cmdline(mic); + add_virtio_device(mic, &virtcons_dev_page.dd); + add_virtio_device(mic, &virtnet_dev_page.dd); + err = pthread_create(&mic->mic_console.console_thread, NULL, + virtio_console, mic); + if (err) + mpsslog("%s virtcons pthread_create failed %s\n", + mic->name, strerror(err)); + err = pthread_create(&mic->mic_net.net_thread, NULL, + virtio_net, mic); + if (err) + mpsslog("%s virtnet pthread_create failed %s\n", + mic->name, strerror(err)); + err = pthread_create(&mic->mic_virtblk.block_thread, NULL, + virtio_block, mic); + if (err) + mpsslog("%s virtblk pthread_create failed %s\n", + mic->name, strerror(err)); + sigemptyset(&act.sa_mask); + err = sigaction(SIGUSR1, &act, NULL); + if (err) + mpsslog("%s sigaction SIGUSR1 failed %s\n", + mic->name, strerror(errno)); + while (1) + sleep(60); + case -1: + mpsslog("fork failed MIC name %s id %d errno %d\n", + mic->name, mic->id, errno); + break; + default: + if (mic->restart) { + snprintf(buffer, PATH_MAX, "boot"); + setsysfs(mic->name, "state", buffer); + mpsslog("%s restarting mic %d\n", + mic->name, mic->restart); + mic->restart = 0; + } + pthread_create(&mic->config_thread, NULL, mic_config, mic); + } +} + +static void +start_daemon(void) +{ + struct mic_info *mic; + + for (mic = mic_list.next; mic != NULL; mic = mic->next) + init_mic(mic); + + while (1) + sleep(60); +} + +static int +init_mic_list(void) +{ + struct mic_info *mic = &mic_list; + struct dirent *file; + DIR *dp; + int cnt = 0; + + dp = opendir(MICSYSFSDIR); + if (!dp) + return 0; + + while ((file = readdir(dp)) != NULL) { + if (!strncmp(file->d_name, "mic", 3)) { + mic->next = calloc(1, sizeof(struct mic_info)); + if (mic->next) { + mic = mic->next; + mic->id = atoi(&file->d_name[3]); + mic->name = malloc(strlen(file->d_name) + 16); + if (mic->name) + strcpy(mic->name, file->d_name); + mpsslog("MIC name %s id %d\n", mic->name, + mic->id); + cnt++; + } + } + } + + closedir(dp); + return cnt; +} + +void +mpsslog(char *format, ...) +{ + va_list args; + char buffer[4096]; + char ts[52], *ts1; + time_t t; + + if (logfp == NULL) + return; + + va_start(args, format); + vsprintf(buffer, format, args); + va_end(args); + + time(&t); + ts1 = ctime_r(&t, ts); + ts1[strlen(ts1) - 1] = '\0'; + fprintf(logfp, "%s: %s", ts1, buffer); + + fflush(logfp); +} + +int +main(int argc, char *argv[]) +{ + int cnt; + pid_t pid; + + myname = argv[0]; + + logfp = fopen(LOGFILE_NAME, "a+"); + if (!logfp) { + fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME); + exit(1); + } + pid = fork(); + switch (pid) { + case 0: + break; + case -1: + exit(2); + default: + exit(0); + } + + mpsslog("MIC Daemon start\n"); + + cnt = init_mic_list(); + if (cnt == 0) { + mpsslog("MIC module not loaded\n"); + exit(3); + } + mpsslog("MIC found %d devices\n", cnt); + + start_daemon(); + + exit(0); +} diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h new file mode 100644 index 000000000000..f5f18b15d9a0 --- /dev/null +++ b/Documentation/mic/mpssd/mpssd.h @@ -0,0 +1,102 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. + */ +#ifndef _MPSSD_H_ +#define _MPSSD_H_ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <dirent.h> +#include <libgen.h> +#include <pthread.h> +#include <stdarg.h> +#include <time.h> +#include <errno.h> +#include <sys/dir.h> +#include <sys/ioctl.h> +#include <sys/poll.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/mman.h> +#include <sys/utsname.h> +#include <sys/wait.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <netdb.h> +#include <pthread.h> +#include <signal.h> +#include <limits.h> +#include <syslog.h> +#include <getopt.h> +#include <net/if.h> +#include <linux/if_tun.h> +#include <linux/if_tun.h> +#include <linux/virtio_ids.h> + +#define MICSYSFSDIR "/sys/class/mic" +#define LOGFILE_NAME "/var/log/mpssd" +#define PAGE_SIZE 4096 + +struct mic_console_info { + pthread_t console_thread; + int virtio_console_fd; + void *console_dp; +}; + +struct mic_net_info { + pthread_t net_thread; + int virtio_net_fd; + int tap_fd; + void *net_dp; +}; + +struct mic_virtblk_info { + pthread_t block_thread; + int virtio_block_fd; + void *block_dp; + volatile sig_atomic_t signaled; + char *backend_file; + int backend; + void *backend_addr; + long backend_size; +}; + +struct mic_info { + int id; + char *name; + pthread_t config_thread; + pid_t pid; + struct mic_console_info mic_console; + struct mic_net_info mic_net; + struct mic_virtblk_info mic_virtblk; + int restart; + int boot_on_resume; + struct mic_info *next; +}; + +__attribute__((format(printf, 1, 2))) +void mpsslog(char *format, ...); +char *readsysfs(char *dir, char *entry); +int setsysfs(char *dir, char *entry, char *value); +#endif diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c new file mode 100644 index 000000000000..8dd326936083 --- /dev/null +++ b/Documentation/mic/mpssd/sysfs.c @@ -0,0 +1,102 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. + */ + +#include "mpssd.h" + +#define PAGE_SIZE 4096 + +char * +readsysfs(char *dir, char *entry) +{ + char filename[PATH_MAX]; + char value[PAGE_SIZE]; + char *string = NULL; + int fd; + int len; + + if (dir == NULL) + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry); + else + snprintf(filename, PATH_MAX, + "%s/%s/%s", MICSYSFSDIR, dir, entry); + + fd = open(filename, O_RDONLY); + if (fd < 0) { + mpsslog("Failed to open sysfs entry '%s': %s\n", + filename, strerror(errno)); + return NULL; + } + + len = read(fd, value, sizeof(value)); + if (len < 0) { + mpsslog("Failed to read sysfs entry '%s': %s\n", + filename, strerror(errno)); + goto readsys_ret; + } + if (len == 0) + goto readsys_ret; + + value[len - 1] = '\0'; + + string = malloc(strlen(value) + 1); + if (string) + strcpy(string, value); + +readsys_ret: + close(fd); + return string; +} + +int +setsysfs(char *dir, char *entry, char *value) +{ + char filename[PATH_MAX]; + char *oldvalue; + int fd, ret = 0; + + if (dir == NULL) + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry); + else + snprintf(filename, PATH_MAX, "%s/%s/%s", + MICSYSFSDIR, dir, entry); + + oldvalue = readsysfs(dir, entry); + + fd = open(filename, O_RDWR); + if (fd < 0) { + ret = errno; + mpsslog("Failed to open sysfs entry '%s': %s\n", + filename, strerror(errno)); + goto done; + } + + if (!oldvalue || strcmp(value, oldvalue)) { + if (write(fd, value, strlen(value)) < 0) { + ret = errno; + mpsslog("Failed to write new sysfs entry '%s': %s\n", + filename, strerror(errno)); + } + } + close(fd); +done: + if (oldvalue) + free(oldvalue); + return ret; +} diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index d718bc2ff1cf..bf5dbe3ab8c5 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -18,8 +18,8 @@ Introduction Datagram Congestion Control Protocol (DCCP) is an unreliable, connection oriented protocol designed to solve issues present in UDP and TCP, particularly for real-time and multimedia (streaming) traffic. -It divides into a base protocol (RFC 4340) and plugable congestion control -modules called CCIDs. Like plugable TCP congestion control, at least one CCID +It divides into a base protocol (RFC 4340) and pluggable congestion control +modules called CCIDs. Like pluggable TCP congestion control, at least one CCID needs to be enabled in order for the protocol to function properly. In the Linux implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as the TCP-friendly CCID3 (RFC 4342), are optional. diff --git a/Documentation/networking/e100.txt b/Documentation/networking/e100.txt index 13a32124bca0..f862cf3aff34 100644 --- a/Documentation/networking/e100.txt +++ b/Documentation/networking/e100.txt @@ -103,7 +103,7 @@ Additional Configurations PRO/100 Family of Adapters is e100. As an example, if you install the e100 driver for two PRO/100 adapters - (eth0 and eth1), add the following to a configuraton file in /etc/modprobe.d/ + (eth0 and eth1), add the following to a configuration file in /etc/modprobe.d/ alias eth0 e100 alias eth1 e100 diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index 09eb57329f11..22bbc7225f8e 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt @@ -4,7 +4,7 @@ Introduction ============ -The IEEE 802.15.4 working group focuses on standartization of bottom +The IEEE 802.15.4 working group focuses on standardization of bottom two layers: Medium Access Control (MAC) and Physical (PHY). And there are mainly two options available for upper layers: - ZigBee - proprietary protocol from ZigBee Alliance @@ -66,7 +66,7 @@ net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family code via plain sk_buffs. On skb reception skb->cb must contain additional info as described in the struct ieee802154_mac_cb. During packet transmission the skb->cb is used to provide additional data to device's header_ops->create -function. Be aware, that this data can be overriden later (when socket code +function. Be aware that this data can be overridden later (when socket code submits skb to qdisc), so if you need something from that cb later, you should store info in the skb->data on your own. diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt index e63fc1f7bf87..c74434de2fa5 100644 --- a/Documentation/networking/l2tp.txt +++ b/Documentation/networking/l2tp.txt @@ -197,7 +197,7 @@ state information because the file format is subject to change. It is implemented to provide extra debug information to help diagnose problems.) Users should use the netlink API. -/proc/net/pppol2tp is also provided for backwards compaibility with +/proc/net/pppol2tp is also provided for backwards compatibility with the original pppol2tp driver. It lists information about L2TPv2 tunnels and sessions only. Its use is discouraged. diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt index d9112f01c44a..0fe1c6e0dbcd 100644 --- a/Documentation/networking/netdev-FAQ.txt +++ b/Documentation/networking/netdev-FAQ.txt @@ -4,23 +4,23 @@ Information you need to know about netdev Q: What is netdev? -A: It is a mailing list for all network related linux stuff. This includes +A: It is a mailing list for all network-related Linux stuff. This includes anything found under net/ (i.e. core code like IPv6) and drivers/net - (i.e. hardware specific drivers) in the linux source tree. + (i.e. hardware specific drivers) in the Linux source tree. Note that some subsystems (e.g. wireless drivers) which have a high volume of traffic have their own specific mailing lists. - The netdev list is managed (like many other linux mailing lists) through + The netdev list is managed (like many other Linux mailing lists) through VGER ( http://vger.kernel.org/ ) and archives can be found below: http://marc.info/?l=linux-netdev http://www.spinics.net/lists/netdev/ - Aside from subsystems like that mentioned above, all network related linux - development (i.e. RFC, review, comments, etc) takes place on netdev. + Aside from subsystems like that mentioned above, all network-related Linux + development (i.e. RFC, review, comments, etc.) takes place on netdev. -Q: How do the changes posted to netdev make their way into linux? +Q: How do the changes posted to netdev make their way into Linux? A: There are always two trees (git repositories) in play. Both are driven by David Miller, the main network maintainer. There is the "net" tree, @@ -35,7 +35,7 @@ A: There are always two trees (git repositories) in play. Both are driven Q: How often do changes from these trees make it to the mainline Linus tree? A: To understand this, you need to know a bit of background information - on the cadence of linux development. Each new release starts off with + on the cadence of Linux development. Each new release starts off with a two week "merge window" where the main maintainers feed their new stuff to Linus for merging into the mainline tree. After the two weeks, the merge window is closed, and it is called/tagged "-rc1". No new @@ -46,7 +46,7 @@ A: To understand this, you need to know a bit of background information things are in a state of churn), and a week after the last vX.Y-rcN was done, the official "vX.Y" is released. - Relating that to netdev: At the beginning of the 2 week merge window, + Relating that to netdev: At the beginning of the 2-week merge window, the net-next tree will be closed - no new changes/features. The accumulated new content of the past ~10 weeks will be passed onto mainline/Linus via a pull request for vX.Y -- at the same time, @@ -59,16 +59,16 @@ A: To understand this, you need to know a bit of background information IMPORTANT: Do not send new net-next content to netdev during the period during which net-next tree is closed. - Shortly after the two weeks have passed, (and vX.Y-rc1 is released) the + Shortly after the two weeks have passed (and vX.Y-rc1 is released), the tree for net-next reopens to collect content for the next (vX.Y+1) release. If you aren't subscribed to netdev and/or are simply unsure if net-next has re-opened yet, simply check the net-next git repository link above for - any new networking related commits. + any new networking-related commits. The "net" tree continues to collect fixes for the vX.Y content, and is fed back to Linus at regular (~weekly) intervals. Meaning that the - focus for "net" is on stablilization and bugfixes. + focus for "net" is on stabilization and bugfixes. Finally, the vX.Y gets released, and the whole cycle starts over. @@ -217,7 +217,7 @@ A: Attention to detail. Re-read your own work as if you were the to why it happens, and then if necessary, explain why the fix proposed is the best way to get things done. Don't mangle whitespace, and as is common, don't mis-indent function arguments that span multiple lines. - If it is your 1st patch, mail it to yourself so you can test apply + If it is your first patch, mail it to yourself so you can test apply it to an unpatched tree to confirm infrastructure didn't mangle it. Finally, go back and read Documentation/SubmittingPatches to be diff --git a/Documentation/networking/netlink_mmap.txt b/Documentation/networking/netlink_mmap.txt index 533378839546..b26122973525 100644 --- a/Documentation/networking/netlink_mmap.txt +++ b/Documentation/networking/netlink_mmap.txt @@ -45,7 +45,7 @@ processing. Conversion of the reception path involves calling poll() on the file descriptor, once the socket is readable the frames from the ring are -processsed in order until no more messages are available, as indicated by +processed in order until no more messages are available, as indicated by a status word in the frame header. On kernel side, in order to make use of memory mapped I/O on receive, the @@ -56,7 +56,7 @@ Dumps of kernel databases automatically support memory mapped I/O. Conversion of the transmit path involves changing message construction to use memory from the TX ring instead of (usually) a buffer declared on the -stack and setting up the frame header approriately. Optionally poll() can +stack and setting up the frame header appropriately. Optionally poll() can be used to wait for free frames in the TX ring. Structured and definitions for using memory mapped I/O are contained in @@ -231,7 +231,7 @@ Ring setup: if (setsockopt(fd, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1) - /* Calculate size of each invididual ring */ + /* Calculate size of each individual ring */ ring_size = req.nm_block_nr * req.nm_block_size; /* Map RX/TX rings. The TX ring is located after the RX ring */ diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt index 97694572338b..355c6d8ef8ad 100644 --- a/Documentation/networking/operstates.txt +++ b/Documentation/networking/operstates.txt @@ -89,8 +89,8 @@ packets. The name 'carrier' and the inversion are historical, think of it as lower layer. Note that for certain kind of soft-devices, which are not managing any -real hardware, there is possible to set this bit from userpsace. -One should use TVL IFLA_CARRIER to do so. +real hardware, it is possible to set this bit from userspace. One +should use TVL IFLA_CARRIER to do so. netif_carrier_ok() can be used to query that bit. diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 60d05eb77c64..b89bc82eed46 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -144,7 +144,7 @@ An overview of the RxRPC protocol: (*) Calls use ACK packets to handle reliability. Data packets are also explicitly sequenced per call. - (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs. + (*) There are two types of positive acknowledgment: hard-ACKs and soft-ACKs. A hard-ACK indicates to the far side that all the data received to a point has been received and processed; a soft-ACK indicates that the data has been received but may yet be discarded and re-requested. The sender may diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 457b8bbafb08..cdd916da838d 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -160,7 +160,7 @@ Where: o pmt: core has the embedded power module (optional). o force_sf_dma_mode: force DMA to use the Store and Forward mode instead of the Threshold. - o force_thresh_dma_mode: force DMA to use the Shreshold mode other than + o force_thresh_dma_mode: force DMA to use the Threshold mode other than the Store and Forward mode. o riwt_off: force to disable the RX watchdog feature and switch to NAPI mode. o fix_mac_speed: this callback is used for modifying some syscfg registers @@ -175,7 +175,7 @@ Where: registers. o custom_cfg/custom_data: this is a custom configuration that can be passed while initializing the resources. - o bsp_priv: another private poiter. + o bsp_priv: another private pointer. For MDIO bus The we have: @@ -271,7 +271,7 @@ reset procedure etc). o dwmac1000_dma.c: dma functions for the GMAC chip; o dwmac1000.h: specific header file for the GMAC; o dwmac100_core: MAC 100 core and dma code; - o dwmac100_dma.c: dma funtions for the MAC chip; + o dwmac100_dma.c: dma functions for the MAC chip; o dwmac1000.h: specific header file for the MAC; o dwmac_lib.c: generic DMA functions shared among chips; o enh_desc.c: functions for handling enhanced descriptors; @@ -364,4 +364,4 @@ Auto-negotiated Link Parter Ability. 10) TODO: o XGMAC is not supported. o Complete the TBI & RTBI support. - o extened VLAN support for 3.70a SYNP GMAC. + o extend VLAN support for 3.70a SYNP GMAC. diff --git a/Documentation/networking/vortex.txt b/Documentation/networking/vortex.txt index 9a8041dcbb53..97282da82b75 100644 --- a/Documentation/networking/vortex.txt +++ b/Documentation/networking/vortex.txt @@ -68,7 +68,7 @@ Module parameters There are several parameters which may be provided to the driver when its module is loaded. These are usually placed in /etc/modprobe.d/*.conf -configuretion files. Example: +configuration files. Example: options 3c59x debug=3 rx_copybreak=300 @@ -178,7 +178,7 @@ max_interrupt_work=N The driver's interrupt service routine can handle many receive and transmit packets in a single invocation. It does this in a loop. - The value of max_interrupt_work governs how mnay times the interrupt + The value of max_interrupt_work governs how many times the interrupt service routine will loop. The default value is 32 loops. If this is exceeded the interrupt service routine gives up and generates a warning message "eth0: Too much work in interrupt". diff --git a/Documentation/networking/x25-iface.txt b/Documentation/networking/x25-iface.txt index 78f662ee0622..7f213b556e85 100644 --- a/Documentation/networking/x25-iface.txt +++ b/Documentation/networking/x25-iface.txt @@ -105,7 +105,7 @@ reduced by the following measures or a combination thereof: later. The lapb module interface was modified to support this. Its data_indication() method should now transparently pass the - netif_rx() return value to the (lapb mopdule) caller. + netif_rx() return value to the (lapb module) caller. (2) Drivers for kernel versions 2.2.x should always check the global variable netdev_dropping when a new frame is received. The driver should only call netif_rx() if netdev_dropping is zero. Otherwise diff --git a/Documentation/phy.txt b/Documentation/phy.txt new file mode 100644 index 000000000000..0103e4b15b0e --- /dev/null +++ b/Documentation/phy.txt @@ -0,0 +1,166 @@ + PHY SUBSYSTEM + Kishon Vijay Abraham I <kishon@ti.com> + +This document explains the Generic PHY Framework along with the APIs provided, +and how-to-use. + +1. Introduction + +*PHY* is the abbreviation for physical layer. It is used to connect a device +to the physical medium e.g., the USB controller has a PHY to provide functions +such as serialization, de-serialization, encoding, decoding and is responsible +for obtaining the required data transmission rate. Note that some USB +controllers have PHY functionality embedded into it and others use an external +PHY. Other peripherals that use PHY include Wireless LAN, Ethernet, +SATA etc. + +The intention of creating this framework is to bring the PHY drivers spread +all over the Linux kernel to drivers/phy to increase code re-use and for +better code maintainability. + +This framework will be of use only to devices that use external PHY (PHY +functionality is not embedded within the controller). + +2. Registering/Unregistering the PHY provider + +PHY provider refers to an entity that implements one or more PHY instances. +For the simple case where the PHY provider implements only a single instance of +the PHY, the framework provides its own implementation of of_xlate in +of_phy_simple_xlate. If the PHY provider implements multiple instances, it +should provide its own implementation of of_xlate. of_xlate is used only for +dt boot case. + +#define of_phy_provider_register(dev, xlate) \ + __of_phy_provider_register((dev), THIS_MODULE, (xlate)) + +#define devm_of_phy_provider_register(dev, xlate) \ + __devm_of_phy_provider_register((dev), THIS_MODULE, (xlate)) + +of_phy_provider_register and devm_of_phy_provider_register macros can be used to +register the phy_provider and it takes device and of_xlate as +arguments. For the dt boot case, all PHY providers should use one of the above +2 macros to register the PHY provider. + +void devm_of_phy_provider_unregister(struct device *dev, + struct phy_provider *phy_provider); +void of_phy_provider_unregister(struct phy_provider *phy_provider); + +devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to +unregister the PHY. + +3. Creating the PHY + +The PHY driver should create the PHY in order for other peripheral controllers +to make use of it. The PHY framework provides 2 APIs to create the PHY. + +struct phy *phy_create(struct device *dev, const struct phy_ops *ops, + struct phy_init_data *init_data); +struct phy *devm_phy_create(struct device *dev, const struct phy_ops *ops, + struct phy_init_data *init_data); + +The PHY drivers can use one of the above 2 APIs to create the PHY by passing +the device pointer, phy ops and init_data. +phy_ops is a set of function pointers for performing PHY operations such as +init, exit, power_on and power_off. *init_data* is mandatory to get a reference +to the PHY in the case of non-dt boot. See section *Board File Initialization* +on how init_data should be used. + +Inorder to dereference the private data (in phy_ops), the phy provider driver +can use phy_set_drvdata() after creating the PHY and use phy_get_drvdata() in +phy_ops to get back the private data. + +4. Getting a reference to the PHY + +Before the controller can make use of the PHY, it has to get a reference to +it. This framework provides the following APIs to get a reference to the PHY. + +struct phy *phy_get(struct device *dev, const char *string); +struct phy *devm_phy_get(struct device *dev, const char *string); + +phy_get and devm_phy_get can be used to get the PHY. In the case of dt boot, +the string arguments should contain the phy name as given in the dt data and +in the case of non-dt boot, it should contain the label of the PHY. +The only difference between the two APIs is that devm_phy_get associates the +device with the PHY using devres on successful PHY get. On driver detach, +release function is invoked on the the devres data and devres data is freed. + +5. Releasing a reference to the PHY + +When the controller no longer needs the PHY, it has to release the reference +to the PHY it has obtained using the APIs mentioned in the above section. The +PHY framework provides 2 APIs to release a reference to the PHY. + +void phy_put(struct phy *phy); +void devm_phy_put(struct device *dev, struct phy *phy); + +Both these APIs are used to release a reference to the PHY and devm_phy_put +destroys the devres associated with this PHY. + +6. Destroying the PHY + +When the driver that created the PHY is unloaded, it should destroy the PHY it +created using one of the following 2 APIs. + +void phy_destroy(struct phy *phy); +void devm_phy_destroy(struct device *dev, struct phy *phy); + +Both these APIs destroy the PHY and devm_phy_destroy destroys the devres +associated with this PHY. + +7. PM Runtime + +This subsystem is pm runtime enabled. So while creating the PHY, +pm_runtime_enable of the phy device created by this subsystem is called and +while destroying the PHY, pm_runtime_disable is called. Note that the phy +device created by this subsystem will be a child of the device that calls +phy_create (PHY provider device). + +So pm_runtime_get_sync of the phy_device created by this subsystem will invoke +pm_runtime_get_sync of PHY provider device because of parent-child relationship. +It should also be noted that phy_power_on and phy_power_off performs +phy_pm_runtime_get_sync and phy_pm_runtime_put respectively. +There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync, +phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and +phy_pm_runtime_forbid for performing PM operations. + +8. Board File Initialization + +Certain board file initialization is necessary in order to get a reference +to the PHY in the case of non-dt boot. +Say we have a single device that implements 3 PHYs that of USB, SATA and PCIe, +then in the board file the following initialization should be done. + +struct phy_consumer consumers[] = { + PHY_CONSUMER("dwc3.0", "usb"), + PHY_CONSUMER("pcie.0", "pcie"), + PHY_CONSUMER("sata.0", "sata"), +}; +PHY_CONSUMER takes 2 parameters, first is the device name of the controller +(PHY consumer) and second is the port name. + +struct phy_init_data init_data = { + .consumers = consumers, + .num_consumers = ARRAY_SIZE(consumers), +}; + +static const struct platform_device pipe3_phy_dev = { + .name = "pipe3-phy", + .id = -1, + .dev = { + .platform_data = { + .init_data = &init_data, + }, + }, +}; + +then, while doing phy_create, the PHY driver should pass this init_data + phy_create(dev, ops, pdata->init_data); + +and the controller driver (phy consumer) should pass the port name along with +the device to get a reference to the PHY + phy_get(dev, "pcie"); + +9. DeviceTree Binding + +The documentation for PHY dt binding can be found @ +Documentation/devicetree/bindings/phy/phy-bindings.txt diff --git a/Documentation/pps/pps.txt b/Documentation/pps/pps.txt index d35dcdd82ff6..c03b1be5eb15 100644 --- a/Documentation/pps/pps.txt +++ b/Documentation/pps/pps.txt @@ -66,6 +66,21 @@ In LinuxPPS the PPS sources are simply char devices usually mapped into files /dev/pps0, /dev/pps1, etc.. +PPS with USB to serial devices +------------------------------ + +It is possible to grab the PPS from an USB to serial device. However, +you should take into account the latencies and jitter introduced by +the USB stack. Users has reported clock instability around +-1ms when +synchronized with PPS through USB. This isn't suited for time server +synchronization. + +If your device doesn't report PPS, you can check that the feature is +supported by its driver. Most of the time, you only need to add a call +to usb_serial_handle_dcd_change after checking the DCD status (see +ch341 and pl2303 examples). + + Coding example -------------- diff --git a/Documentation/serial/driver b/Documentation/serial/driver index 067c47d46917..c3a7689a90e6 100644 --- a/Documentation/serial/driver +++ b/Documentation/serial/driver @@ -264,10 +264,6 @@ hardware. Locking: none. Interrupts: caller dependent. - set_wake(port,state) - Enable/disable power management wakeup on serial activity. Not - currently implemented. - type(port) Return a pointer to a string constant describing the specified port, or return NULL, in which case the string 'unknown' is diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index 8cb4d7842a5f..0e307c94809a 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt @@ -11,27 +11,29 @@ regardless of whatever else it is doing, unless it is completely locked up. You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when configuring the kernel. When running a kernel with SysRq compiled in, /proc/sys/kernel/sysrq controls the functions allowed to be invoked via -the SysRq key. By default the file contains 1 which means that every -possible SysRq request is allowed (in older versions SysRq was disabled -by default, and you were required to specifically enable it at run-time -but this is not the case any more). Here is the list of possible values -in /proc/sys/kernel/sysrq: +the SysRq key. The default value in this file is set by the +CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE config symbol, which itself defaults +to 1. Here is the list of possible values in /proc/sys/kernel/sysrq: 0 - disable sysrq completely 1 - enable all functions of sysrq >1 - bitmask of allowed sysrq functions (see below for detailed function description): - 2 - enable control of console logging level - 4 - enable control of keyboard (SAK, unraw) - 8 - enable debugging dumps of processes etc. - 16 - enable sync command - 32 - enable remount read-only - 64 - enable signalling of processes (term, kill, oom-kill) - 128 - allow reboot/poweroff - 256 - allow nicing of all RT tasks + 2 = 0x2 - enable control of console logging level + 4 = 0x4 - enable control of keyboard (SAK, unraw) + 8 = 0x8 - enable debugging dumps of processes etc. + 16 = 0x10 - enable sync command + 32 = 0x20 - enable remount read-only + 64 = 0x40 - enable signalling of processes (term, kill, oom-kill) + 128 = 0x80 - allow reboot/poweroff + 256 = 0x100 - allow nicing of all RT tasks You can set the value in the file by the following command: echo "number" >/proc/sys/kernel/sysrq +The number may be written here either as decimal or as hexadecimal +with the 0x prefix. CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE must always be +written in hexadecimal. + Note that the value of /proc/sys/kernel/sysrq influences only the invocation via a keyboard. Invocation of any operation via /proc/sysrq-trigger is always allowed (by a user with admin privileges). |