summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2015-06-26acpi: Add acpi_map_pxm_to_online_node()Toshi Kani1-0/+5
The kernel initializes CPU & memory's NUMA topology from ACPI SRAT table. Some other ACPI tables, such as NFIT and DMAR, also contain proximity IDs for their device's NUMA topology. This information can be used to improve performance of these devices. This patch introduces acpi_map_pxm_to_online_node(), which is similar to acpi_map_pxm_to_node(), but always returns an online node. When the mapped node from a given proximity ID is offline, it looks up the node distance table and returns the nearest online node. ACPI device drivers, which are called after the NUMA initialization has completed in the kernel, can call this interface to obtain their device NUMA topology from ACPI tables. Such drivers do not have to deal with offline nodes. A node may be offline when a device proximity ID is unique, SRAT memory entry does not exist, or NUMA is disabled, ex. "numa=off" on x86. This patch also moves the pxm range check from acpi_get_node() to acpi_map_pxm_to_node(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26libnvdimm, nfit: handle unarmed dimms, mark namespaces read-onlyDan Williams1-0/+2
Upon detection of an unarmed dimm in a region, arrange for descendant BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked "unarmed" via flags passed by platform firmware (NFIT). The flags in the NFIT memory device sub-structure indicate the state of the data on the nvdimm relative to its energy source or last "flush to persistence". For the most part there is nothing the driver can do but advertise the state of these flags in sysfs and emit a message if firmware indicates that the contents of the device may be corrupted. However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for the block devices incorporating that nvdimm to be marked read-only. This is a safe default as the data is still available and new writes are held off until the administrator either forces read-write mode, or the energy source becomes armed. A 'read_only' attribute is added to REGION devices to allow for overriding the default read-only policy of all descendant block devices. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memoryRoss Zwisler1-2/+25
The libnvdimm implementation handles allocating dimm address space (DPA) between PMEM and BLK mode interfaces. After DPA has been allocated from a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O as a struct bio based block device. Unlike PMEM, BLK is required to handle platform specific details like mmio register formats and memory controller interleave. For this reason the libnvdimm generic nd_blk driver calls back into the bus provider to carry out the I/O. This initial implementation handles the BLK interface defined by the ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from DCR (dimm control region), BDW (block data window), IDT (interleave descriptor) NFIT structures and the hardware register format. [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26nd_btt: atomic sector updatesVishal Verma1-0/+1
BTT stands for Block Translation Table, and is a way to provide power fail sector atomicity semantics for block devices that have the ability to perform byte granularity IO. It relies on the capability of libnvdimm namespace devices to do byte aligned IO. The BTT works as a stacked blocked device, and reserves a chunk of space from the backing device for its accounting metadata. It is a bio-based driver because all IO is done synchronously, and there is no queuing or asynchronous completions at either the device or the driver level. The BTT uses 'lanes' to index into various 'on-disk' data structures, and lanes also act as a synchronization mechanism in case there are more CPUs than available lanes. We did a comparison between two lane lock strategies - first where we kept an atomic counter around that tracked which was the last lane that was used, and 'our' lane was determined by atomically incrementing that. That way, for the nr_cpus > nr_lanes case, theoretically, no CPU would be blocked waiting for a lane. The other strategy was to use the cpu number we're scheduled on to and hash it to a lane number. Theoretically, this could block an IO that could've otherwise run using a different, free lane. But some fio workloads showed that the direct cpu -> lane hash performed faster than tracking 'last lane' - my reasoning is the cache thrash caused by moving the atomic variable made that approach slower than simply waiting out the in-progress IO. This supports the conclusion that the driver can be a very simple bio-based one that does synchronous IOs instead of queuing. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg KH <gregkh@linuxfoundation.org> [jmoyer: fix nmi watchdog timeout in btt_map_init] [jmoyer: move btt initialization to module load path] [jmoyer: fix memory leak in the btt initialization path] [jmoyer: Don't overwrite corrupted arenas] Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: infrastructure for btt devicesDan Williams1-5/+58
NVDIMM namespaces, in addition to accepting "struct bio" based requests, also have the capability to perform byte-aligned accesses. By default only the bio/block interface is used. However, if another driver can make effective use of the byte-aligned capability it can claim namespace interface and use the byte-aligned ->rw_bytes() interface. The BTT driver is the initial first consumer of this mechanism to allow adding atomic sector update semantics to a pmem or blk namespace. This patch is the sysfs infrastructure to allow configuring a BTT instance for a namespace. Enabling that BTT and performing i/o is in a subsequent patch. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: blk labels and namespace instantiationDan Williams2-0/+28
A blk label set describes a namespace comprised of one or more discontiguous dpa ranges on a single dimm. They may alias with one or more pmem interleave sets that include the given dimm. This is the runtime/volatile configuration infrastructure for sysfs manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later patch will make these settings persistent by writing back the label(s). Unlike pmem namespaces, multiple blk namespaces can be created per region. Once a blk namespace has been created a new seed device (unconfigured child of a parent blk region) is instantiated. As long as a region has 'available_size' != 0 new child namespaces may be created. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: pmem label sets and namespace instantiation.Dan Williams3-0/+38
A complete label set is a PMEM-label per-dimm per-interleave-set where all the UUIDs match and the interleave set cookie matches the hosting interleave set. Present sysfs attributes for manipulation of a PMEM-namespace's 'alt_name', 'uuid', and 'size' attributes. A later patch will make these settings persistent by writing back the label. Note that PMEM allocations grow forwards from the start of an interleave set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias with a PMEM interleave set will grow allocations backward from the highest DPA. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: namespace indices: read and validateDan Williams1-1/+0
This on media label format [1] consists of two index blocks followed by an array of labels. None of these structures are ever updated in place. A sequence number tracks the current active index and the next one to write, while labels are written to free slots. +------------+ | | | nsindex0 | | | +------------+ | | | nsindex1 | | | +------------+ | label0 | +------------+ | label1 | +------------+ | | ....nslot... | | +------------+ | labelN | +------------+ After reading valid labels, store the dpa ranges they claim into per-dimm resource trees. [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm, nfit: add interleave-set state-tracking infrastructureDan Williams1-0/+6
On platforms that have firmware support for reading/writing per-dimm label space, a portion of the dimm may be accessible via an interleave set PMEM mapping in addition to the dimm's BLK (block-data-window aperture(s)) interface. A label, stored in a "configuration data region" on the dimm, disambiguates which dimm addresses are accessed through which exclusive interface. Add infrastructure that allows the kernel to block modifications to a label in the set while any member dimm is active. Note that this is meant only for enforcing "no modifications of active labels" via the coarse ioctl command. Adding/deleting namespaces from an active interleave set is always possible via sysfs. Another aspect of tracking interleave sets is tracking their integrity when DIMMs in a set are physically re-ordered. For this purpose we generate an "interleave-set cookie" that can be recorded in a label and validated against the current configuration. It is the bus provider implementation's responsibility to calculate the interleave set cookie and attach it to a given region. Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: support for legacy (non-aliasing) nvdimmsDan Williams3-2/+25
The libnvdimm region driver is an intermediary driver that translates non-volatile "region"s into "namespace" sub-devices that are surfaced by persistent memory block-device drivers (PMEM and BLK). ACPI 6 introduces the concept that a given nvdimm may simultaneously offer multiple access modes to its media through direct PMEM load/store access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM interface, some offer a BLK-like mode, but never both as ACPI 6 defines. If an nvdimm is single interfaced, then there is no need for dimm metadata labels. For these devices we can take the region boundaries directly to create a child namespace device (nd_namespace_io). Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)Dan Williams1-0/+25
A "region" device represents the maximum capacity of a BLK range (mmio block-data-window(s)), or a PMEM range (DAX-capable persistent memory or volatile memory), without regard for aliasing. Aliasing, in the dimm-local address space (DPA), is resolved by metadata on a dimm to designate which exclusive interface will access the aliased DPA ranges. Support for the per-dimm metadata/label arrvies is in a subsequent patch. The name format of "region" devices is "regionN" where, like dimms, N is a global ida index assigned at discovery time. This id is not reliable across reboots nor in the presence of hotplug. Look to attributes of the region or static id-data of the sub-namespace to generate a persistent name. However, if the platform configuration does not change it is reasonable to expect the same region id to be assigned at the next boot. "region"s have 2 generic attributes "size", and "mapping"s where: - size: the BLK accessible capacity or the span of the system physical address range in the case of PMEM. - mappingN: a tuple describing a dimm's contribution to the region's capacity in the format (<nmemX>,<dpa>,<size>). For a PMEM-region there will be at least one mapping per dimm in the interleave set. For a BLK-region there is only "mapping0" listing the starting DPA of the BLK-region and the available DPA capacity of that space (matches "size" above). The max number of mappings per "region" is hard coded per the constraints of sysfs attribute groups. That said the number of mappings per region should never exceed the maximum number of possible dimms in the system. If the current number turns out to not be enough then the "mappings" attribute clarifies how many there are supposed to be. "32 should be enough for anybody...". Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructureDan Williams3-0/+47
* Implement the device-model infrastructure for loading modules and attaching drivers to nvdimm devices. This is a simple association of a nd-device-type number with a driver that has a bitmask of supported device types. To facilitate userspace bind/unbind operations 'modalias' and 'devtype', that also appear in the uevent, are added as generic sysfs attributes for all nvdimm devices. The reason for the device-type number is to support sub-types within a given parent devtype, be it a vendor-specific sub-type or otherwise. * The first consumer of this infrastructure is the driver for dimm devices. It simply uses control messages to retrieve and store the configuration-data image (label set) from each dimm. Note: nd_device_register() arranges for asynchronous registration of nvdimm bus devices by default. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devicesDan Williams3-1/+205
Most discovery/configuration of the nvdimm-subsystem is done via sysfs attributes. However, some nvdimm_bus instances, particularly the ACPI.NFIT bus, define a small set of messages that can be passed to the platform. For convenience we derive the initial libnvdimm-ioctl command formats directly from the NFIT DSM Interface Example formats. ND_CMD_SMART: media health and diagnostics ND_CMD_GET_CONFIG_SIZE: size of the label space ND_CMD_GET_CONFIG_DATA: read label space ND_CMD_SET_CONFIG_DATA: write label space ND_CMD_VENDOR: vendor-specific command passthrough ND_CMD_ARS_CAP: report address-range-scrubbing capabilities ND_CMD_ARS_START: initiate scrubbing ND_CMD_ARS_STATUS: report on scrubbing state ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events If a platform later defines different commands than this set it is straightforward to extend support to those formats. Most of the commands target a specific dimm. However, the address-range-scrubbing commands target the bus. The 'commands' attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported commands for that object. Cc: <linux-acpi@vger.kernel.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm, nfit: dimm/memory-devicesDan Williams1-0/+11
Enable nvdimm devices to be registered on a nvdimm_bus. The kernel assigned device id for nvdimm devicesis dynamic. If userspace needs a more static identifier it should consult a provider-specific attribute. In the case where NFIT is the provider, the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used for this purpose. Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm: control character device and nvdimm_bus sysfs attributesDan Williams1-1/+5
The control device for a nvdimm_bus is registered as an "nd" class device. The expectation is that there will usually only be one "nd" bus registered under /sys/class/nd. However, we allow for the possibility of multiple buses and they will listed in discovery order as ndctl0...ndctlN. This character device hosts the ioctl for passing control messages. The initial command set has a 1:1 correlation with the commands listed in the by the "NFIT DSM Example" document [1], but this scheme is extensible to future command sets. Note, nd_ioctl() and the backing ->ndctl() implementation are defined in a subsequent patch. This is simply the initial registrations and sysfs attributes. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Cc: Neil Brown <neilb@suse.de> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: <linux-acpi@vger.kernel.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25libnvdimm, nfit: initial libnvdimm infrastructure and NFIT supportDan Williams1-0/+34
A struct nvdimm_bus is the anchor device for registering nvdimm resources and interfaces, for example, a character control device, nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware Interface Table) is one possible platform description for such non-volatile memory resources in a system. The nfit.ko driver attaches to the "ACPI0012" device that indicates the presence of the NFIT and parses the table to register a struct nvdimm_bus instance. Cc: <linux-acpi@vger.kernel.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-28e820, efi: add ACPI 6.0 persistent memory typesDan Williams1-1/+2
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory. Mark it "reserved" and allow it to be claimed by a persistent memory device driver. This definition is in addition to the Linux kernel's existing type-12 definition that was recently added in support of shipping platforms with NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as OEM reserved). Note, /proc/iomem can be consulted for differentiating legacy "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory" E820_PMEM. Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-26ACPICA: Fix for ill-formed GUID strings for NFIT tables.Bob Moore1-8/+8
ACPICA commit 60052949ba2aa7377106870da69b237193d10dc1 Error in transcription from the ACPI spec. Link: https://github.com/acpica/acpica/commit/60052949 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-26ACPICA: acpihelp: Update for new NFIT table GUIDs.Bob Moore1-9/+34
ACPICA commit 83727bed8f715685a63a9f668e73c60496a06054 Add original UUIDs/GUIDs to the acuuid.h file. Cleanup acpihelp output for UUIDs/GUIDs. Link: https://github.com/acpica/acpica/commit/83727bed Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: Update version to 20150515.Bob Moore1-1/+1
ACPICA commit ed4de2e8b0a5dd6fc17773a055590bff0e995588 Version 20150515. Link: https://github.com/acpica/acpica/commit/ed4de2e8 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add support for NFIT table.Bob Moore2-0/+218
ACPICA commit e4e17ca361373e9b81494bb4ca697a12cef3cba6 NVDIMM Firmware Interface Table. Link: https://github.com/acpica/acpica/commit/e4e17ca3 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add changes for DRTM table.Lv Zheng1-7/+27
ACPICA commit b02b754a2b7afcd0384cb3b31f29eb1be028fe90 This patch adds support for DRTM (Dynamic Root of Trust for Measurement table) in iasl. Lv Zheng. Link: https://github.com/acpica/acpica/commit/b02b754a Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add support for IORT table.Lv Zheng1-0/+126
ACPICA commit 5de82757aef5d6163e37064033aacbce193abbca This patch adds support for IORT (IO Remapping Table) in iasl. Note that some field names are modified to shrink their length or the decompiled IORT ASL will contain fields with ugly ":" alignment. The IORT contains field definitions around "Memory Access Properties". This patch also adds support to encode/decode it using inline table. This patch doesn't add inline table support for the SMMU interrupt fields due to a limitation in current ACPICA data table support. Lv Zheng. Link: https://github.com/acpica/acpica/commit/5de82757 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add ACPI_SUB_PTR().Lv Zheng1-0/+1
ACPICA commit 5de82757aef5d6163e37064033aacbce193abbca Using a minus number with ACPI_ADD_PTR() will cause compiler warnings, such warnings cannot be eliminated by force casting an unsigned value to a signed value. This patch thus introduces ACPI_SUB_PTR() to be used with minus numbers. Lv Zheng. Link: https://github.com/acpica/acpica/commit/5de82757 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add changes for MADT table.Bob Moore1-4/+18
ACPICA commit 02cbb41232bccf7a91967140cab95d5f48291f21 New subtable type. Some additions to existing subtables. Link: https://github.com/acpica/acpica/commit/02cbb412 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add changes for LPIT table.Bob Moore1-18/+2
ACPICA commit d527908bb33a3ed515cfb349cbec57121deafcc8 Second subtable type was removed from the July 2014 LPIT document. Link: https://github.com/acpica/acpica/commit/d527908b Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add changes for FADT table.Bob Moore1-7/+10
ACPICA commit 72b0b6741990f619f6aaa915302836b7cbb41ac4 One new 64-bit field at the end of the table. FADT version is now 6. Link: https://github.com/acpica/acpica/commit/72b0b674 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add support for WPBT table.Bob Moore1-1/+19
ACPICA commit a6ccb4033b49f7aa33a17ddc41dd69d57e799fbd Windows Platform Binary Table. Link: https://github.com/acpica/acpica/commit/a6ccb403 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add support for XENV table.Bob Moore1-0/+18
ACPICA commit 08c4197cf4ddd45f0c961078220b0fc19c10745c Xen Environment table. Link: https://github.com/acpica/acpica/commit/08c4197c Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: ACPI 6.0: Add support for STAO table.Bob Moore1-0/+16
ACPICA commit 532bf402a503061afd9d80a23e1d3c8fd99b052c _STA override table. Link: https://github.com/acpica/acpica/commit/532bf402 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-22ACPICA: Additional dragon_fly BSD support.Bob Moore2-0/+6
ACPICA commit 3e93431674abe947202b0f9a0afa7b625b17caa6 Makefiles and environment defines. This commit doesn't affect Linux builds. Link: https://github.com/acpica/acpica/commit/3e934316 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-19ACPICA: Divergence: Remove redundant definitions.Lv Zheng1-5/+0
There are two same "define"s in the actypes.h for ACPI_USE_NATIVE_DIVIDE, this patch removes one of them as it is useless and is not in the ACPICA upstream. It is likely that the useless block is there because of the issues in the old ACPICA release process. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-10Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Two patches from the irq departement: - a simple fix to make dummy_irq_chip usable for wakeup scenarios - removal of the gic arch_extn hackery. Now that all users are converted we really want to get rid of the interface so people wont come up with new use cases" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic: Drop support for gic_arch_extn genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
2015-05-09Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block fixes from Jens Axboe: "A collection of fixes since the merge window; - fix for a double elevator module release, from Chao Yu. Ancient bug. - the splice() MORE flag fix from Christophe Leroy. - a fix for NVMe, fixing a patch that went in in the merge window. From Keith. - two fixes for blk-mq CPU hotplug handling, from Ming Lei. - bdi vs blockdev lifetime fix from Neil Brown, fixing and oops in md. - two blk-mq fixes from Shaohua, fixing a race on queue stop and a bad merge issue with FUA writes. - division-by-zero fix for writeback from Tejun. - a block bounce page accounting fix, making sure we inc/dec after bouncing so that pre/post IO pages match up. From Wang YanQing" * 'for-linus' of git://git.kernel.dk/linux-block: splice: sendfile() at once fails for big files blk-mq: don't lose requests if a stopped queue restarts blk-mq: fix FUA request hang block: destroy bdi before blockdev is unregistered. block:bounce: fix call inc_|dec_zone_page_state on different pages confuse value of NR_BOUNCE elevator: fix double release of elevator module writeback: use |1 instead of +1 to protect against div by zero blk-mq: fix CPU hotplug handling blk-mq: fix race between timeout and CPU hotplug NVMe: Fix VPD B0 max sectors translation
2015-05-09Merge tag 'trace-fixes-v4.1-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "The newly added ftrace_print_array_seq() function had a bug in it. Luckily, the only user of it didn't make the 4.1 merge window. But the helper function should be fixed before 4.2 when the users start coming in" * tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Make ftrace_print_array_seq compute buf_len
2015-05-07Merge tag 'for-linus' of git://github.com/dledford/linuxLinus Torvalds4-5/+31
Pull infiniband updates from Doug Ledford: "Minor updates for 4.1-rc Most of the changes are fairly small and well confined. The iWARP address reporting changes are the only ones that are a medium size. I had these queued up prior to rc1, but due to the shuffle in maintainers, they did not get submitted when I expected. My apologies for that. I feel comfortable with them however due to the testing they've received, so I left them in this submission" * tag 'for-linus' of git://github.com/dledford/linux: MAINTAINERS: Update InfiniBand subsystem maintainer MAINTAINERS: add include/rdma/ to InfiniBand subsystem IPoIB/CM: Fix indentation level iw_cxgb4: Remove negative advice dmesg warnings IB/core: Fix unaligned accesses IB/core: change rdma_gid2ip into void function as it always return zero IB/qib: use arch_phys_wc_add() IB/qib: add acounting for MTRR IB/core: dma unmap optimizations IB/core: dma map/unmap locking optimizations RDMA/cxgb4: Report the actual address of the remote connecting peer RDMA/nes: Report the actual address of the remote connecting peer RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients iw_cxgb4: enforce qp/cq id requirements iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs iw_cxgb4: 32b platform fixes iw_cxgb4: Cleanup register defines/MACROS RDMA/CMA: Canonize IPv4 on IPV6 sockets properly
2015-05-07tracing: Make ftrace_print_array_seq compute buf_lenAlex Bennée1-1/+1
The only caller to this function (__print_array) was getting it wrong by passing the array length instead of buffer length. As the element size was already being passed for other reasons it seems reasonable to push the calculation of buffer length into the function. Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-07Merge tag 'for-linus-4.1b-rc2-tag' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - fix blkback regression if using persistent grants - fix various event channel related suspend/resume bugs - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only capable of 32-bit DMA work. * tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq() xen/console: Update console event channel on resume xen/xenbus: Update xenbus event channel on resume xen/events: Clear cpu_evtchn_mask before resuming xen-pciback: Add name prefix to global 'permissive' variable xen: Suspend ticks on all CPUs during suspend xen/grant: introduce func gnttab_unmap_refs_sync() xen/blkback: safely unmap purge persistent grants
2015-05-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore PMU driver hardware-enablement addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Fix segfault if passed with ''. perf report: Fix -T/--threads option to work again perf bench numa: Fix immediate meeting of convergence condition perf bench numa: Fixes of --quiet argument perf bench futex: Fix hung wakeup tasks after requeueing perf probe: Fix bug with global variables handling perf top: Fix a segfault when kernel map is restricted. tools lib traceevent: Fix build failure on 32-bit arch perf kmem: Fix compiles on RHEL6/OL6 tools lib api: Undefine _FORTIFY_SOURCE before setting it perf kmem: Consistently use PRIu64 for printing u64 values perf trace: Disable events and drain events when forked workload ends perf trace: Enable events when doing system wide tracing and starting a workload perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
2015-05-06nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi1-1/+1
The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-06util_macros.h: have array pointer point to array of constantsGuenter Roeck1-1/+1
Using the new find_closest() macro can result in the following sparse warnings. drivers/hwmon/lm85.c:194:16: warning: incorrect type in initializer (different modifiers) drivers/hwmon/lm85.c:194:16: expected int *__fc_a drivers/hwmon/lm85.c:194:16: got int static const [toplevel] *<noident> drivers/hwmon/lm85.c:210:16: warning: incorrect type in initializer (different modifiers) drivers/hwmon/lm85.c:210:16: expected int *__fc_a drivers/hwmon/lm85.c:210:16: got int const *map This is because the array passed to find_closest() will typically be declared as array of constants, but the macro declares a non-constant pointer to it. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-05IB/core: Fix unaligned accessesDavid Ahern1-3/+4
Addresses the following kernel logs seen during boot of sparc systems: Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05IB/core: change rdma_gid2ip into void function as it always return zeroHonggang LI1-2/+1
Signed-off-by: Honggang Li <honli@redhat.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds3-1/+22
Pull crypto fixes from Herbert Xu: "This fixes a build problem with bcm63xx and yet another fix to the memzero_explicit function to ensure that the memset is not elided" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm63xx - Fix driver compilation lib: make memzero_explicit more robust against dead store elimination
2015-05-05RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the ↵Tatyana Nikolova2-0/+26
connecting peer to its clients Add functionality to enable the port mapper on the passive side to provide to its clients the actual (non-mapped) ip/tcp address information of the connecting peer 1) Adding remote_info_cb() to process the address info of the connecting peer The address info is provided by the user space port mapper service when the connection is initiated by the peer 2) Adding a hash list to store the remote address info 3) Adding functionality to add/remove the remote address info After the info has been provided to the port mapper client, it is removed from the hash list Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-04blk-mq: fix FUA request hangShaohua Li1-1/+1
When a FUA request enters its DATA stage of flush pipeline, the request is added to mq requeue list, the request will then be added to ctx->rq_list. blk_mq_attempt_merge() might merge the request with a bio. Later when the request is finished the flush pipeline, the request->__data_len is 0. Then I only saw the bio gets endio called, the original request never finish. Adding REQ_FLUSH_SEQ into REQ_NOMERGE_FLAGS looks an easy fix. stable: 3.15+ Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-04lib: make memzero_explicit more robust against dead store eliminationDaniel Borkmann3-1/+22
In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. Based on some experiments, icc is a bit special on its own, while it doesn't seem to eliminate the memset(), it could do so with an own implementation, and then result in similar findings as with llvm. The fix in this patch now works for all three compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with gcc/llvm though, with the below code: if we would have used barrier() only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As gcc, clang, but also icc defines __GNUC__, it's sufficient to define this in compiler-gcc.h only to be picked up. For a fallback or otherwise unsupported compiler, we define it as a barrier. Similarly, for ecc which does not support gcc inline asm. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-03Merge tag 'scsi-fixes' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is three logical fixes (as 5 patches). The 3ware class of drivers were causing an oops with multiqueue by tearing down the command mappings after completing the command (where the variables in the command used to tear down the mapping were no-longer valid). There's also a fix for the qnap iscsi target which was choking on us sending it commands that were too long and a fix for the reworked aha1542 allocating GFP_KERNEL under a lock" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: 3w-9xxx: fix command completion race 3w-xxxx: fix command completion race 3w-sas: fix command completion race aha1542: Allocate memory before taking a lock SCSI: add 1024 max sectors black list flag
2015-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-5/+19
Pull networking fixes from David Miller: 1) Receive packet length needs to be adjust by 2 on RX to accomodate the two padding bytes in altera_tse driver. From Vlastimil Setka. 2) If rx frame is dropped due to out of memory in macb driver, we leave the receive ring descriptors in an undefined state. From Punnaiah Choudary Kalluri 3) Some netlink subsystems erroneously signal NLM_F_MULTI. That is only for dumps. Fix from Nicolas Dichtel. 4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via the ipv4_mtu() helper. From Herbert Xu. 5) Fix null deref in bridge netfilter, and miscalculated lengths in jump/goto nf_tables verdicts. From Florian Westphal. 6) Unhash ping sockets properly. 7) Software implementation of BPF divide did 64/32 rather than 64/64 bit divide. The JITs got it right. Fix from Alexei Starovoitov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) ipv4: Missing sk_nulls_node_init() in ping_unhash(). net: fec: Fix RGMII-ID mode net/mlx4_en: Schedule napi when RX buffers allocation fails netxen_nic: use spin_[un]lock_bh around tx_clean_lock net/mlx4_core: Fix unaligned accesses mlx4_en: Use correct loop cursor in error path. cxgb4: Fix MC1 memory offset calculation bnx2x: Delay during kdump load net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table net: dsa: Fix scope of eeprom-length property net: macb: Fix race condition in driver when Rx frame is dropped hv_netvsc: Fix a bug in netvsc_start_xmit() altera_tse: Correct rx packet length mlx4: Fix tx ring affinity_mask creation tipc: fix problem with parallel link synchronization mechanism tipc: remove wrong use of NLM_F_MULTI bridge/nl: remove wrong use of NLM_F_MULTI bridge/mdb: remove wrong use of NLM_F_MULTI net: sched: act_connmark: don't zap skb->nfct trivial: net: systemport: bcmsysport.h: fix 0x0x prefix ...
2015-05-02virtio: fix typo in vring_need_event() doc commentStefan Hajnoczi1-1/+1
Here the "other side" refers to the guest or host. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>