summaryrefslogtreecommitdiff
path: root/kernel/resource.c
AgeCommit message (Collapse)AuthorFilesLines
2016-02-20kernel/resource.c: fix muxed resource handling in __request_region()Simon Guinot1-2/+3
In __request_region, if a conflict with a BUSY and MUXED resource is detected, then the caller goes to sleep and waits for the resource to be released. A pointer on the conflicting resource is kept. At wake-up this pointer is used as a parent to retry to request the region. A first problem is that this pointer might well be invalid (if for example the conflicting resource have already been freed). Another problem is that the next call to __request_region() fails to detect a remaining conflict. The previously conflicting resource is passed as a parameter and __request_region() will look for a conflict among the children of this resource and not at the resource itself. It is likely to succeed anyway, even if there is still a conflict. Instead, the parent of the conflicting resource should be passed to __request_region(). As a fix, this patch doesn't update the parent resource pointer in the case we have to wait for a muxed region right after. Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Tested-by: Vincent Donnefort <vdonnefort@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-09restrict /dev/mem to idle io memory rangesDan Williams1-2/+9
This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE semantics by default. If userspace really believes it is safe to access the memory region it can also perform the extra step of disabling an active driver. This protects device address ranges with read side effects and otherwise directs userspace to use the driver. Persistent memory presents a large "mistake surface" to /dev/mem as now accidental writes can corrupt a filesystem. In general if a device driver is busily using a memory region it already informs other parts of the kernel to not touch it via request_mem_region(). /dev/mem should honor the same safety restriction by default. Debugging a device driver from userspace becomes more difficult with this enabled. Any application using /dev/mem or mmap of sysfs pci resources will now need to perform the extra step of either: 1/ Disabling the driver, for example: echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind 2/ Rebooting with "iomem=relaxed" on the command line 3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n Traditional users of /dev/mem like dosemu are unaffected because the first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction. Legacy X configurations use /dev/mem to talk to graphics hardware, but that functionality has since moved to kernel graphics drivers. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-11mm: enhance region_is_ram() to region_intersects()Dan Williams1-25/+36
region_is_ram() is used to prevent the establishment of aliased mappings to physical "System RAM" with incompatible cache settings. However, it uses "-1" to indicate both "unknown" memory ranges (ranges not described by platform firmware) and "mixed" ranges (where the parameters describe a range that partially overlaps "System RAM"). Fix this up by explicitly tracking the "unknown" vs "mixed" resource cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT. This re-write also adds support for detecting when the requested region completely eclipses all of a resource. Note, the implementation treats overlaps between "unknown" and the requested memory type as REGION_INTERSECTS. Finally, other memory types can be passed in by name, for now the only usage "System RAM". Suggested-by: Luis R. Rodriguez <mcgrof@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-22mm: Fix bugs in region_is_ram()Toshi Kani1-3/+3
region_is_ram() looks up the iomem_resource table to check if a target range is in RAM. However, it always returns with -1 due to invalid range checks. It always breaks the loop at the first entry of the table. Another issue is that it compares p->flags and flags, but it always fails. flags is declared as int, which makes it as a negative value with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long. Fix the range check and flags so that region_is_ram() works as advertised. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Mike Travis <travis@sgi.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roland Dreier <roland@purestorage.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-16kernel/resource.c: remove deprecated __check_region() and friendsJakub Sitnicki1-32/+0
All users of __check_region(), check_region(), and check_mem_region() are gone. We got rid of the last user in v4.0-rc1. Remove them. bloat-o-meter on x86_64 shows: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102) function old new delta __kstrtab___check_region 15 - -15 __ksymtab___check_region 16 - -16 __check_region 71 - -71 Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05resources: Move struct resource_list_entry from ACPI into resource coreJiang Liu1-0/+25
Currently ACPI, PCI and pnp all implement the same resource list management with different data structure. We need to transfer from one data structure into another when passing resources from one subsystem into another subsystem. So move struct resource_list_entry from ACPI into resource core and rename it as resource_entry, then it could be reused by different subystems and avoid the data structure conversion. Introduce dedicated header file resource_ext.h instead of embedding it into ioport.h to avoid header file inclusion order issues. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-14x86: optimize resource lookups for ioremapMike Travis1-0/+36
We have a large university system in the UK that is experiencing very long delays modprobing the driver for a specific I/O device. The delay is from 8-10 minutes per device and there are 31 devices in the system. This 4 to 5 hour delay in starting up those I/O devices is very much a burden on the customer. There are two causes for requiring a restart/reload of the drivers. First is periodic preventive maintenance (PM) and the second is if any of the devices experience a fatal error. Both of these trigger this excessively long delay in bringing the system back up to full capability. The problem was tracked down to a very slow IOREMAP operation and the excessively long ioresource lookup to insure that the user is not attempting to ioremap RAM. These patches provide a speed up to that function. The modprobe time appears to be affected quite a bit by previous activity on the ioresource list, which I suspect is due to cache preloading. While the overall improvement is impacted by other overhead of starting the devices, this drastically improves the modprobe time. Also our system is considerably smaller so the percentages gained will not be the same. Best case improvement with the modprobe on our 20 device smallish system was from 'real 5m51.913s' to 'real 0m18.275s'. This patch (of 2): Since the ioremap operation is verifying that the specified address range is NOT RAM, it will search the entire ioresource list if the condition is true. To make matters worse, it does this one 4k page at a time. For a 128M BAR region this is 32 passes to determine the entire region does not contain any RAM addresses. This patch provides another resource lookup function, region_is_ram, that searches for the entire region specified, verifying that it is completely contained within the resource region. If it is found, then it is checked to be RAM or not, within a single pass. The return result reflects if it was found or not (-1), and whether it is RAM (1) or not (0). This allows the caller to fallback to the previous page by page search if it was not found. [akpm@linux-foundation.org: fix spellos and typos in comment] Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Alex Thorlton <athorlton@sgi.com> Reviewed-by: Cliff Wickman <cpw@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mark Salter <msalter@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09Merge tag 'pci-v3.18-changes' of ↵Linus Torvalds1-0/+70
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The interesting things here are: - Turn on Config Request Retry Status Software Visibility. This caused hangs last time, but we included a fix this time. - Rework PCI device configuration to use _HPP/_HPX more aggressively - Allow PCI devices to be put into D3cold during system suspend - Add arm64 PCI support - Add APM X-Gene host bridge driver - Add TI Keystone host bridge driver - Add Xilinx AXI host bridge driver More detailed summary: Enumeration - Check Vendor ID only for Config Request Retry Status (Rajat Jain) - Enable Config Request Retry Status when supported (Rajat Jain) - Add generic domain handling (Catalin Marinas) - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado) Resource management - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu) - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Move _HPP & _HPX handling into core (Bjorn Helgaas) - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas) - Apply _HPP/_HPX to display devices (Bjorn Helgaas) - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas) - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas) - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas) - Fix wait time in pciehp timeout message (Yinghai Lu) - Add more pciehp Slot Control debug output (Yinghai Lu) - Stop disabling pciehp notifications during init (Yinghai Lu) MSI - Remove arch_msi_check_device() (Alexander Gordeev) - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev) - Move D0 check into pci_msi_check_device() (Alexander Gordeev) - Remove unused kobject from struct msi_desc (Yijing Wang) - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang) - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang) - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang) - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang) - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang) Power management - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki) - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki) AER - Add additional AER error strings (Gong Chen) - Make <linux/aer.h> standalone includable (Thierry Reding) Virtualization - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson) - Add ACS quirk for Intel 10G NICs (Alex Williamson) - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp) - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson) - Add device flag helpers (Ethan Zhao) - Assume all Mellanox devices have broken INTx masking (Gavin Shan) Generic host bridge driver - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau) - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau) - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau) - Fix the conversion of IO ranges into IO resources (Liviu Dudau) - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau) - Add support for parsing PCI host bridge resources from DT (Liviu Dudau) - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau) - Add arm64 architectural support for PCI (Liviu Dudau) APM X-Gene - Add APM X-Gene PCIe driver (Tanmay Inamdar) - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar) Freescale i.MX6 - Probe in module_init(), not fs_initcall() (Lucas Stach) - Delay enabling reference clock for SS until it stabilizes (Tim Harvey) Marvell MVEBU - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni) NVIDIA Tegra - Make sure the PCIe PLL is really reset (Eric Yuen) - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang) - Fix extended configuration space mapping (Peter Daifuku) - Implement resource hierarchy (Thierry Reding) - Clear CLKREQ# enable on port disable (Thierry Reding) - Add Tegra124 support (Thierry Reding) ST Microelectronics SPEAr13xx - Pass config resource through reg property (Pratyush Anand) Synopsys DesignWare - Use NULL instead of false (Fabio Estevam) - Parse bus-range property from devicetree (Lucas Stach) - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach) - Remove pci_assign_unassigned_resources() (Lucas Stach) - Check private_data validity in single place (Lucas Stach) - Setup and clear exactly one MSI at a time (Lucas Stach) - Remove open-coded bitmap operations (Lucas Stach) - Fix configuration base address when using 'reg' (Minghuan Lian) - Fix IO resource end address calculation (Minghuan Lian) - Rename get_msi_data() to get_msi_addr() (Minghuan Lian) - Add get_msi_data() to pcie_host_ops (Minghuan Lian) - Add support for v3.65 hardware (Murali Karicheri) - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand) TI Keystone - Add TI Keystone PCIe driver (Murali Karicheri) - Limit MRSS for all downstream devices (Murali Karicheri) - Assume controller is already in RC mode (Murali Karicheri) - Set device ID based on SoC to support multiple ports (Murali Karicheri) Xilinx AXI - Add Xilinx AXI PCIe driver (Srikanth Thokala) - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter) Miscellaneous - Clean up whitespace (Quentin Lambert) - Remove assignments from "if" conditions (Quentin Lambert) - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri) - x86: Mark DMI tables as initialization data (Mathias Krause) - x86: Move __init annotation to the correct place (Mathias Krause) - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause) - x86: Constify pci_mmcfg_probes[] array (Mathias Krause) - x86: Mark PCI BIOS initialization code as such (Mathias Krause) - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya) - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)" * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits) arm64: dts: Add APM X-Gene PCIe device tree nodes PCI: Add ACS quirk for AMD A88X southbridge devices PCI: xgene: Add APM X-Gene PCIe driver PCI: designware: Remove open-coded bitmap operations PCI/MSI: Remove unnecessary temporary variable PCI/MSI: Use __write_msi_msg() instead of write_msi_msg() MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg() PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg() PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib PCI/MSI: Remove unused kobject from struct msi_desc PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported() PCI/MSI: Move D0 check into pci_msi_check_device() PCI/MSI: Remove arch_msi_check_device() irqchip: armada-370-xp: Remove arch_msi_check_device() PCI/MSI/PPC: Remove arch_msi_check_device() arm64: Add architectural support for PCI PCI: Add pci_remap_iospace() to map bus I/O resources of/pci: Add support for parsing PCI host bridge resources from DT of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr() ... Conflicts: arch/arm64/boot/dts/apm-storm.dtsi
2014-09-05resources: Add device-managed request/release_resource()Thierry Reding1-0/+70
Provide device-managed implementations of the request_resource() and release_resource() functions. Upon failure to request a resource, the new devm_request_resource() function will output an error message for consistent error reporting. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Tejun Heo <tj@kernel.org>
2014-08-30resource: fix the case of null pointer accessVivek Goyal1-7/+4
Richard and Daniel reported that UML is broken due to changes to resource traversal functions. Problem is that iomem_resource.child can be null and new code does not consider that possibility. Old code used a for loop and that loop will not even execute if p was null. Revert back to for() loop logic and bail out if p is null. I also moved sibling_only check out of resource_lock. There is no reason to keep it inside the lock. Following is backtrace of the UML crash. RIP: 0033:[<0000000060039b9f>] RSP: 0000000081459da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000219b3fff RCX: 000000006010d1d9 RDX: 0000000000000001 RSI: 00000000602dfb94 RDI: 0000000081459df8 RBP: 0000000081459de0 R08: 00000000601b59f4 R09: ffffffff0000ff00 R10: ffffffff0000ff00 R11: 0000000081459e88 R12: 0000000081459df8 R13: 00000000219b3fff R14: 00000000602dfb94 R15: 0000000000000000 Kernel panic - not syncing: Segfault with no mm CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-10454-g58d08e3 #13 Stack: 00000000 000080d0 81459df0 219b3fff 81459e70 6010d1d9 ffffffff 6033e010 81459e50 6003a269 81459e30 00000000 Call Trace: [<6010d1d9>] ? kclist_add_private+0x0/0xe7 [<6003a269>] walk_system_ram_range+0x61/0xb7 [<6000e859>] ? proc_kcore_init+0x0/0xf1 [<6010d574>] kcore_update_ram+0x4c/0x168 [<6010d72e>] ? kclist_add+0x0/0x2e [<6000e943>] proc_kcore_init+0xea/0xf1 [<6000e859>] ? proc_kcore_init+0x0/0xf1 [<6000e859>] ? proc_kcore_init+0x0/0xf1 [<600189f0>] do_one_initcall+0x13c/0x204 [<6004ca46>] ? parse_args+0x1df/0x2e0 [<6004c82d>] ? parameq+0x0/0x3a [<601b5990>] ? strcpy+0x0/0x18 [<60001e1a>] kernel_init_freeable+0x240/0x31e [<6026f1c0>] kernel_init+0x12/0x148 [<60019fad>] new_thread_handler+0x81/0xa3 Fixes 8c86e70acead629aacb4a ("resource: provide new functions to walk through resources"). Reported-by: Daniel Walter <sahne@0x90.at> Tested-by: Richard Weinberger <richard@nod.at> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-09resource: provide new functions to walk through resourcesVivek Goyal1-9/+92
I have added two more functions to walk through resources. Currently walk_system_ram_range() deals with pfn and /proc/iomem can contain partial pages. By dealing in pfn, callback function loses the info that last page of a memory range is a partial page and not the full page. So I implemented walk_system_ram_res() which returns u64 values to callback functions and now it properly return start and end address. walk_system_ram_range() uses find_next_system_ram() to find the next ram resource. This in turn only travels through siblings of top level child and does not travers through all the nodes of the resoruce tree. I also need another function where I can walk through all the resources, for example figure out where "GART" aperture is. Figure out where ACPI memory is. So I wrote another function walk_iomem_res() which walks through all /proc/iomem resources and returns matches as asked by caller. Caller can specify "name" of resource, start and end and flags. Got rid of find_next_system_ram_res() and instead implemented more generic find_next_iomem_res() which can be used to traverse top level children only based on an argument. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-23resources: Clarify sanity check messageBjorn Helgaas1-5/+2
The resource map sanity check message is a bit confusing. Change it to be more readable: -resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01 +resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-04-04kernel/resource.c: make reallocate_resource() staticDaeseok Youn1-1/+1
sparse says: kernel/resource.c:518:5: warning: symbol 'reallocate_resource' was not declared. Should it be static? Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-20resources: Set type in __request_region()Bjorn Helgaas1-2/+2
We don't set the type (I/O, memory, etc.) of resources added by __request_region(), which leads to confusing messages like this: address space collision: [io 0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000] Set the type of a new resource added by __request_region() (used by request_region() and request_mem_region()) to the type of its parent. This makes the resource tree internally consistent and fixes messages like the above, where the ACPI CPU throttle resource really is an I/O port region, but request_region() didn't fill in the type, so %pR didn't know how to print it. Sample dmesg showing the issue at the link below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611 Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-27resource: Add resource_contains()Bjorn Helgaas1-6/+2
We have two identical copies of resource_contains() already, and more places that could use it. This moves it to ioport.h where it can be shared. resource_contains(struct resource *r1, struct resource *r2) returns true iff r1 and r2 are the same type (most callers already checked this separately) and the r1 address range completely contains r2. In addition, the new resource_contains() checks that both r1 and r2 have addresses assigned to them. If a resource is IORESOURCE_UNSET, it doesn't have a valid address and can't contain or be contained by another resource. Some callers already check this or for res->start. No functional change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-07-04kernel/resource.c: remove the unneeded assignment in function __find_resourceKevin Hao1-1/+0
This line was introduced by fcb11918 ("resources: add arch hook for preventing allocation in reserved areas"). But the struct tmp was already assigned to *new in the above line, so this seems superfluous. Just remove it. Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-07ACPI/APEI: Add parameter check before error injectionChen Gong1-0/+1
When param1 is enabled in EINJ but not assigned with a valid value, sometimes it will cause the error like below: APEI: Can not request [mem 0x7aaa7000-0x7aaa7007] for APEI EINJ Trigger registers It is because some firmware will access target address specified in param1 to trigger the error when injecting memory error. This will cause resource conflict with regular memory. So It must be removed from trigger table resources, but incorrect param1/param2 combination will stop this action. Add extra check to avoid this kind of error. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-04-30mem hotunplug: fix kfree() of bootmem memoryYasuaki Ishimatsu1-13/+55
When hot removing memory presented at boot time, following messages are shown: kernel BUG at mm/slub.c:3409! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc scsi_transport_fc scsi_tgt scsi_mod CPU 0 Pid: 5091, comm: kworker/0:2 Tainted: G W 3.9.0-rc6+ #15 RIP: kfree+0x232/0x240 Process kworker/0:2 (pid: 5091, threadinfo ffff88084678c000, task ffff88083928ca80) Call Trace: __release_region+0xd4/0xe0 __remove_pages+0x52/0x110 arch_remove_memory+0x89/0xd0 remove_memory+0xc4/0x100 acpi_memory_device_remove+0x6d/0xb1 acpi_device_remove+0x89/0xab __device_release_driver+0x7c/0xf0 device_release_driver+0x2f/0x50 acpi_bus_device_detach+0x6c/0x70 acpi_ns_walk_namespace+0x11a/0x250 acpi_walk_namespace+0xee/0x137 acpi_bus_trim+0x33/0x7a acpi_bus_hot_remove_device+0xc4/0x1a1 acpi_os_execute_deferred+0x27/0x34 process_one_work+0x1f7/0x590 worker_thread+0x11a/0x370 kthread+0xee/0x100 ret_from_fork+0x7c/0xb0 RIP [<ffffffff811c41d2>] kfree+0x232/0x240 RSP <ffff88084678d968> The reason why the messages are shown is to release a resource structure, allocated by bootmem, by kfree(). So when we release a resource structure, we should check whether it is allocated by bootmem or not. But even if we know a resource structure is allocated by bootmem, we cannot release it since SLxB cannot treat it. So for reusing a resource structure, this patch remembers it by using bootmem_resource as follows: When releasing a resource structure by free_resource(), free_resource() checks whether the resource structure is allocated by bootmem or not. If it is allocated by bootmem, free_resource() adds it to bootmem_resource. If it is not allocated by bootmem, free_resource() release it by kfree(). And when getting a new resource structure by get_resource(), get_resource() checks whether bootmem_resource has released resource structures or not. If there is a released resource structure, get_resource() returns it. If there is not a releaed resource structure, get_resource() returns new resource structure allocated by kzalloc(). [akpm@linux-foundation.org: s/get_resource/alloc_resource/] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30resource: add release_mem_region_adjustable()Toshi Kani1-0/+103
Add release_mem_region_adjustable(), which releases a requested region from a currently busy memory resource. This interface adjusts the matched memory resource accordingly even if the requested region does not match exactly but still fits into. This new interface is intended for memory hot-delete. During bootup, memory resources are inserted from the boot descriptor table, such as EFI Memory Table and e820. Each memory resource entry usually covers the whole contigous memory range. Memory hot-delete request, on the other hand, may target to a particular range of memory resource, and its size can be much smaller than the whole contiguous memory. Since the existing release interfaces like __release_region() require a requested region to be exactly matched to a resource entry, they do not allow a partial resource to be released. This new interface is restrictive (i.e. release under certain conditions), which is consistent with other release interfaces, __release_region() and __release_resource(). Additional release conditions, such as an overlapping region to a resource entry, can be supported after they are confirmed as valid cases. There is no change to the existing interfaces since their restriction is valid for I/O resources. [akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()] [akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily] [akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30resource: add __adjust_resource() for internal useToshi Kani1-13/+22
Add __adjust_resource(), which is called by adjust_resource() internally after the resource_lock is held. There is no interface change to adjust_resource(). This change allows other functions to call __adjust_resource() internally while the resource_lock is held. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-05kernel/resource.c: fix stack overflow in __reserve_region_with_split()T Makphaibulchoke1-12/+38
Using a recursive call add a non-conflicting region in __reserve_region_with_split() could result in a stack overflow in the case that the recursive calls are too deep. Convert the recursive calls to an iterative loop to avoid the problem. Tested on a machine containing 135 regions. The kernel no longer panicked with stack overflow. Also tested with code arbitrarily adding regions with no conflict, embedding two consecutive conflicts and embedding two non-consecutive conflicts. Signed-off-by: T Makphaibulchoke <tmac@hp.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@gmail.com> Cc: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31resource: make sure requested range is included in the root rangeOctavian Purdila1-1/+23
When the requested range is outside of the root range the logic in __reserve_region_with_split will cause an infinite recursion which will overflow the stack as seen in the warning bellow. This particular stack overflow was caused by requesting the (100000000-107ffffff) range while the root range was (0-ffffffff). In this case __request_resource would return the whole root range as conflict range (i.e. 0-ffffffff). Then, the logic in __reserve_region_with_split would continue the recursion requesting the new range as (conflict->end+1, end) which incidentally in this case equals the originally requested range. This patch aborts looking for an usable range when the request does not intersect with the root range. When the request partially overlaps with the root range, it ajust the request to fall in the root range and then continues with the new request. When the request is modified or aborted errors and a stack trace are logged to allow catching the errors in the upper layers. [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89() [ 5.975150] Modules linked in: [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46 [ 5.985324] Call Trace: [ 5.987759] [<c1039dfc>] ? console_unlock+0x17b/0x18d [ 5.992891] [<c1039620>] warn_slowpath_common+0x48/0x5d [ 5.998194] [<c1031758>] ? sub_preempt_count+0x63/0x89 [ 6.003412] [<c1039644>] warn_slowpath_null+0xf/0x13 [ 6.008453] [<c1031758>] sub_preempt_count+0x63/0x89 [ 6.013499] [<c14d60c4>] _raw_spin_unlock+0x27/0x3f [ 6.018453] [<c10c6349>] add_partial+0x36/0x3b [ 6.022973] [<c10c7c0a>] deactivate_slab+0x96/0xb4 [ 6.027842] [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241 [ 6.034456] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.039842] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.045232] [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0 [ 6.050710] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.056100] [<c103f78f>] kzalloc.constprop.5+0x29/0x38 [ 6.061320] [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1 [ 6.067230] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 ... [ 7.179057] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 [ 7.184970] [<c17b4779>] reserve_region_with_split+0x30/0x42 [ 7.190709] [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9 [ 7.196623] [<c17c9526>] pcibios_resource_survey+0x23/0x2a [ 7.202184] [<c17cad8a>] pcibios_init+0x23/0x35 [ 7.206789] [<c17ca574>] pci_subsys_init+0x3f/0x44 [ 7.211659] [<c1002088>] do_one_initcall+0x72/0x122 [ 7.216615] [<c17ca535>] ? pci_legacy_init+0x3d/0x3d [ 7.221659] [<c17a27ff>] kernel_init+0xa6/0x118 [ 7.226265] [<c17a2759>] ? start_kernel+0x334/0x334 [ 7.231223] [<c14d7482>] kernel_thread_helper+0x6/0x10 Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-14resources: allow adjust_resource() for resources with no parentYinghai Lu1-5/+8
If a resource has no parent, allow its start/end to be set arbitrarily as long as any children are still contained within the new range. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-01kernel/resource.c: correct the comment of allocate_resource()Wei Yang1-2/+2
In the comment of allocate_resource(), the explanation of parameter max and min is not correct. Actually, these two parameters are used to specify the range of the resource that will be allocated, not the min/max size that will be allocated. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-04kernel/resource.c: move EXPORT_SYMBOL right after definitionCong Wang1-2/+1
EXPORT_SYMBOL(adjust_resource) should be right after adjust_resource(). Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-10-31kernel: Map most files to use export.h instead of module.hPaul Gortmaker1-1/+1
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-30Resource: fix wrong resource window calculationRam Pai1-1/+6
__find_resource() incorrectly returns a resource window which overlaps an existing allocated window. This happens when the parent's resource-window spans 0x00000000 to 0xffffffff and is entirely allocated to all its children resource-windows. __find_resource() looks for gaps in resource allocation among the children resource windows. When it encounters the last child window it blindly tries the range next to one allocated to the last child. Since the last child's window ends at 0xffffffff the calculation overflows, leading the algorithm to believe that any window in the range 0x0000000 to 0xfffffff is available for allocation. This leads to a conflicting window allocation. Michal Ludvig reported this issue seen on his platform. The following patch fixes the problem and has been verified by Michal. I believe this bug has been there for ages. It got exposed by git commit 2bbc6942273b ("PCI : ability to relocate assigned pci-resources") Signed-off-by: Ram Pai <linuxram@us.ibm.com> Tested-by: Michal Ludvig <mludvig@logix.net.nz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-30resources: Add lookup_resource()Geert Uytterhoeven1-0/+21
Add a function to find an existing resource by a resource start address. This allows to implement simple allocators (with a malloc/free-alike API) on top of the resource system. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2011-07-06resource: ability to resize an allocated resourceRam Pai1-18/+98
Provides the ability to resize a resource that is already allocated. This functionality is put in place to support reallocation needs of pci resources. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-17resources: add arch hook for preventing allocation in reserved areasBjorn Helgaas1-0/+6
This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17Revert "resources: support allocating space within a region from the top down"Bjorn Helgaas1-94/+4
This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-28Merge branch 'linux-next' of ↵Linus Torvalds1-16/+135
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits) x86: allocate space within a region top-down x86: update iomem_resource end based on CPU physical address capabilities x86/PCI: allocate space from the end of a region, not the beginning PCI: allocate bus resources from the top down resources: support allocating space within a region from the top down resources: handle overflow when aligning start of available area resources: ensure callback doesn't allocate outside available space resources: factor out resource_clip() to simplify find_resource() resources: add a default alignf to simplify find_resource() x86/PCI: MMCONFIG: fix region end calculation PCI: Add support for polling PME state on suspended legacy PCI devices PCI: Export some PCI PM functionality PCI: fix message typo PCI: log vendor/device ID always PCI: update Intel chipset names and defines PCI: use new ccflags variable in Makefile PCI: add PCI_MSIX_TABLE/PBA defines PCI: add PCI vendor id for STmicroelectronics x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs PCI: OLPC: Only enable PCI configuration type override on XO-1 ...
2010-10-28kernel/resource.c: handle reinsertion of an already-inserted resourceHuang Shijie1-0/+2
If the same resource is inserted to the resource tree (maybe not on purpose), a dead loop will be created. In this situation, The kernel does not report any warning or error :( The command below will show a endless print. #cat /proc/iomem [akpm@linux-foundation.org: add WARN_ON()] Signed-off-by: Huang Shijie <shijie8@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27resources: support allocating space within a region from the top downBjorn Helgaas1-4/+94
Allocate space from the top of a region first, then work downward, if an architecture desires this. When we allocate space from a resource, we look for gaps between children of the resource. Previously, we always looked at gaps from the bottom up. For example, given this: [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00 [mem 0xbff00000-0xbfffffff] gap -- available [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 [mem 0xe0000000-0xf7ffffff] gap -- available we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first, then the [mem 0xe0000000-0xf7ffffff] gap. With this patch an architecture can choose to allocate from the top gap [mem 0xe0000000-0xf7ffffff] first. We can't do this across the board because iomem_resource.end is initialized to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't address the entire 64-bit physical address space. Therefore, we only allocate top-down if the arch requests it by clearing "resource_alloc_from_bottom". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-27resources: handle overflow when aligning start of available areaBjorn Helgaas1-8/+13
If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would make us think there's more available space than there really is. We would likely return something that conflicts with a previous resource, which would cause a failure when allocate_resource() requests the newly- allocated region. Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027 Reported-by: Fabrice Bellet <fabrice@bellet.info> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-27resources: ensure callback doesn't allocate outside available spaceBjorn Helgaas1-5/+11
The alignment callback returns a proposed location, which may have been adjusted to avoid ISA aliases or for other architecture-specific reasons. We already had a check ("tmp.start < tmp.end") to make sure the callback doesn't return an area that extends past the available area. This patch reworks the check to make sure it doesn't return an area that extends either below or above the available area. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-27resources: factor out resource_clip() to simplify find_resource()Bjorn Helgaas1-4/+11
This factors out the min/max clipping to simplify find_resource(). No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-27resources: add a default alignf to simplify find_resource()Bjorn Helgaas1-2/+13
This removes a test from find_resource(), which is getting cluttered. No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-11resource: shared I/O region supportAlan Cox1-1/+15
SuperIO devices share regions and use lock/unlock operations to chip select. We therefore need to be able to request a resource and wait for it to be freed by whichever other SuperIO device currently hogs it. Right now you have to poll which is horrible. Add a MUXED field to IO port resources. If the MUXED field is set on the resource and on the request (via request_muxed_region) then we block until the previous owner of the muxed resource releases their region. This allows us to implement proper resource sharing and locking for superio chips using code of the form enable_my_superio_dev() { request_muxed_region(0x44, 0x02, "superio:watchdog"); outb() ..sequence to enable chip } disable_my_superio_dev() { outb() .. sequence of disable chip release_region(0x44, 0x02); } Signed-off-by: Giel van Schijndel <me@mortis.eu> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-23resources: add interfaces that return conflict informationBjorn Helgaas1-7/+37
request_resource() and insert_resource() only return success or failure, which no information about what existing resource conflicted with the proposed new reservation. This patch adds request_resource_conflict() and insert_resource_conflict(), which return the conflicting resource. Callers may use this for better error messages or to adjust the new resource and retry the request. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: resource: Fix broken indentation resource: Fix generic page_is_ram() for partial RAM pages x86, paravirt: Remove kmap_atomic_pte paravirt op. x86, vmi: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y x86, xen: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y
2010-03-02resource: Fix broken indentationH. Peter Anvin1-1/+1
Fix broken indentation in patch 37b99dd5372cff42f83210c280f314f10f99138e. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100301135551.GA9998@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-01resource: Fix generic page_is_ram() for partial RAM pagesWu Fengguang1-4/+5
The System RAM walk shall skip partial RAM pages and avoid calling func() on them. So that page_is_ram() return 0 for a partial RAM page. In particular, it shall not call func() with len=0. This fixes a boot time bug reported by Sachin and root caused by Thomas: > >>> WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x169/0x2f1() > >>> Hardware name: BladeCenter LS21 -[79716AA]- > >>> Modules linked in: > >>> Pid: 0, comm: swapper Not tainted 2.6.33-git6-autotest #1 > >>> Call Trace: > >>> [<ffffffff81047cff>] ? __ioremap_caller+0x169/0x2f1 > >>> [<ffffffff81063b7d>] warn_slowpath_common+0x77/0xa4 > >>> [<ffffffff81063bb9>] warn_slowpath_null+0xf/0x11 > >>> [<ffffffff81047cff>] __ioremap_caller+0x169/0x2f1 > >>> [<ffffffff813747a3>] ? acpi_os_map_memory+0x12/0x1b > >>> [<ffffffff81047f10>] ioremap_nocache+0x12/0x14 > >>> [<ffffffff813747a3>] acpi_os_map_memory+0x12/0x1b > >>> [<ffffffff81282fa0>] acpi_tb_verify_table+0x29/0x5b > >>> [<ffffffff812827f0>] acpi_load_tables+0x39/0x15a > >>> [<ffffffff8191c8f8>] acpi_early_init+0x60/0xf5 > >>> [<ffffffff818f2cad>] start_kernel+0x397/0x3a7 > >>> [<ffffffff818f2295>] x86_64_start_reservations+0xa5/0xa9 > >>> [<ffffffff818f237a>] x86_64_start_kernel+0xe1/0xe8 > >>> ---[ end trace 4eaa2a86a8e2da22 ]--- > >>> ioremap reserve_memtype failed -22 The return code is -EINVAL, so it failed in the is_ram check, which is not too surprising > BIOS-provided physical RAM map: > BIOS-e820: 0000000000000000 - 000000000009c000 (usable) > BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved) > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) > BIOS-e820: 0000000000100000 - 00000000cffa3900 (usable) > BIOS-e820: 00000000cffa3900 - 00000000cffa7400 (ACPI data) The ACPI data is not starting on a page boundary and neither does the usable RAM area end on a page boundary. Very useful ! > ACPI: DSDT 00000000cffa3900 036CE (v01 IBM SERLEWIS 00001000 INTL 20060912) ACPI is trying to map DSDT at cffa3900, which results in a check vs. cffa3000 which is the relevant page boundary. The generic is_ram check correctly identifies that as RAM because it's in the usable resource area. The old e820 based is_ram check does not take overlapping resource areas into account. That's why it works. CC: Sachin Sant <sachinp@in.ibm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100301135551.GA9998@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-28Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mm: Unify kernel_physical_mapping_init() API x86, mm: Allow highmem user page tables to be disabled at boot time x86: Do not reserve brk for DMI if it's not going to be used x86: Convert tlbstate_lock to raw_spinlock x86: Use the generic page_is_ram() x86: Remove BIOS data range from e820 Move page_is_ram() declaration to mm.h Generic page_is_ram: use __weak resources: introduce generic page_is_ram()
2010-02-23resource: add release_child_resourcesYinghai Lu1-0/+30
Useful for freeing a portion of the resource tree, e.g. when trying to reallocate resources more efficiently. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23resource/PCI: mark struct resource as constDominik Brodowski1-2/+2
Now that we return the new resource start position, there is no need to update "struct resource" inside the align function. Therefore, mark the struct resource as const. Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23resource/PCI: align functions now return start of resourceDominik Brodowski1-5/+9
As suggested by Linus, align functions should return the start of a resource, not void. An update of "res->start" is no longer necessary. Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-17Merge branch 'linus' into x86/mmThomas Gleixner1-14/+16
x86/mm is on 32-rc4 and missing the spinlock namespace changes which are needed for further commits into this topic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-02Generic page_is_ram: use __weakAndrew Morton1-1/+1
Use __weak instead of __attribute__((weak)). Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-02resources: introduce generic page_is_ram()Wu Fengguang1-0/+13
It's based on walk_system_ram_range(), for archs that don't have their own page_is_ram(). The static verions in MIPS and SCORE are also made global. v4: prefer plain 1 instead of PAGE_IS_RAM (H. Peter Anvin) v3: add comment (KAMEZAWA Hiroyuki) "AFAIK, this "System RAM" information has been used for kdump to grab valid memory area and seems good for the kernel itself." v2: add PAGE_IS_RAM macro (Américo Wang) Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: linux-mips@linux-mips.org Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100122081619.GA6431@localhost> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>