summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2012-12-13MIPS: Fix potencial corruptionRalf Baechle1-11/+0
Normally r4k_dma_cache_inv should only ever be called with cacheline aligned addresses. If however, it isn't there is the theoretical possibility of data corruption. There is no correct way of handling this and anyway, it should only happen if the DMA API is used incorrectly so drop There is a different corruption scenario with these CACHE instructions removed but again there is no way of handling this correctly and it can be triggered only through incorrect use of the DMA API. So just get rid of the complexity. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: James Rodriguez <jamesr@juniper.net>
2012-12-13MIPS: Fix for warning from FPU emulation codeRalf Baechle1-7/+8
The default implementation of 'cpu_has_fpu' macro calls smp_processor_id() which causes this warning to be printed when preemption is enabled: [ 4.664000] Algorithmics/MIPS FPU Emulator v1.5 [ 4.676000] BUG: using smp_processor_id() in preemptible [00000000] code: ini [ 4.700000] caller is fpu_emulator_cop1Handler+0x434/0x27b8 This problem got introduced in November 2009 by af1d2af877ef6c36990671bc86a5b9c5bb50b1da (lmo) [MIPS: Fix emulation of 64-bit FPU on 64-bit CPUs.] rsp. da0bac33413b2888d3623dad3ad19ce76b688f07 (kernel.org) [MIPS: Fix emulation of 64-bit FPU on FPU-less 64-bit CPUs.] in 2.6.32. Fixed by rewriting cop1_64bit() to return a constant whenever possible but most importantly avoid the use pf cpu_has_fpu entirely. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Jayachandran C <jchandra@broadcom.com> Initial-patch-by: Jayachandran C <jchandra@broadcom.com> Patchwork: https://patchwork.linux-mips.org/patch/4225/
2012-12-13MIPS: Handle COP3 Unusable exception as COP1X for FP emulationMaciej W. Rozycki1-3/+18
Our FP emulator is hardcoded for the MIPS IV FP instruction set and does not match the FP ISA with the general ISA. However for the few MIPS IV FP instructions that use the COP1X major opcode it relies on the Coprocessor Unusable exception to be delivered as a COP1 rather than COP3 exception. This includes indexed transfer (LDXC1, etc.) and FP multiply-accumulate (MADD.D, etc.) instructions. All the MIPS I, II, III and IV processors and some newer chips that do not implement the FPU use the COP3 exception however. Therefore I believe the kernel should follow and redirect any COP3 Unusable traps to the emulator unless an actual FPU part or core is present. This is a change that implements it. Any minor opcode encodings that are not recognised as valid FP instructions are rejected by the emulator and will result in a SIGILL signal being delivered as they currently do. We do not support vendor-specific coprocessor 3 implementations supported with MIPS I and MIPS II ISA processors; we never set CP0.Status.CU3. [Ralf: On MIPS IV processors the kernel always enables the XX bit which replaces the CU3 bit off earlier architecture revisions.] If matching between the CPU and the FPU ISA is considered required one day, this can still be done in the emulator itself. I think the CpU exception dispatcher is not the right place to do this anyway, as there are further differences between MIPS I, MIPS II, MIPS III, MIPS IV and MIPS32 FP ISAs. Corresponding explanation of this implementation is included within the change itself. Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/project/linux-mips/list/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Fix poweroff failure when HOTPLUG_CPU configured.Huacai Chen1-3/+1
When poweroff machine, kernel_power_off() call disable_nonboot_cpus(). And if we have HOTPLUG_CPU configured, disable_nonboot_cpus() is not an empty function but attempt to actually disable the nonboot cpus. Since system state is SYSTEM_POWER_OFF, play_dead() won't be called and thus disable_nonboot_cpus() hangs. Therefore, we make this patch to avoid poweroff failure. Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Hongliang Tao <taohl@lemote.com> Signed-off-by: Hua Yan <yanh@lemote.com> Cc: Yong Zhang <yong.zhang@windriver.com> Cc: stable@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/4211/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=yFlorian Fainelli1-2/+2
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled, plain integer checking between different uids/gids is explicitely turned into a build failure by making the k{uid,gid}_t types a structure containing a value: arch/mips/kernel/mips-mt-fpaff.c: In function 'check_same_owner': arch/mips/kernel/mips-mt-fpaff.c:53:22: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t') arch/mips/kernel/mips-mt-fpaff.c:54:15: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t') In order to ensure proper comparison between uids, using the helper function uid_eq() which performs the right thing whenever this config option is turned on or off. Signed-off-by: Florian Fainelli <florian@openwrt.org> Patchwork: https://patchwork.linux-mips.org/patch/4717/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Remove unused smvp.hPaul Bolle1-19/+0
This header was added in commit 39b8d5254246ac56342b72f812255c8f7a74dca9 (kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo( ([MIPS] Add support for MIPS CMP platform.). None of the functions it declared were ever included in the tree. Commit cb7f39d2bc5a20615d016dd86fca0fd233c13b5d (kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo) [MIPS] Remove unused maltasmp.h.] removeed the sole file that included it because that file was itself unused. [ralf@linux-mips.org: The whole mess happened because somebody at MIPS thought it was a good idea to rename VSMP ("Vitual SMP") to SMVP. Which is an IBMeque ETLA in contrast to VSMP, so public kernels as opposed to MTI's inhouse kernels never followed suit.] Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/3950/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS/EDAC: Improve OCTEON EDAC support.David Daney8-335/+406
Some initialization errors are reported with the existing OCTEON EDAC support patch. Also some parts have more than one memory controller. Fix the errors and add multiple controllers if present. Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.David Daney1-0/+3457
Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13MIPS: OCTEON: Add OCTEON family definitions to octeon-model.hDavid Daney1-0/+6
Used by follow-on EDAC patches. Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.David Daney1-0/+4
We need to set the 'endian' bit in this case. Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.David Daney5-246/+284
The patch needs to eliminate the definition of OCTEON_IRQ_BOOTDMA so that the device tree code can map the interrupt, so in order to not temporarily break things, we do a single patch to both the interrupt registration code and the pata_octeon_cf driver. Also rolled in is a conversion to use hrtimers and corrections to the timing calculations. Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13MIPS: Remove usage of CEVT_R4K_LIB config option.Ralf Baechle3-8/+2
Manuel Lauss <manuel.lauss@gmail.com> writes: I introduced it as a fallback because early revisions of Alchemy hardware we shipped had a non-functional 32kHz timer and had to rely on the r4k timer instead. Previously the r4k timer was initialized regardless, but it's useless with the "wait" instruction. So long story short: I need either the on-chip 32kHz timer OR the r4k timer if the 32kHz one is unusable, but not both, and r4k timer is useless when au1k_idle is in use. The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm not against removing R4K_LIB symbols. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Remove usage of CSRC_R4K_LIB config option.Steven J. Hill3-7/+3
Manuel Lauss <manuel.lauss@gmail.com> writes: I introduced it as a fallback because early revisions of Alchemy hardware we shipped had a non-functional 32kHz timer and had to rely on the r4k timer instead. Previously the r4k timer was initialized regardless, but it's useless with the "wait" instruction. So long story short: I need either the on-chip 32kHz timer OR the r4k timer if the 32kHz one is unusable, but not both, and r4k timer is useless when au1k_idle is in use. The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm not against removing R4K_LIB symbols. Signed-off-by: Steven J. Hill <sjhill@mips.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: AR7: use part_probe_types to specificy the partition parser to useFlorian Fainelli1-0/+3
This patch changes the physmap-flash platform data on AR7 to pass the correct partition parser: ar7part to used by the "physmap-flash" mapping driver so we get the partitions probed correctly. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: blogic@openwrt.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/4654/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Lantiq: Fix typo in "endianness" in dma.cMasanari Iida1-3/+3
Correct spelling typo ENDIANESS to ENDIANNESS in arc/mips/lantiq/xway/dma.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Cc: trivial@kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/4613/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Kconfig: Rename several firmware related config symbols.Ralf Baechle7-34/+34
With the upcoming merge of the ARC architecture there is a small likelyhood of conflicting use for the CONFIG_ARC config symbol. Rename it to CONFIG_FW_ARC. Also rename CONFIG_ARC32 to CONFIG_FW_ARC32, CONFIG_ARC64 to CONFIG_FW_ARC64. For consistence also rename CONFIG_SNIPROM to CONFIG_FW_SNIPROM and CONFIG_CFE to CONFIG_FW_CFE. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: Octeon: Add kexec and kdump supportRalf Baechle6-9/+335
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com> with plenty of further shining, polishing, debugging and testing by me.] Signed-off-by: Maxim Uvarov <muvarov@gmail.com> Cc: linux-mips@linux-mips.org Cc: kexec@lists.infradead.org Cc: horms@verge.net.au Patchwork: https://patchwork.linux-mips.org/patch/1026/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13MIPS: kdump: Add supportRalf Baechle11-9/+396
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com> with plenty of further shining, polishing, debugging and testing by me.] Signed-off-by: Maxim Uvarov <muvarov@gmail.com> Cc: linux-mips@linux-mips.org Cc: kexec@lists.infradead.org Cc: horms@verge.net.au Patchwork: https://patchwork.linux-mips.org/patch/1025/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1353-34453/+82808
Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...
2012-12-13Merge tag 'for-linus-20121212' of ↵Linus Torvalds10-136/+234
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300 Pull MN10300 changes from David Howells: "miscellaneous MN10300 arch patches. I've based it on top of Al Viro's signal tree - so these patches should be pulled after that." * tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300: MN10300: Use asm-generic/pci_iomap.h MN10300: Get rid of unused variable from ASB2305 PCI code MN10300: ASB2305 PCI code needs linux/irq.h mn10300/mm/fault.c: Port OOM changes to do_page_fault MN10300: Handle cacheable PCI regions in pci_iomap() MN10300: fix debug polling in ttySM driver MN10300: ttySM: clean up unnecessary casting MN10300: fix SMP synchronization between txdma and serial driver MN10300: fix serial port vdma irq setup for SMP MN10300: cleanup IRQ affinity setting MN10300: ttySM: Use memory barriers correctly in circular buffer logic
2012-12-13mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()Lin Feng2-9/+0
reserve_bootmem_generic() has no caller, Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm/memory.c: remove unused code from do_wp_page()Dominik Dingel1-6/+1
page_mkwrite is initalized with zero and only set once, from that point exists no way to get to the oom or oom_free_new labels. [akpm@linux-foundation.org: cleanup] Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13asm-generic, mm: pgtable: consolidate zero page helpersKirill A. Shutemov5-35/+28
We have two different implementation of is_zero_pfn() and my_zero_pfn() helpers: for architectures with and without zero page coloring. Let's consolidate them in <asm-generic/pgtable.h>. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm/hugetlb.c: fix warning on freeing hwpoisoned hugepageNaoya Horiguchi1-1/+7
Fix the warning from __list_del_entry() which is triggered when a process tries to do free_huge_page() for a hwpoisoned hugepage. free_huge_page() can be called for hwpoisoned hugepage from unpoison_memory(). This function gets refcount once and clears PageHWPoison, and then puts refcount twice to return the hugepage back to free pool. The second put_page() finally reaches free_huge_page(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13hwpoison, hugetlbfs: fix RSS-counter warningNaoya Horiguchi1-5/+7
Memory error handling on hugepages can break a RSS counter, which emits a message like "Bad rss-counter state mm:ffff88040abecac0 idx:1 val:-1". This is because PageAnon returns true for hugepage (this behavior is necessary for reverse mapping to work on hugetlbfs). [akpm@linux-foundation.org: clean up code layout] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepageNaoya Horiguchi1-1/+3
When a process which used a hwpoisoned hugepage tries to exit() or munmap(), the kernel can print out "bad pmd" message because page table walker in free_pgtables() encounters 'hwpoisoned entry' on pmd. This is because currently we fail to clear the hwpoisoned entry in __unmap_hugepage_range(), so this patch simply does it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm: protect against concurrent vma expansionMichel Lespinasse1-0/+28
expand_stack() runs with a shared mmap_sem lock. Because of this, there could be multiple concurrent stack expansions in the same mm, which may cause problems in the vma gap update code. I propose to solve this by taking the mm->page_table_lock around such vma expansions, in order to avoid the concurrency issue. We only have to worry about concurrent expand_stack() calls here, since we hold a shared mmap_sem lock and all vma modificaitons other than expand_stack() are done under an exclusive mmap_sem lock. I previously tried to achieve the same effect by making sure all growable vmas in a given mm would share the same anon_vma, which we already lock here. However this turned out to be difficult - all of the schemes I tried for refcounting the growable anon_vma and clearing turned out ugly. So, I'm now proposing only the minimal fix. The overhead of taking the page table lock during stack expansion is expected to be small: glibc doesn't use expandable stacks for the threads it creates, so having multiple growable stacks is actually uncommon and we don't expect the page table lock to get bounced between threads. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13memcg: do not check for mm in __mem_cgroup_count_vm_eventMichal Hocko1-3/+0
The mm given to __mem_cgroup_count_vm_event() cannot be NULL because the function is either called from the page fault path or vma->vm_mm is used. So the check can be dropped. The check was introduced by commit 456f998ec817 ("memcg: add the pagefault count into memcg stats") because the originally proposed patch used current->mm for shmem but this has been changed to vma->vm_mm later on without the check being removed (thanks to Hugh for this recollection). Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Ying Han <yinghan@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)Hugh Dickins1-1/+91
Revert 3.5's commit f21f8062201f ("tmpfs: revert SEEK_DATA and SEEK_HOLE") to reinstate 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE"), with the intervening additional arg to generic_file_llseek_size(). In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper SEEK_DATA and SEEK_HOLE support; and a good case has now been made for it on tmpfs, so let's join the party. It's quite easy for tmpfs to scan the radix_tree to support llseek's new SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still on my mind (in particular, the !PageUptodate-ness of pages fallocated but still unwritten). [akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n] Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: Jeff liu <jeff.liu@oracle.com> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Josef Bacik <josef@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm: provide more accurate estimation of pages occupied by memmapJiang Liu1-2/+24
If SPARSEMEM is enabled, it won't build page structures for non-existing pages (holes) within a zone, so provide a more accurate estimation of pages occupied by memmap if there are bigger holes within the zone. And pages for highmem zones' memmap will be allocated from lowmem, so charge nr_kernel_pages for that. [akpm@linux-foundation.org: mark calc_memmap_size __paging_init] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Chris Clayton <chris2553@googlemail.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Tested-by: Jianguo Wu <wujianguo@huawei.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13fs/buffer.c: remove redundant initialization in alloc_page_buffers()Yan Hong1-3/+0
buffer_head comes from kmem_cache_zalloc(), no need to zero its fields. Signed-off-by: Yan Hong <clouds.yan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13fs/buffer.c: do not inline exported functionYan Hong1-2/+1
It makes no sense to inline an exported function. Signed-off-by: Yan Hong <clouds.yan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13writeback: fix a typo in commentYan Hong1-1/+1
Signed-off-by: Yan Hong <clouds.yan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm: introduce new field "managed_pages" to struct zoneJiang Liu6-23/+121
Currently a zone's present_pages is calcuated as below, which is inaccurate and may cause trouble to memory hotplug. spanned_pages - absent_pages - memmap_pages - dma_reserve. During fixing bugs caused by inaccurate zone->present_pages, we found zone->present_pages has been abused. The field zone->present_pages may have different meanings in different contexts: 1) pages existing in a zone. 2) pages managed by the buddy system. For more discussions about the issue, please refer to: http://lkml.org/lkml/2012/11/5/866 https://patchwork.kernel.org/patch/1346751/ This patchset tries to introduce a new field named "managed_pages" to struct zone, which counts "pages managed by the buddy system". And revert zone->present_pages to count "physical pages existing in a zone", which also keep in consistence with pgdat->node_present_pages. We will set an initial value for zone->managed_pages in function free_area_init_core() and will adjust it later if the initial value is inaccurate. For DMA/normal zones, the initial value is set to: (spanned_pages - absent_pages - memmap_pages - dma_reserve) Later zone->managed_pages will be adjusted to the accurate value when the bootmem allocator frees all free pages to the buddy system in function free_all_bootmem_node() and free_all_bootmem(). The bootmem allocator doesn't touch highmem pages, so highmem zones' managed_pages is set to the accurate value "spanned_pages - absent_pages" in function free_area_init_core() and won't be updated anymore. This patch also adds a new field "managed_pages" to /proc/zoneinfo and sysrq showmem. [akpm@linux-foundation.org: small comment tweaks] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm, oom: remove statically defined arch functions of same nameDavid Rientjes3-42/+27
out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm, oom: remove redundant sleep in pagefault oom handlerDavid Rientjes1-1/+0
out_of_memory() will already cause current to schedule if it has not been killed, so doing it again in pagefault_out_of_memory() is redundant. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm, oom: cleanup pagefault oom handlerDavid Rientjes1-42/+7
To lock the entire system from parallel oom killing, it's possible to pass in a zonelist with all zones rather than using for_each_populated_zone() for the iteration. This obsoletes try_set_system_oom() and clear_system_oom() so that they can be removed. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13memory_hotplug: allow online/offline memory to result movable nodeLai Jiangshan1-0/+16
Now, memory management can handle movable node or nodes which don't have any normal memory, so we can dynamic configure and add movable node by: online a ZONE_MOVABLE memory from a previous offline node offline the last normal memory which result a non-normal-memory-node movable-node is very important for power-saving, hardware partitioning and high-available-system(hardware fault management). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13numa: add CONFIG_MOVABLE_NODE for movable-dedicated nodeLai Jiangshan4-0/+21
We need a node which only contains movable memory. This feature is very important for node hotplug. If a node has normal/highmem, the memory may be used by the kernel and can't be offlined. If the node only contains movable memory, we can offline the memory and the node. All are prepared, we can actually introduce N_MEMORY. add CONFIG_MOVABLE_NODE make we can use it for movable-dedicated node [akpm@linux-foundation.org: fix Kconfig text] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm, memcg: avoid unnecessary function call when memcg is disabledDavid Rientjes2-3/+12
While profiling numa/core v16 with cgroup_disable=memory on the command line, I noticed mem_cgroup_count_vm_event() still showed up as high as 0.60% in perftop. This occurs because the function is called extremely often even when memcg is disabled. To fix this, inline the check for mem_cgroup_disabled() so we avoid the unnecessary function call if memcg is disabled. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm: add a reminder comment for __GFP_BITS_SHIFTAndrew Morton1-0/+1
Cc: Glauber Costa <glommer@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13mm: WARN_ON_ONCE if f_op->mmap() change vma's start addressJoonsoo Kim1-0/+4
During reviewing the source code, I found a comment which mention that after f_op->mmap(), vma's start address can be changed. I didn't verify that it is really possible, because there are so many f_op->mmap() implementation. But if there are some mmap() which change vma's start address, it is possible error situation, because we already prepare prev vma, rb_link and rb_parent and these are related to original address. So add WARN_ON_ONCE for finding that this situtation really happens. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13res_counter: delete res_counter_write()Greg Thelen2-27/+0
Since commit 628f42355389 ("memcg: limit change shrink usage") both res_counter_write() and write_strategy_fn have been unused. This patch deletes them both. Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13hotplug: update nodemasks managementLai Jiangshan3-16/+77
Update nodemasks management for N_MEMORY. [lliubbo@gmail.com: fix build] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states ↵Lai Jiangshan2-19/+25
initialization N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Since we introduced N_MEMORY, we update the initialization of node_states. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13vmscan: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan1-2/+2
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13init: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan1-1/+1
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13kthread: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan1-1/+1
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13vmstat: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan1-2/+2
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13hugetlb: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan2-13/+13
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>