summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2011-09-14SUNRPC: compare scopeid for link-local addressesMi Jinlong1-1/+7
For ipv6 link-local addresses, sunrpc do not compare those scope id. This patch let sunrpc compares scope id only on link-local addresses. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-14SUNRPC: Replace svc_addr_u by sockaddr_storageMi Jinlong1-9/+21
For IPv6 local address, lockd can not callback to client for missing scope id when binding address at inet6_bind: 324 if (addr_type & IPV6_ADDR_LINKLOCAL) { 325 if (addr_len >= sizeof(struct sockaddr_in6) && 326 addr->sin6_scope_id) { 327 /* Override any existing binding, if another one 328 * is supplied by user. 329 */ 330 sk->sk_bound_dev_if = addr->sin6_scope_id; 331 } 332 333 /* Binding to link-local address requires an interface */ 334 if (!sk->sk_bound_dev_if) { 335 err = -EINVAL; 336 goto out_unlock; 337 } Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info besides address. Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-14NFSD: Remove the ex_pathname field from struct svc_exportTrond Myklebust1-1/+0
There are no more users... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-14NFSD: Cleanup for nfsd4_path()Trond Myklebust1-0/+1
The current code is sort of hackish in that it assumes a referral is always matched to an export. When we add support for junctions that may not be the case. We can replace nfsd4_path() with a function that encodes the components directly from the dentries. Since nfsd4_path is currently the only user of the 'ex_pathname' field in struct svc_export, this has the added benefit of allowing us to get rid of that. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-31nfsd: remove include/linux/nfsd/syscall.hJ. Bruce Fields2-117/+0
We don't need this any more. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd4: cleanup and consolidate seqid_mutating_errJ. Bruce Fields1-0/+16
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27Remove include/linux/nfsd/const.hJ. Bruce Fields3-45/+5
Userspace shouldn't have a use for these constants. Nothing here is used outside fs/nfsd. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27nfsd: remove unused definesJ. Bruce Fields1-13/+0
At least one of these is actually wrong anyway. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19sunrpc: use better NUMA affinitiesEric Dumazet1-1/+1
Use NUMA aware allocations to reduce latencies and increase throughput. sunrpc kthreads can use kthread_create_on_node() if pool_mode is "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can also take into account NUMA node affinity for memory allocations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: "J. Bruce Fields" <bfields@fieldses.org> CC: Neil Brown <neilb@suse.de> CC: David Miller <davem@davemloft.net> Reviewed-by: Greg Banks <gnb@fastmail.fm> [bfields@redhat.com: fix up caller nfs41_callback_up] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19locks: fix tracking of inprogress lease breaksJ. Bruce Fields1-2/+5
We currently use a bit in fl_flags to record whether a lease is being broken, and set fl_type to the type (RDLCK or UNLCK) that it will eventually have. This means that once the lease break starts, we forget what the lease's type *used* to be. Breaking a read lease will then result in blocking read opens, even though there's no conflict--because the lease type is now F_UNLCK and we can no longer tell whether it was previously a read or write lease. So, instead keep fl_type as the original type (the type which we enforce), and keep track of whether we're unlocking or merely downgrading by replacing the single FL_INPROGRESS flag by FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags. To get this right we also need to track separate downgrade and break times, to handle the case where a write-leased file gets conflicting opens first for read, then later for write. (I first considered just eliminating the downgrade behavior completely--nfsv4 doesn't need it, and nobody as far as I can tell actually uses it currently--but Jeremy Allison tells me that Windows oplocks do behave this way, so Samba will probably use this some day.) Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19locks: move F_INPROGRESS from fl_type to fl_flags fieldJ. Bruce Fields2-6/+2
F_INPROGRESS isn't exposed to userspace. To me it makes more sense in fl_flags.... Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-08fix rcu annotations noise in cred.hAl Viro1-5/+6
task->cred is declared as __rcu, and access to other tasks' ->cred is, indeed, protected. Access to current->cred does not need rcu_dereference() at all, since only the task itself can change its ->cred. sparse, of course, has no way of knowing that... Add force-cast in current_cred(), make current_fsuid() et.al. use it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds2-5/+130
* 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Make ore its own module exofs: Rename raid engine from exofs/ios.c => ore exofs: ios: Move to a per inode components & device-table exofs: Move exofs specific osd operations out of ios.c exofs: Add offset/length to exofs_get_io_state exofs: Fix truncate for the raid-groups case exofs: Small cleanup of exofs_fill_super exofs: BUG: Avoid sbi realloc exofs: Remove pnfs-osd private definitions nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
2011-08-07vfs: optimize inode cache access patternsLinus Torvalds1-22/+37
The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07vfs: renumber DCACHE_xyz flags, remove some stale onesLinus Torvalds1-17/+13
Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-12/+25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: Compute protocol sequence numbers and fragment IDs using MD5. crypto: Move md5_transform to lib/md5.c
2011-08-07exofs: Rename raid engine from exofs/ios.c => oreBoaz Harrosh1-0/+125
ORE stands for "Objects Raid Engine" This patch is a mechanical rename of everything that was in ios.c and its API declaration to an ore.c and an osd_ore.h header. The ore engine will later be used by the pnfs objects layout driver. * File ios.c => ore.c * Declaration of types and API are moved from exofs.h to a new osd_ore.h * All used types are prefixed by ore_ from their exofs_ name. * Shift includes from exofs.h to osd_ore.h so osd_ore.h is independent, include it from exofs.h. Other than a pure rename there are no other changes. Next patch will move the ore into it's own module and will export the API to be used by exofs and later the layout driver Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-07net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller2-12/+20
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07crypto: Move md5_transform to lib/md5.cDavid S. Miller1-0/+5
We are going to use this for TCP/IP sequence number and fragment ID generation. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07Merge branch 'for_linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (38 commits) acer-wmi: support Lenovo ideapad S205 wifi switch acerhdf.c: spaces in aliased changed to * platform-drivers-x86: ideapad-laptop: add missing ideapad_input_exit in ideapad_acpi_add error path x86 driver: fix typo in TDP override enabling Platform: fix samsung-laptop DMI identification for N150/N210/220/N230 dell-wmi: Add keys for Dell XPS L502X platform-drivers-x86: samsung-q10: make dmi_check_callback return 1 Platform: Samsung Q10 backlight driver platform-drivers-x86: intel_scu_ipc: convert to DEFINE_PCI_DEVICE_TABLE platform-drivers-x86: intel_rar_register: convert to DEFINE_PCI_DEVICE_TABLE platform-drivers-x86: intel_menlow: add missing return AE_OK for intel_menlow_register_sensor() platform-drivers-x86: intel_mid_thermal: fix memory leak platform-drivers-x86: msi-wmi: add missing sparse_keymap_free in msi_wmi_init error path Samsung Laptop platform driver: support N510 asus-wmi: add uwb rfkill support asus-wmi: add gps rfkill support asus-wmi: add CWAP support and clarify the meaning of WAPF bits asus-wmi: return proper value in store_cpufv() asus-wmi: check for temp1 presence asus-wmi: add thermal sensor ...
2011-08-06lib/sha1: use the git implementation of SHA-1Mandeep Singh Baines1-1/+1
For ChromiumOS, we use SHA-1 to verify the integrity of the root filesystem. The speed of the kernel sha-1 implementation has a major impact on our boot performance. To improve boot performance, we investigated using the heavily optimized sha-1 implementation used in git. With the git sha-1 implementation, we see a 11.7% improvement in boot time. 10 reboots, remove slowest/fastest. Before: Mean: 6.58 seconds Stdev: 0.14 After (with git sha-1, this patch): Mean: 5.89 seconds Stdev: 0.07 The other cool thing about the git SHA-1 implementation is that it only needs 64 bytes of stack for the workspace while the original kernel implementation needed 320 bytes. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Cc: Nicolas Pitre <nico@cam.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: linux-crypto@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-05Add KEY_MICMUTE and enable it on Lenovo X220Andy Lutomirski1-0/+2
I suspect that this works on T410. Signed-off-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Matthew Garrett <mjg@redhat.com>
2011-08-05Merge branch 'drm-fixes' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (55 commits) Revert "drm/i915: Try enabling RC6 by default (again)" drm/radeon: Extended DDC Probing for ECS A740GM-M DVI-D Connector drm/radeon: Log Subsystem Vendor and Device Information drm/radeon: Extended DDC Probing for Connectors with Improperly Wired DDC Lines (here: Asus M2A-VM HDMI) drm: Separate EDID Header Check from EDID Block Check drm: Add NULL check about irq functions drm: Fix irq install error handling drm/radeon: fix potential NULL dereference in drivers/gpu/drm/radeon/atom.c drm/radeon: clean reg header files drm/debugfs: Initialise empty variable drm/radeon/kms: add thermal chip quirk for asus 9600xt drm/radeon: off by one in check_reg() functions drm/radeon/kms: fix version comment due to merge timing drm/i915: allow cache sharing policy control drm/i915/hdmi: HDMI source product description infoframe support drm/i915/hdmi: split infoframe setting from infoframe type code drm: track CEA version number if present drm/i915: Try enabling RC6 by default (again) Revert "drm/i915/dp: Zero the DPCD data before connection probe" drm/i915/dp: wait for previous AUX channel activity to clear ...
2011-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-6/+15
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets mlx4: decreasing ref count when removing mac net: Fix security_socket_sendmsg() bypass problem. net: Cap number of elements for sendmmsg net: sendmmsg should only return an error if no messages were sent ixgbe: fix PHY link setup for 82599 ixgbe: fix __ixgbe_notify_dca() bail out code igb: fix WOL on second port of i350 device e1000e: minor re-order of #include files e1000e: remove unnecessary check for NULL pointer intel drivers: repair missing flush operations macb: restore wrap bit when performing underrun cleanup cdc_ncm: fix endianness problem. irda: use PCI_VENDOR_ID_* mlx4: Fixing Ethernet unicast packet steering net: fix NULL dereferences in check_peer_redir() bnx2x: Clear MDIO access warning during first driver load bnx2x: Fix BCM578xx MAC test bnx2x: Fix BCM54618se invalid link indication bnx2x: Fix BCM84833 link ...
2011-08-05Merge branch 'for-linus' of ↵Linus Torvalds1-48/+26
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached get rid of boilerplate switches in posix_acl.h fix block device fallout from ->fsync() changes
2011-08-05Merge branch 'next' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: dmaengine: use DEFINE_IDR for static initialization ioat: fix xor_idx_to_desc Avoid section type conflict in dma/ioat/dma_v3.c ioat: Adding PCI IDs for IOAT devices on SandyBridge platforms
2011-08-04nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4Boaz Harrosh1-5/+5
exofs file system wants to use pnfs_osd_xdr.h file instead of redefining pnfs-objects types in it's private "pnfs.h" headr. Before we do the switch we must make sure pnfs_osd_xdr.h is compilable also under NFS versions smaller than 4.1. Since now it is needed regardless of version, by the exofs code. nfs4_string is not the only nfs4 type out in the global scope. Ack-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2-8/+1
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6: Revert "dt: add of_alias_scan and of_alias_get_id" dt: remove of_alias_get_id() reference
2011-08-04drm: Separate EDID Header Check from EDID Block CheckThomas Reim1-0/+1
Provides function drm_edid_header_is_valid() for EDID header check and replaces EDID header check part of function drm_edid_block_valid() by a call of drm_edid_header_is_valid(). This is a prerequisite to extend DDC probing, e. g. in function radeon_ddc_probe() for Radeon devices, by a central EDID header check. Tested for kernel 2.6.35, 2.6.38 and 3.0 Cc: <stable@kernel.org> Signed-off-by: Thomas Reim <reimth@gmail.com> Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Acked-by: Stephen Michaels <Stephen.Micheals@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-08-04Merge branch 'drm-intel-next' of ↵Dave Airlie2-1/+3
ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 into drm-fixes * 'drm-intel-next' of ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6: (42 commits) drm/i915: allow cache sharing policy control drm/i915/hdmi: HDMI source product description infoframe support drm/i915/hdmi: split infoframe setting from infoframe type code drm: track CEA version number if present drm/i915: Try enabling RC6 by default (again) Revert "drm/i915/dp: Zero the DPCD data before connection probe" drm/i915/dp: wait for previous AUX channel activity to clear drm/i915: don't use uninitialized EDID bpc values when picking pipe bpp drm/i915/pch: Save/restore PCH_PORT_HOTPLUG across suspend drm/i915: apply phase pointer override on SNB+ too drm/i915: Add quirk to disable SSC on Sony Vaio Y2 drm/i915: provide more error output when mode sets fail drm/i915: add GPU max frequency control file i915: add Dell OptiPlex FX170 to intel_no_lvds drm/i915: Ignore GPU wedged errors while pinning scanout buffers drm/i915/hdmi: send AVI info frames on ILK+ as well drm/i915: fix CB tuning check for ILK+ drm/i915: Flush other plane register writes drm/i915: flush plane control changes on ILK+ as well drm/i915: apply timing generator bug workaround on CPT and PPT ...
2011-08-04Revert "dt: add of_alias_scan and of_alias_get_id"Grant Likely2-8/+1
This reverts commit 750f463a749e28464151ad26938d11b07b1c43cb. of_alias_* still needs work to be generalized for 'promtree' dt platforms, and to no implicitly create entries for available ids. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-04Merge branch 'idle-release' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: cpuidle: stop depending on pm_idle x86 idle: move mwait_idle_with_hints() to where it is used cpuidle: replace xen access to x86 pm_idle and default_idle cpuidle: create bootparam "cpuidle.off=1" mrst_pmu: driver for Intel Moorestown Power Management Unit
2011-08-04Merge branch 'apei-release' of ↵Linus Torvalds6-6/+163
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI, APEI, EINJ Param support is disabled by default APEI GHES: 32-bit buildfix ACPI: APEI build fix ACPI, APEI, GHES: Add hardware memory error recovery support HWPoison: add memory_failure_queue() ACPI, APEI, GHES, Error records content based throttle ACPI, APEI, GHES, printk support for recoverable error via NMI lib, Make gen_pool memory allocator lockless lib, Add lock-less NULL terminated single list Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG ACPI, APEI, Add WHEA _OSC support ACPI, APEI, Add APEI bit support in generic _OSC call ACPI, APEI, GHES, Support disable GHES at boot time ACPI, APEI, GHES, Prevent GHES to be built as module ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST ACPI, APEI, Add apei_exec_run_optional ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic ACPI, APEI, ERST, Fix erst-dbg long record reading issue ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
2011-08-04Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2-1/+8
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6: dt: add of_alias_scan and of_alias_get_id
2011-08-04drm: track CEA version number if presentJesse Barnes1-0/+2
Drivers need to know the CEA version number in addition to other display info (like whether the display is an HDMI sink) before enabling certain features. So track the CEA version number in the display info structure. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Keith Packard <keithp@keithp.com>
2011-08-04tmpfs radix_tree: locate_item to speed up swapoffHugh Dickins1-0/+1
We have already acknowledged that swapoff of a tmpfs file is slower than it was before conversion to the generic radix_tree: a little slower there will be acceptable, if the hotter paths are faster. But it was a shock to find swapoff of a 500MB file 20 times slower on my laptop, taking 10 minutes; and at that rate it significantly slows down my testing. Now, most of that turned out to be overhead from PROVE_LOCKING and PROVE_RCU: without those it was only 4 times slower than before; and more realistic tests on other machines don't fare as badly. I've tried a number of things to improve it, including tagging the swap entries, then doing lookup by tag: I'd expected that to halve the time, but in practice it's erratic, and often counter-productive. The only change I've so far found to make a consistent improvement, is to short-circuit the way we go back and forth, gang lookup packing entries into the array supplied, then shmem scanning that array for the target entry. Scanning in place doubles the speed, so it's now only twice as slow as before (or three times slower when the PROVEs are on). So, add radix_tree_locate_item() as an expedient, once-off, single-caller hack to do the lookup directly in place. #ifdef it on CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited applicability as save space in other configurations. And, sadly, #include sched.h for cond_resched(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04tmpfs: use kmemdup for short symlinksHugh Dickins1-8/+3
But we've not yet removed the old swp_entry_t i_direct[16] from shmem_inode_info. That's because it was still being shared with the inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode size), and use kmemdup() for short symlinks, say, those up to 128 bytes. I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode() rather than shmem_evict_inode(), where we usually do such freeing? I guess it doesn't matter, and I'm not into NUMA mpol testing right now. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04tmpfs: convert mem_cgroup shmem to radix-swapHugh Dickins2-10/+0
Remove mem_cgroup_shmem_charge_fallback(): it was only required when we had to move swappage to filecache with GFP_NOWAIT. Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by moving its call out from shmem_add_to_page_cache() to two of thats three callers. But leave it doing mem_cgroup_uncharge_cache_page() on error: although asymmetrical, it's easier for all 3 callers to handle. These two changes would also be appropriate if anyone were to start using shmem_read_mapping_page_gfp() with GFP_NOWAIT. Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test radix_tree_exceptional_entry() to get what it needs for itself. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04tmpfs: miscellaneous trivial cleanupsHugh Dickins1-1/+1
While it's at its least, make a number of boring nitpicky cleanups to shmem.c, mostly for consistency of variable naming. Things like "swap" instead of "entry", "pgoff_t index" instead of "unsigned long idx". And since everything else here is prefixed "shmem_", better change init_tmpfs() to shmem_init(). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04tmpfs: demolish old swap vector supportHugh Dickins1-2/+0
The maximum size of a shmem/tmpfs file has been limited by the maximum size of its triple-indirect swap vector. With 4kB page size, maximum filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of that on a 64-bit kernel. (With 8kB page size, maximum filesize was just over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel, MAX_LFS_FILESIZE being then more restrictive than swap vector layout.) It's a shame that tmpfs should be more restrictive than ramfs, and this limitation has now been noticed. Add another level to the swap vector? No, it became obscure and hard to maintain, once I complicated it to make use of highmem pages nine years ago: better choose another way. Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then tmpfs would never have invented its own peculiar radix tree: we would have fitted swap entries into the common radix tree instead, in much the same way as we fit swap entries into page tables. And why should each file have a separate radix tree for its pages and for its swap entries? The swap entries are required precisely where and when the pages are not. We want to put them together in a single radix tree: which can then avoid much of the locking which was needed to prevent them from being exchanged underneath us. This also avoids the waste of memory devoted to swap vectors, first in the shmem_inode itself, then at least two more pages once a file grew beyond 16 data pages (pages accounted by df and du, but not by memcg). Allocated upfront, to avoid allocation when under swapping pressure, but pure waste when CONFIG_SWAP is not set - I have never spattered around the ifdefs to prevent that, preferring this move to sharing the common radix tree instead. There are three downsides to sharing the radix tree. One, that it binds tmpfs more tightly to the rest of mm, either requiring knowledge of swap entries in radix tree there, or duplication of its code here in shmem.c. I believe that the simplications and memory savings (and probable higher performance, not yet measured) justify that. Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix nodes that cannot be freed under memory pressure - whereas before it was the less precious highmem swap vector pages that could not be freed. I'm hoping that 64-bit has now been accessible for long enough, that the highmem argument has grown much less persuasive. Three, that swapoff is slower than it used to be on tmpfs files, since it's using a simple generic mechanism not tailored to it: I find this noticeable, and shall want to improve, but maybe nobody else will notice. So... now remove most of the old swap vector code from shmem.c. But, for the moment, keep the simple i_direct vector of 16 pages, with simple accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation to help mark where swap needs to be handled in subsequent patches. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04mm: let swap use exceptional entriesHugh Dickins1-0/+23
If swap entries are to be stored along with struct page pointers in a radix tree, they need to be distinguished as exceptional entries. Most of the handling of swap entries in radix tree will be contained in shmem.c, but a few functions in filemap.c's common code need to check for their appearance: find_get_page(), find_lock_page(), find_get_pages() and find_get_pages_contig(). So as not to slow their fast paths, tuck those checks inside the existing checks for unlikely radix_tree_deref_slot(); except for find_lock_page(), where it is an added test. And make it a BUG in find_get_pages_tag(), which is not applied to tmpfs files. A part of the reason for eliminating shmem_readpage() earlier, was to minimize the places where common code would need to allow for swap entries. The swp_entry_t known to swapfile.c must be massaged into a slightly different form when stored in the radix tree, just as it gets massaged into a pte_t when stored in page tables. In an i386 kernel this limits its information (type and page offset) to 30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum swapfile size of 128GB. Which is less than the 512GB we previously allowed with X86_PAE (where the swap entry can occupy the entire upper 32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and there's not a new limitation on 64-bit (where swap filesize is already limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is probably still enough swap for a 64GB 32-bit machine. Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and enforce filesize limit in read_swap_header(), just as for ptes. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04radix_tree: exceptional entries and indicesHugh Dickins1-3/+33
A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its peculiar swap vector, instead keeping a file's swap entries in the same radix tree as its struct page pointers: thus saving memory, and simplifying its code and locking. This patch: The radix_tree is used by several subsystems for different purposes. A major use is to store the struct page pointers of a file's pagecache for memory management. But what if mm wanted to store something other than page pointers there too? The low bit of a radix_tree entry is already used to denote an indirect pointer, for internal use, and the unlikely radix_tree_deref_retry() case. Define the next bit as denoting an exceptional entry, and supply inline functions radix_tree_exception() to return non-0 in either unlikely case, and radix_tree_exceptional_entry() to return non-0 in the second case. If a subsystem already uses radix_tree with that bit set, no problem: it does not affect internal workings at all, but is defined for the convenience of those storing well-aligned pointers in the radix_tree. The radix_tree_gang_lookups have an implicit assumption that the caller can deduce the offset of each entry returned e.g. by the page->index of a struct page. But that may not be feasible for some kinds of item to be stored there. radix_tree_gang_lookup_slot() allow for an optional indices argument, output array in which to return those offsets. The same could be added to other radix_tree_gang_lookups, but for now keep it to the only one for which we need it. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04drivers/video/backlight/aat2870_bl.c: fix setting max_currentAxel Lin1-1/+1
- Current implementation tests wrong value for setting aat2870_bl->max_current. - In the current implementation, we cannot differentiate between 2 cases: a) if pdata->max_current is not set , or b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0). Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in aat2870_brightness() accordingly. Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Samuel Ortiz <sameo@linux.intel.com> Tested-by: Jin Park <jinyoungp@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODEJohannes Weiner1-1/+1
__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes. It's supposed to be passed through from the page allocator to zone_statistics(), but it never gets there as gfp_allowed_mask is not wide enough and masks out the flag early in the allocation path. The result is an accounting glitch where successful NUMA allocations by-agent are not properly attributed as local. Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04ida: simplified functions for id allocationRusty Russell1-0/+4
The current hyper-optimized functions are overkill if you simply want to allocate an id for a device. Create versions which use an internal lock. In followup patches, numerous drivers are converted to use this interface. Thanks to Tejun for feedback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04fault-injection: add ability to export fault_attr in arbitrary directoryAkinobu Mita1-13/+5
init_fault_attr_dentries() is used to export fault_attr via debugfs. But it can only export it in debugfs root directory. Per Forlin is working on mmc_fail_request which adds support to inject data errors after a completed host transfer in MMC subsystem. The fault_attr for mmc_fail_request should be defined per mmc host and export it in debugfs directory per mmc host like /sys/kernel/debug/mmc0/mmc_fail_request. init_fault_attr_dentries() doesn't help for mmc_fail_request. So this introduces fault_create_debugfs_attr() which is able to create a directory in the arbitrary directory and replace init_fault_attr_dentries(). [akpm@linux-foundation.org: extraneous semicolon, per Randy] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Per Forlin <per.forlin@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04cpuidle: stop depending on pm_idleLen Brown1-0/+2
cpuidle users should call cpuidle_call_idle() directly rather than via (pm_idle)() function pointer. Architecture may choose to continue using (pm_idle)(), but cpuidle need not depend on it: my_arch_cpu_idle() ... if(cpuidle_call_idle()) pm_idle(); cc: Kevin Hilman <khilman@deeprootsystems.com> cc: Paul Mundt <lethal@linux-sh.org> cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-04cpuidle: replace xen access to x86 pm_idle and default_idleLen Brown1-0/+2
When a Xen Dom0 kernel boots on a hypervisor, it gets access to the raw-hardware ACPI tables. While it parses the idle tables for the hypervisor's beneift, it uses HLT for its own idle. Rather than have xen scribble on pm_idle and access default_idle, have it simply disable_cpuidle() so acpi_idle will not load and architecture default HLT will be used. cc: xen-devel@lists.xensource.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03Merge branch 'apei' into apei-releaseLen Brown6-6/+163
Some trivial conflicts due to other various merges adding to the end of common lists sooner than this one. arch/ia64/Kconfig arch/powerpc/Kconfig arch/x86/Kconfig lib/Kconfig lib/Makefile Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03ACPI: APEI build fixLen Brown1-0/+4
as GHES is optional... When # CONFIG_ACPI_APEI_GHES is not set: (.init.text+0x4c22): undefined reference to `ghes_disable' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Len Brown <len.brown@intel.com>