summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-03-30KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handlingAndre Przywara2-1/+39
Using the framework provided by the recent vgic.c changes, we register a kvm_io_bus device on mapping the virtual GICv3 resources. The distributor mapping is pretty straight forward, but the redistributors need some more love, since they need to be tagged with the respective redistributor (read: VCPU) they are connected with. We use the kvm_io_bus framework to register one devices per VCPU. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-30KVM: arm/arm64: merge GICv3 RD_base and SGI_base register framesAndre Przywara1-91/+83
Currently we handle the redistributor registers in two separate MMIO regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus devices for each redistributor. Since the spec mandates those two pages to be contigious, we could as well merge them and save the churn with the second KVM I/O bus device. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_busAndre Przywara2-6/+17
Using the framework provided by the recent vgic.c changes we register a kvm_io_bus device when initializing the virtual GICv2. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGICAndre Przywara3-0/+145
Currently we use a lot of VGIC specific code to do the MMIO dispatching. Use the previous reworks to add kvm_io_bus style MMIO handlers. Those are not yet called by the MMIO abort handler, also the actual VGIC emulator function do not make use of it yet, but will be enabled with the following patches. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: arm/arm64: simplify vgic_find_range() and callersAndre Przywara3-17/+10
The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio argument, but actually only used the length field in there. Since we need to get rid of that structure in that part of the code anyway, let's rework the function (and it's callers) to pass the length argument to the function directly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_rangeAndre Przywara4-22/+22
The name "kvm_mmio_range" is a bit bold, given that it only covers the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to vgic_io_range. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: x86: remove now unneeded include directory from MakefileAndre Przywara1-1/+1
virt/kvm was never really a good include directory for anything else than locally included headers. With the move of iodev.h there is no need anymore to add this directory the compiler's include path, so remove it from the x86 kvm Makefile. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: arm/arm64: remove now unneeded include directory from MakefileAndre Przywara2-2/+2
virt/kvm was never really a good include directory for anything else than locally included headers. With the move of iodev.h there is no need anymore to add this directory the compiler's include path, so remove it from the arm and arm64 kvm Makefile. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: move iodev.h from virt/kvm/ to include/kvmAndre Przywara9-11/+10
iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.Nikolay Nikolaev14-64/+79
This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-14arm/arm64: KVM: Fix migration race in the arch timerChristoffer Dall3-10/+39
When a VCPU is no longer running, we currently check to see if it has a timer scheduled in the future, and if it does, we schedule a host hrtimer to notify is in case the timer expires while the VCPU is still not running. When the hrtimer fires, we mask the guest's timer and inject the timer IRQ (still relying on the guest unmasking the time when it receives the IRQ). This is all good and fine, but when migration a VM (checkpoint/restore) this introduces a race. It is unlikely, but possible, for the following sequence of events to happen: 1. Userspace stops the VM 2. Hrtimer for VCPU is scheduled 3. Userspace checkpoints the VGIC state (no pending timer interrupts) 4. The hrtimer fires, schedules work in a workqueue 5. Workqueue function runs, masks the timer and injects timer interrupt 6. Userspace checkpoints the timer state (timer masked) At restore time, you end up with a masked timer without any timer interrupts and your guest halts never receiving timer interrupts. Fix this by only kicking the VCPU in the workqueue function, and sample the expired state of the timer when entering the guest again and inject the interrupt and mask the timer only then. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14arm/arm64: KVM: support for un-queuing active IRQsChristoffer Dall4-38/+212
Migrating active interrupts causes the active state to be lost completely. This implements some additional bitmaps to track the active state on the distributor and export this to user space. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14arm/arm64: KVM: add a common vgic_queue_irq_to_lr fnAlex Bennée1-7/+17
This helps re-factor away some of the repetitive code and makes the code flow more nicely. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14arm/arm64: KVM: export VCPU power state via MP_STATE ioctlAlex Bennée2-6/+31
To cleanly restore an SMP VM we need to ensure that the current pause state of each vcpu is correctly recorded. Things could get confused if the CPU starts running after migration restore completes when it was paused before it state was captured. We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64 interface is a lot simpler as the only valid states are KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-13arm/arm64: KVM: Optimize handling of Access Flag faultsMarc Zyngier2-0/+61
Now that we have page aging in Stage-2, it becomes obvious that we're doing way too much work handling the fault. The page is not going anywhere (it is still mapped), the page tables are already allocated, and all we want is to flip a bit in the PMD or PTE. Also, we can avoid any form of TLB invalidation, since a page with the AF bit off is not allowed to be cached. An obvious solution is to have a separate handler for FSC_ACCESS, where we pride ourselves to only do the very minimum amount of work. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-13arm/arm64: KVM: Implement Stage-2 page agingMarc Zyngier7-23/+104
Until now, KVM/arm didn't care much for page aging (who was swapping anyway?), and simply provided empty hooks to the core KVM code. With server-type systems now being available, things are quite different. This patch implements very simple support for page aging, by clearing the Access flag in the Stage-2 page tables. On access fault, the current fault handling will write the PTE or PMD again, putting the Access flag back on. It should be possible to implement a much faster handling for Access faults, but that's left for a later patch. With this in place, performance in VMs is degraded much more gracefully. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-13arm/arm64: KVM: Allow handle_hva_to_gpa to return a valueMarc Zyngier1-9/+14
So far, handle_hva_to_gpa was never required to return a value. As we prepare to age pages at Stage-2, we need to be able to return a value from the iterator (kvm_test_age_hva). Adapt the code to handle this situation. No semantic change. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: add irqfd supportEric Auger9-3/+66
This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Irqfd injection is restricted to SPI. The rationale behind not supporting PPI irqfd injection is that any device using a PPI would be a private-to-the-CPU device (timer for instance), so its state would have to be context-switched along with the VCPU and would require in-kernel wiring anyhow. It is not a relevant use case for irqfds. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstateEric Auger1-8/+5
To prepare for irqfd addition, coarse grain locking is removed at kvm_vgic_sync_hwstate level and finer grain locking is introduced in vgic_process_maintenance only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: implement kvm_arch_intc_initializedEric Auger3-0/+9
On arm/arm64 the VGIC is dynamically instantiated and it is useful to expose its state, especially for irqfd setup. This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and implements kvm_arch_intc_initialized. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: introduce kvm_arch_intc_initialized and use it in irqfdEric Auger2-0/+17
Introduce __KVM_HAVE_ARCH_INTC_INITIALIZED define and associated kvm_arch_intc_initialized function. This latter allows to test whether the virtual interrupt controller is initialized and ready to accept virtual IRQ injection. On some architectures, the virtual interrupt controller is dynamically instantiated, justifying that kind of check. The new function can now be used by irqfd to check whether the virtual interrupt controller is ready on KVM_IRQFD request. If not, KVM_IRQFD returns -EAGAIN. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIPEric Auger2-2/+0
CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along with irq_comm.c and irqchip.c usage). This is not the case for arm/arm64 currently. This patch unsets the flag for both arm and arm64. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}Christoffer Dall10-202/+20
We can definitely decide at run-time whether to use the GIC and timers or not, and the extra code and data structures that we allocate space for is really negligable with this config option, so I don't think it's worth the extra complexity of always having to define stub static inlines. The !CONFIG_KVM_ARM_VGIC/TIMER case is pretty much an untested code path anyway, so we're better off just getting rid of it. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-11KVM: arm/arm64: prefer IS_ENABLED to a static variablePaolo Bonzini1-12/+5
IS_ENABLED gives compile-time checking and keeps the code clearer. The one exception is inside kvm_vm_ioctl_check_extension, where the established idiom is to wrap the case labels with an #ifdef. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-11KVM: vgic: add virt-capable compatible stringsMark Rutland1-2/+4
Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC compatible list, and while this is correct (and supported by the GIC driver), KVM will fail to detect that it can support these cases. This patch adds the missing strings to the VGIC code. The of_device_id entries are padded to keep the probe function data aligned. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-02-24KVM: emulate: fix CMPXCHG8B on 32-bit hostsPaolo Bonzini1-1/+2
This has been broken for a long time: it broke first in 2.6.35, then was almost fixed in 2.6.36 but this one-liner slipped through the cracks. The bug shows up as an infinite loop in Windows 7 (and newer) boot on 32-bit hosts without EPT. Windows uses CMPXCHG8B to write to page tables, which causes a page fault if running without EPT; the emulator is then called from kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are not 0; the common case for this is that the NX bit (bit 63) is 1. Fixes: 6550e1f165f384f3a46b60a1be9aba4bc3c2adad Fixes: 16518d5ada690643453eb0aef3cc7841d3623c2d Cc: stable@vger.kernel.org # 2.6.35+ Reported-by: Erik Rull <erik.rull@rdsoftware.de> Tested-by: Erik Rull <erik.rull@rdsoftware.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-24KVM: VMX: fix build without CONFIG_SMPRadim Krčmář1-9/+14
'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC. Posted interrupt makes no sense without CONFIG_SMP, and CONFIG_X86_LOCAL_APIC will be set with it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-24arm/arm64: KVM: Add exit reaons to kvm_exit event tracingWei Huang2-4/+8
This patch extends trace_kvm_exit() to include KVM exit reasons (i.e. EC of HSR). The tracing function then dumps both exit reason and PC of vCPU, shown as the following. Tracing tools can use this new exit_reason field to better understand the behavior of guest VMs. 886.301252: kvm_exit: HSR_EC: 0x0024, PC: 0xfffffe0000506b28 Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-02-24ARM: KVM: Fix size check in __coherent_cache_guest_pageJan Kiszka1-1/+1
The check is supposed to catch page-unaligned sizes, not the inverse. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-02-23Linux 4.0-rc1v4.0-rc1Linus Torvalds1-4/+4
.. after extensive statistical analysis of my G+ polling, I've come to the inescapable conclusion that internet polls are bad. Big surprise. But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38% margin, in a poll that people weren't even supposed to participate in. Who can argue with solid numbers like that? 5,796 votes from people who can't even follow the most basic directions? In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%, but with a total of 29,110 votes right now. Now, arguably, that vote spread is only about 3,200 votes, which is less than the almost six thousand votes that the "please ignore" poll got, so it could be considered noise. But hey, I asked, so I'll honor the votes.
2015-02-23Merge tag 'ext4_for_linus' of ↵Linus Torvalds5-56/+108
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Ext4 bug fixes. We also reserved code points for encryption and read-only images (for which the implementation is mostly just the reserved code point for a read-only feature :-)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix indirect punch hole corruption ext4: ignore journal checksum on remount; don't fail ext4: remove duplicate remount check for JOURNAL_CHECKSUM change ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize ext4: support read-only images ext4: change to use setup_timer() instead of init_timer() ext4: reserve codepoints used by the ext4 encryption feature jbd2: complain about descriptor block checksum errors
2015-02-23Merge branch 'for-linus-2' of ↵Linus Torvalds70-758/+907
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...
2015-02-22Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds1-1/+1
Pull ARM fix from Russell King: "Just one fix this time around. __iommu_alloc_buffer() can cause a BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or __GFP_HIGHMEM set. The patch from Alexandre addresses this" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
2015-02-22autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocationAl Viro1-2/+6
X-Coverup: just ask spender Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22procfs: fix race between symlink removals and traversalsAl Viro3-12/+22
use_pde()/unuse_pde() in ->follow_link()/->put_link() resp. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22debugfs: leave freeing a symlink body until inode evictionAl Viro1-17/+17
As it is, we have debugfs_remove() racing with symlink traversals. Supply ->evict_inode() and do freeing there - inode will remain pinned until we are done with the symlink body. And rip the idiocy with checking if dentry is positive right after we'd verified debugfs_positive(), which is a stronger check... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Documentation/filesystems/Locking: ->get_sb() is long goneAl Viro1-2/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22trylock_super(): replacement for grab_super_passive()Konstantin Khlebnikov3-26/+22
I've noticed significant locking contention in memory reclaimer around sb_lock inside grab_super_passive(). Grab_super_passive() is called from two places: in icache/dcache shrinkers (function super_cache_scan) and from writeback (function __writeback_inodes_wb). Both are required for progress in memory allocator. Grab_super_passive() acquires sb_lock to increment sb->s_count and check sb->s_instances. It seems sb->s_umount locked for read is enough here: super-block deactivation always runs under sb->s_umount locked for write. Protecting super-block itself isn't a problem: in super_cache_scan() sb is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers are still active. Inside writeback super-block comes from inode from bdi writeback list under wb->list_lock. This patch removes locking sb_lock and checks s_instances under s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for write. New variant is called trylock_super() and since it only locks semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb) when they're done. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells1-1/+1
Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup() rather than d_is_dir() when checking a dir watch and give an error on fake directories. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells4-9/+9
Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack thereof) in cachefiles: (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as it doesn't want to deal with automounts in its cache. (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in cachefiles. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)David Howells34-71/+71
Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22SELinux: Use d_is_positive() rather than testing dentry->d_inodeDavid Howells1-2/+2
Use d_is_positive() rather than testing dentry->d_inode in SELinux to get rid of direct references to d_inode outside of the VFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Smack: Use d_is_positive() rather than testing dentry->d_inodeDavid Howells1-2/+2
Use d_is_positive() rather than testing dentry->d_inode in Smack to get rid of direct references to d_inode outside of the VFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()David Howells1-3/+1
Use d_is_dir() rather than d_inode and S_ISDIR(). Note that this will include fake directories such as automount triggers. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inodeDavid Howells1-1/+1
Use d_is_positive(dentry) or d_is_negative(dentry) rather than testing dentry->d_inode as the dentry may cover another layer that has an inode when the top layer doesn't or may hold a 0,0 chardev that's actually a whiteout. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sbDavid Howells2-12/+12
mediated_filesystem() should use dentry->d_sb not dentry->d_inode->i_sb and should avoid file_inode() also since it is really dealing with the path. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22VFS: Split DCACHE_FILE_TYPE into regular and special typesDavid Howells2-8/+27
Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and socket files). d_is_reg() and d_is_special() are added to detect these subtypes and d_is_file() is left as the union of the two. This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to use d_is_reg(dentry) instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22VFS: Add a fallthrough flag for marking virtual dentriesDavid Howells2-1/+27
Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is a virtual dentry that covers another one in a lower layer that should be used instead. This may be recorded on medium if directory integration is stored there. The flag can be set with d_set_fallthru() and tested with d_is_fallthru(). Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22VFS: Add a whiteout dentry typeDavid Howells1-6/+18
Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function. A d_is_miss() accessor is also added for ordinary cache misses and d_is_negative() is modified to indicate either an ordinary miss or an enforced miss (whiteout). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22VFS: Introduce inode-getting helpers for layered/unioned fs environmentsDavid Howells1-0/+57
Introduce some function for getting the inode (and also the dentry) in an environment where layered/unioned filesystems are in operation. The problem is that we have places where we need *both* the union dentry and the lower source or workspace inode or dentry available, but we can only have a handle on one of them. Therefore we need to derive the handle to the other from that. The idea is to introduce an extra field in struct dentry that allows the union dentry to refer to and pin the lower dentry. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>