summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-12-04drm/amdgpu: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v3)Alex Deucher6-30/+140
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core more fragile to drivers which don't update hw vblank counters and vblank timestamps in sync with firing of the vblank irq and essentially at leading edge of vblank. This exposed a problem with radeon-kms/amdgpu-kms which do not satisfy above requirements: The vblank irq fires a few scanlines before start of vblank, but programmed pageflips complete at start of vblank and vblank timestamps update at start of vblank, whereas the hw vblank counter increments only later, at start of vsync. This leads to problems like off by one errors for vblank counter updates, vblank counters apparently going backwards or vblank timestamps apparently having time going backwards. The net result is stuttering of graphics in games, or little hangs, as well as total failure of timing sensitive applications. See bug #93147 for an example of the regression on Linux 4.4-rc: https://bugs.freedesktop.org/show_bug.cgi?id=93147 This patch tries to align all above events better from the viewpoint of the drm core / of external callers to fix the problem: 1. The apparent start of vblank is shifted a few scanlines earlier, so the vblank irq now always happens after start of this extended vblank interval and thereby drm_update_vblank_count() always samples the updated vblank count and timestamp of the new vblank interval. To achieve this, the reporting of scanout positions by radeon_get_crtc_scanoutpos() now operates as if the vblank starts radeon_crtc->lb_vblank_lead_lines before the real start of the hw vblank interval. This means that the vblank timestamps which are based on these scanout positions will now update at this earlier start of vblank. 2. The driver->get_vblank_counter() function will bump the returned vblank count as read from the hw by +1 if the query happens after the shifted earlier start of the vblank, but before the real hw increment at start of vsync, so the counter appears to increment at start of vblank in sync with the timestamp update. 3. Calls from vblank irq-context and regular non-irq calls are now treated identical, always simulating the shifted vblank start, to avoid inconsistent results for queries happening from vblank irq vs. happening from drm_vblank_enable() or vblank_disable_fn(). 4. The radeon_flip_work_func will delay mmio programming a pageflip until the start of the real vblank iff it happens to execute inside the shifted earlier start of the vblank, so pageflips now also appear to execute at start of the shifted vblank, in sync with vblank counter and timestamp updates. This to avoid some races between updates of vblank count and timestamps that are used for swap scheduling and pageflip execution which could cause pageflips to execute before the scheduled target vblank. The lb_vblank_lead_lines "fudge" value is calculated as the size of the display controllers line buffer in scanlines for the given video mode: Vblank irq's are triggered by the line buffer logic when the line buffer refill for a video frame ends, ie. when the line buffer source read position enters the hw vblank. This means that a vblank irq could fire at most as many scanlines before the current reported scanout position of the crtc timing generator as the number of scanlines the line buffer can maximally hold for a given video mode. This patch has been successfully tested on a RV730 card with DCE-3 display engine and on a evergreen card with DCE-4 display engine, in single-display and dual-display configuration, with different video modes. A similar patch is needed for amdgpu-kms to fix the same problem. Limitations: - Maybe replace the udelay() in the flip_work_func() by a suitable usleep_range() for a bit better efficiency? Will try that. - Line buffer sizes in pixels are hard-coded on < DCE-4 to a value i just guessed to be high enough to work ok, lacking info on the true sizes atm. Probably fixes: fdo#93147 Port of Mario's radeon fix to amdgpu. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (v1) Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v2) Refine amdgpu_flip_work_func() for better efficiency. In amdgpu_flip_work_func, replace the busy waiting udelay(5) with event lock held by a more performance and energy efficient usleep_range() until at least predicted true start of hw vblank, with some slack for scheduler happiness. Release the event lock during waits to not delay other outputs in doing their stuff, as the waiting can last up to 200 usecs in some cases. Also small fix to code comment and formatting in that function. (v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v3) Fix crash in crtc disabled case
2015-12-04drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2)Mario Kleiner9-29/+164
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core more fragile to drivers which don't update hw vblank counters and vblank timestamps in sync with firing of the vblank irq and essentially at leading edge of vblank. This exposed a problem with radeon-kms/amdgpu-kms which do not satisfy above requirements: The vblank irq fires a few scanlines before start of vblank, but programmed pageflips complete at start of vblank and vblank timestamps update at start of vblank, whereas the hw vblank counter increments only later, at start of vsync. This leads to problems like off by one errors for vblank counter updates, vblank counters apparently going backwards or vblank timestamps apparently having time going backwards. The net result is stuttering of graphics in games, or little hangs, as well as total failure of timing sensitive applications. See bug #93147 for an example of the regression on Linux 4.4-rc: https://bugs.freedesktop.org/show_bug.cgi?id=93147 This patch tries to align all above events better from the viewpoint of the drm core / of external callers to fix the problem: 1. The apparent start of vblank is shifted a few scanlines earlier, so the vblank irq now always happens after start of this extended vblank interval and thereby drm_update_vblank_count() always samples the updated vblank count and timestamp of the new vblank interval. To achieve this, the reporting of scanout positions by radeon_get_crtc_scanoutpos() now operates as if the vblank starts radeon_crtc->lb_vblank_lead_lines before the real start of the hw vblank interval. This means that the vblank timestamps which are based on these scanout positions will now update at this earlier start of vblank. 2. The driver->get_vblank_counter() function will bump the returned vblank count as read from the hw by +1 if the query happens after the shifted earlier start of the vblank, but before the real hw increment at start of vsync, so the counter appears to increment at start of vblank in sync with the timestamp update. 3. Calls from vblank irq-context and regular non-irq calls are now treated identical, always simulating the shifted vblank start, to avoid inconsistent results for queries happening from vblank irq vs. happening from drm_vblank_enable() or vblank_disable_fn(). 4. The radeon_flip_work_func will delay mmio programming a pageflip until the start of the real vblank iff it happens to execute inside the shifted earlier start of the vblank, so pageflips now also appear to execute at start of the shifted vblank, in sync with vblank counter and timestamp updates. This to avoid some races between updates of vblank count and timestamps that are used for swap scheduling and pageflip execution which could cause pageflips to execute before the scheduled target vblank. The lb_vblank_lead_lines "fudge" value is calculated as the size of the display controllers line buffer in scanlines for the given video mode: Vblank irq's are triggered by the line buffer logic when the line buffer refill for a video frame ends, ie. when the line buffer source read position enters the hw vblank. This means that a vblank irq could fire at most as many scanlines before the current reported scanout position of the crtc timing generator as the number of scanlines the line buffer can maximally hold for a given video mode. This patch has been successfully tested on a RV730 card with DCE-3 display engine and on a evergreen card with DCE-4 display engine, in single-display and dual-display configuration, with different video modes. A similar patch is needed for amdgpu-kms to fix the same problem. Limitations: - Line buffer sizes in pixels are hard-coded on < DCE-4 to a value i just guessed to be high enough to work ok, lacking info on the true sizes atm. Fixes: fdo#93147 Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Michel Dänzer <michel.daenzer@amd.com> Cc: Harry Wentland <Harry.Wentland@amd.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Tested-by: Dave Witbrodt <dawitbro@sbcglobal.net> (v2) Refine radeon_flip_work_func() for better efficiency: In radeon_flip_work_func, replace the busy waiting udelay(5) with event lock held by a more performance and energy efficient usleep_range() until at least predicted true start of hw vblank, with some slack for scheduler happiness. Release the event lock during waits to not delay other outputs in doing their stuff, as the waiting can last up to 200 usecs in some cases. Retested on DCE-3 and DCE-4 to verify it still works nicely. (v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-04drm/radeon: Retry DDC probing on DVI on failure if we got an HPD interruptLyude10-12/+32
HPD signals on DVI ports can be fired off before the pins required for DDC probing actually make contact, due to the pins for HPD making contact first. This results in a HPD signal being asserted but DDC probing failing, resulting in hotplugging occasionally failing. This is somewhat rare on most cards (depending on what angle you plug the DVI connector in), but on some cards it happens constantly. The Radeon R5 on the machine used for testing this patch for instance, runs into this issue just about every time I try to hotplug a DVI monitor and as a result hotplugging almost never works. Rescheduling the hotplug work for a second when we run into an HPD signal with a failing DDC probe usually gives enough time for the rest of the connector's pins to make contact, and fixes this issue. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-04drm/amdgpu: add spin lock to protect freed list in vm (v2)jimqu2-3/+15
there is a protection fault about freed list when OCL test. add a spin lock to protect it. v2: drop changes in vm_fini Signed-off-by: JimQu <jim.qu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2015-12-04drm/amdgpu: partially revert "drm/amdgpu: fix ↵Christian König2-2/+2
VM_CONTEXT*_PAGE_TABLE_END_ADDR" v2 The gtt_end is already inclusive, we don't need to subtract one here. v2 (chk): keep the fix for the VM code, cause here it really applies. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-04drm/amdgpu: take a BO reference for the user fenceChristian König1-2/+4
No need for a GEM reference here. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-04drm/amdgpu: take a BO reference in the display codeChristian König1-3/+3
No need for the GEM reference here. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-04drm/amdgpu: set snooped flags only on system addresses v2Christian König1-3/+4
Not necessary for VRAM. v2: no need to check if ttm is NULL. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-02drm/amdgpu: fix race condition in amd_sched_entity_push_jobNicolai Hähnle1-2/+3
As soon as we leave the spinlock after the job has been added to the job queue, we can no longer rely on the job's data to be available. I have seen a null-pointer dereference due to sched == NULL in amd_sched_wakeup via amd_sched_entity_push_job and amd_sched_ib_submit_kernel_helper. Since the latter initializes sched_job->sched with the address of the ring scheduler, which is guaranteed to be non-NULL, this race appears to be a likely culprit. Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Bugzilla: https://bugs.freedesktop.org/attachment.cgi?bugid=93079 Reviewed-by: Christian König <christian.koenig@amd.com>
2015-12-02drm/amdgpu: add err check for pin userptrChunming Zhou1-3/+7
Missing error check if the operation failed. Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-30add blacklist for thinkpad T40pPavel Machek1-0/+3
Thinkpad T40p needs agpmode 1. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-30drm/amdgpu: fix VM page table reference countingChristian König3-0/+7
We use the reservation object of the page directory for the page tables as well, because of this the page directory should be freed last. Ensure that by keeping a reference from the page tables to the directory. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-30drm/amdgpu: fix userptr flags checkChristian König1-2/+3
That got messed up while porting it from Radeon. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-27Merge branch 'linux-4.4' of ↵Dave Airlie16-917/+948
git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes Ben Skeggs wrote: A couple of regression fixes, some more boards whitelisted for a hw bug workaround, gr/ucode fixes for hangs a user is seeing. The changes look larger than they actually are due to the ucode binaries (*.fucN.h) being regenerated. * 'linux-4.4' of git://anongit.freedesktop.org/git/nouveau/linux-2.6: drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not being set drm/nouveau/nvif: allow userspace access to its own client object drm/nouveau/gr/gf100-: fix oops when calling zbc methods drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK is zero drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from correct GPC drm/nouveau/gr/gf100-: split out per-gpc address calculation macro drm/nouveau/bios: return actual size of the buffer retrieved via _ROM drm/nouveau/instmem: protect instobj list with a spinlock drm/nouveau/pci: enable c800 magic for some unknown Samsung laptop drm/nouveau/pci: enable c800 magic for Clevo P157SM
2015-11-26Merge branch 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie11-76/+145
into drm-fixes Radeon and amdgpu fixes for 4.4: - DPM fixes for r7xx devices - VCE fixes for Stoney - GPUVM fixes - Scheduler fixes * 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux: drm/radeon: make some dpm errors debug only drm/radeon: make rv770_set_sw_state failures non-fatal drm/amdgpu: move dependency handling out of atomic section v2 drm/amdgpu: optimize scheduler fence handling drm/amdgpu: remove vm->mutex drm/amdgpu: add mutex for ba_va->valids/invalids drm/amdgpu: adapt vce session create interface changes drm/amdgpu: vce use multiple cache surface starting from stoney drm/amdgpu: reset vce trap interrupt flag
2015-11-26Merge branch 'for-linus' of ↵Linus Torvalds2-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes for sendfile lockups caught by Dmitry + a fix for ancient sysvfs symlink breakage" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Avoid softlockups with sendfile(2) vfs: Make sendfile(2) killable even better fix sysvfs symlinks
2015-11-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds3-16/+24
Pull more block layer fixes from Jens Axboe: "I wasn't going to send off a new pull before next week, but the blk flush fix from Jan from the other day introduced a regression. It's rare enough not to have hit during testing, since it requires both a device that rejects the first flush, and bad timing while it does that. But since someone did hit it, let's get the revert into 4.4-rc3 so we don't have a released rc with that known issue. Apart from that revert, three other fixes: - From Christoph, a fix for a missing unmap in NVMe request preparation. - An NVMe fix from Nishanth that fixes data corruption on powerpc. - Also from Christoph, fix a list_del() attempt on blk-mq that didn't have a matching list_add() at timer start" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "blk-flush: Queue through IO scheduler when flush not required" block: fix blk_abort_request for blk-mq drivers nvme: add missing unmaps in nvme_queue_rq NVMe: default to 4k device page size
2015-11-25Revert "blk-flush: Queue through IO scheduler when flush not required"Jens Axboe1-1/+1
This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Jan writes: -- Thanks for report! After some investigation I found out we allocate elevator specific data in __get_request() only for non-flush requests. And this is actually required since the flush machinery uses the space in struct request for something else. Doh. So my patch is just wrong and not easy to fix since at the time __get_request() is called we are not sure whether the flush machinery will be used in the end. Jens, please revert 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks! I'm somewhat surprised that you can reliably hit the race where flushing gets disabled for the device just while the request is in flight. But I guess during boot it makes some sense. -- So let's just revert it, we can fix the queue run manually after the fact. This race is rare enough that it didn't trigger in testing, it requires the specific disable-while-in-flight scenario to trigger.
2015-11-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds21-111/+171
Pull KVM fixes from Paolo Bonzini: "Bug fixes for all architectures. Nothing really stands out" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: nVMX: remove incorrect vpid check in nested invvpid emulation arm64: kvm: report original PAR_EL1 upon panic arm64: kvm: avoid %p in __kvm_hyp_panic KVM: arm/arm64: vgic: Trust the LR state for HW IRQs KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active KVM: arm/arm64: Fix preemptible timer active state crazyness arm64: KVM: Add workaround for Cortex-A57 erratum 834220 arm64: KVM: Fix AArch32 to AArch64 register mapping ARM/arm64: KVM: test properly for a PTE's uncachedness KVM: s390: fix wrong lookup of VCPUs by array index KVM: s390: avoid memory overwrites on emergency signal injection KVM: Provide function for VCPU lookup by id KVM: s390: fix pfmf intercept handler KVM: s390: enable SIMD only when no VCPUs were created KVM: x86: request interrupt window when IRQ chip is split KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection KVM: x86: fix interrupt window handling in split IRQ chip case MIPS: KVM: Uninit VCPU in vcpu_create error path MIPS: KVM: Fix CACHE immediate offset sign extension ...
2015-11-25drm/radeon: make some dpm errors debug onlyAlex Deucher2-3/+3
"Could not force DPM to low", etc. is usually harmless and just confuses users. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-11-25KVM: nVMX: remove incorrect vpid check in nested invvpid emulationHaozhong Zhang1-5/+0
This patch removes the vpid check when emulating nested invvpid instruction of type all-contexts invalidation. The existing code is incorrect because: (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate Translations Based on VPID", invvpid instruction does not check vpid in the invvpid descriptor when its type is all-contexts invalidation. (2) According to the same document, invvpid of type all-contexts invalidation does not require there is an active VMCS, so/and get_vmcs12() in the existing code may result in a NULL-pointer dereference. In practice, it can crash both KVM itself and L1 hypervisors that use invvpid (e.g. Xen). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not ↵Martin Peres1-1/+1
being set Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Martin Peres <martin.peres@free.fr>
2015-11-25drm/nouveau/nvif: allow userspace access to its own client objectBen Skeggs2-2/+7
Regression from "abi16: implement limited interoperability with usif/nvif". Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf100-: fix oops when calling zbc methodsBen Skeggs1-2/+2
Somehow missed these two when removing dodgy void casts during the rework. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK ↵Ben Skeggs3-0/+5
is zero fdo#92761 Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from ↵Ben Skeggs6-897/+897
correct GPC Each GPCCS unit was reading the mask from GPC0, which causes problems on boards where some GPCs are missing PPCs. Part of the fix for fdo#92761. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf100-: split out per-gpc address calculation macroBen Skeggs2-47/+49
There's a few places where we need to access a GPC register from ucode, but outside of the falcon's io address space. To do this we need to calculate the offset based on which GPC we're executing on. This used to be done manually, but we've since found a "base" offset that can be added by the hardware. To use this, an extra bit needs to be set in the register address, which is what this macro achieves. There should be no functional change from this commit. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/bios: return actual size of the buffer retrieved via _ROMBen Skeggs1-0/+1
Fixes detection of a failed attempt at fetching the entire ROM image in one-shot (a violation of the spec, that works a lot of the time). Tested on a HP Zbook 15 G2. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/instmem: protect instobj list with a spinlockBen Skeggs2-0/+6
No locking is required for the traversal of this list, as it only happens during suspend/resume where nothing else can be executing. Fixes some of the issues noticed during parallel piglit runs. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/pci: enable c800 magic for some unknown Samsung laptopBen Skeggs1-1/+7
fdo#70354 - comment #88. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/pci: enable c800 magic for Clevo P157SMKarol Herbst1-1/+7
this is needed for my gpu Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25block: fix blk_abort_request for blk-mq driversChristoph Hellwig1-3/+5
We only added the request to the request list for the !blk-mq case, so we should only delete it in that case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25nvme: add missing unmaps in nvme_queue_rqChristoph Hellwig1-3/+12
When we fail various metadata related operations in nvme_queue_rq we need to unmap the data SGL. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25NVMe: default to 4k device page sizeNishanth Aravamudan1-9/+6
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA aligment matches the kernel's page aligment. On Power, the the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window. With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes. Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24Merge tag 'dm-4.4-fixes' of ↵Linus Torvalds4-29/+36
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Two fixes for 4.4-rc1's DM ioctl changes that introduced the potential for infinite recursion on ioctl (with DM multipath). And four stable fixes: - A DM thin-provisioning fix to restore 'error_if_no_space' setting when a thin-pool is made writable again (after having been out of space). - A DM thin-provisioning fix to properly advertise discard support for thin volumes that are stacked on a thin-pool whose underlying data device doesn't support discards. - A DM ioctl fix to allow ctrl-c to break out of an ioctl retry loop when DM multipath is configured to 'queue_if_no_path'. - A DM crypt fix for a possible hang on dm-crypt device removal" * tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: fix regression in advertised discard limits dm crypt: fix a possible hang due to race condition on exit dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path dm: do not reuse dm_blk_ioctl block_device input as local variable dm: fix ioctl retry termination with signal dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
2015-11-24pidns: fix NULL dereference in __task_pid_nr_ns()Eric Dumazet1-2/+2
I got a crash during a "perf top" session that was caused by a race in __task_pid_nr_ns() : pid_nr_ns() was inlined, but apparently compiler chose to read task->pids[type].pid twice, and the pid->level dereference crashed because we got a NULL pointer at the second read : if (pid && ns->level <= pid->level) { // CRASH Just use RCU API properly to solve this race, and not worry about "perf top" crashing hosts :( get_task_pid() can benefit from same fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-24Merge tag 'kvm-arm-for-v4.4-rc3' of ↵Paolo Bonzini345-5556/+4249
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM Fixes for v4.4-rc3. Includes some timer fixes, properly unmapping PTEs, an errata fix, and two tweaks to the EL2 panic code.
2015-11-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds17-244/+525
Pull block layer fixes from Jens Axboe: "A round of fixes/updates for the current series. This looks a little bigger than it is, but that's mainly because we pushed the lightnvm enabled null_blk change out of the merge window so it could be updated a bit. The rest of the volume is also mostly lightnvm. In particular: - Lightnvm. Various fixes, additions, updates from Matias and Javier, as well as from Wenwei Tao. - NVMe: - Fix for potential arithmetic overflow from Keith. - Also from Keith, ensure that we reap pending completions from a completion queue before deleting it. Fixes kernel crashes when resetting a device with IO pending. - Various little lightnvm related tweaks from Matias. - Fixup flushes to go through the IO scheduler, for the cases where a flush is not required. Fixes a case in CFQ where we would be idling and not see this request, hence not break the idling. From Jan Kara. - Use list_{first,prev,next} in elevator.c for cleaner code. From Gelian Tang. - Fix for a warning trigger on btrfs and raid on single queue blk-mq devices, where we would flush plug callbacks with preemption disabled. From me. - A mac partition validation fix from Kees Cook. - Two merge fixes from Ming, marked stable. A third part is adding a new warning so we'll notice this quicker in the future, if we screw up the accounting. - Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes" * 'for-linus' of git://git.kernel.dk/linux-block: (32 commits) blk-merge: warn if figured out segment number is bigger than nr_phys_segments blk-merge: fix blk_bio_segment_split block: fix segment split blk-mq: fix calling unplug callbacks with preempt disabled mac: validate mac_partition is within sector mtip32xx: use formatting capability of kthread_create_on_node NVMe: reap completion entries when deleting queue lightnvm: add free and bad lun info to show luns lightnvm: keep track of block counts nvme: lightnvm: use admin queues for admin cmds lightnvm: missing free on init error lightnvm: wrong return value and redundant free null_blk: do not del gendisk with lightnvm null_blk: use device addressing mode null_blk: use ppa_cache pool NVMe: Fix possible arithmetic overflow for max segments blk-flush: Queue through IO scheduler when flush not required null_blk: register as a LightNVM device elevator: use list_{first,prev,next}_entry lightnvm: cleanup queue before target removal ...
2015-11-24drm/radeon: make rv770_set_sw_state failures non-fatalAlex Deucher1-1/+1
On some cards it takes a relatively long time for the change to take place. Make a timeout non-fatal. bug: https://bugs.freedesktop.org/show_bug.cgi?id=76130 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-11-24arm64: kvm: report original PAR_EL1 upon panicMark Rutland1-1/+5
If we call __kvm_hyp_panic while a guest context is active, we call __restore_sysregs before acquiring the system register values for the panic, in the process throwing away the PAR_EL1 value at the point of the panic. This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to restoring host register values, enabling us to report the original values at the point of the panic. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: kvm: avoid %p in __kvm_hyp_panicMark Rutland1-1/+1
Currently __kvm_hyp_panic uses %p for values which are not pointers, such as the ESR value. This can confusingly lead to "(null)" being printed for the value. Use %x instead, and only use %p for host pointers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: vgic: Trust the LR state for HW IRQsChristoffer Dall1-14/+2
We were probing the physial distributor state for the active state of a HW virtual IRQ, because we had seen evidence that the LR state was not cleared when the guest deactivated a virtual interrupted. However, this issue turned out to be a software bug in the GIC, which was solved by: 84aab5e68c2a5e1e18d81ae8308c3ce25d501b29 (KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active, 2015-11-24) Therefore, get rid of the complexities and just look at the LR. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.activeChristoffer Dall3-24/+40
We were incorrectly removing the active state from the physical distributor on the timer interrupt when the timer output level was deasserted. We shouldn't be doing this without considering the virtual interrupt's active state, because the architecture requires that when an LR has the HW bit set and the pending or active bits set, then the physical interrupt must also have the corresponding bits set. This addresses an issue where we have been observing an inconsistency between the LR state and the physical distributor state where the LR state was active and the physical distributor was not active, which shouldn't happen. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: Fix preemptible timer active state crazynessChristoffer Dall1-6/+1
We were setting the physical active state on the GIC distributor in a preemptible section, which could cause us to set the active state on different physical CPU from the one we were actually going to run on, hacoc ensues. Since we are no longer descheduling/scheduling soft timers in the flush/sync timer functions, simply moving the timer flush into a non-preemptible section. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Add workaround for Cortex-A57 erratum 834220Marc Zyngier4-1/+38
Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults when a Stage 1 permission fault or device alignment fault should have been reported. This patch implements the workaround (which is to validate that the Stage-1 translation actually succeeds) by using code patching. Cc: stable@vger.kernel.org Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Fix AArch32 to AArch64 register mappingMarc Zyngier2-4/+6
When running a 32bit guest under a 64bit hypervisor, the ARMv8 architecture defines a mapping of the 32bit registers in the 64bit space. This includes banked registers that are being demultiplexed over the 64bit ones. On exceptions caused by an operation involving a 32bit register, the HW exposes the register number in the ESR_EL2 register. It was so far understood that SW had to distinguish between AArch32 and AArch64 accesses (based on the current AArch32 mode and register number). It turns out that I misinterpreted the ARM ARM, and the clue is in D1.20.1: "For some exceptions, the exception syndrome given in the ESR_ELx identifies one or more register numbers from the issued instruction that generated the exception. Where the exception is taken from an Exception level using AArch32 these register numbers give the AArch64 view of the register." Which means that the HW is already giving us the translated version, and that we shouldn't try to interpret it at all (for example, doing an MMIO operation from the IRQ mode using the LR register leads to very unexpected behaviours). The fix is thus not to perform a call to vcpu_reg32() at all from vcpu_reg(), and use whatever register number is supplied directly. The only case we need to find out about the mapping is when we actively generate a register access, which only occurs when injecting a fault in a guest. Cc: stable@vger.kernel.org Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24ARM/arm64: KVM: test properly for a PTE's uncachednessArd Biesheuvel1-8/+7
The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits For HYP mappings, the type is an index into the MAIR table (i.e, the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows: #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24blk-merge: warn if figured out segment number is bigger than nr_phys_segmentsMing Lei1-0/+6
We had seen lots of reports of this kind issue, so add one warnning in blk-merge, then it can be triggered easily and avoid to depend on warning/bug from drivers. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24blk-merge: fix blk_bio_segment_splitMing Lei1-3/+19
Commit bdced438acd83a(block: setup bi_phys_segments after splitting) introduces function of computing bio->bi_phys_segments during bio splitting. Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size arn't computed, so too many physical segments may be obtained for one request since both the two are used to check if one segment across two bios can be possible. This patch fixes the issue by computing the two variables in blk_bio_segment_split(). Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting) Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24block: fix segment splitMing Lei1-2/+2
Inside blk_bio_segment_split(), previous bvec pointer(bvprvp) always points to the iterator local variable, which is obviously wrong, so fix it by pointing to the local variable of 'bvprv'. Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c) Cc: stable@kernel.org #4.3 Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>