summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_memcpy.c
AgeCommit message (Collapse)AuthorFilesLines
2020-04-08drm/i915: remove always-defined CONFIG_AS_MOVNTDQAMasahiro Yamada1-5/+0
CONFIG_AS_MOVNTDQA was introduced by commit 0b1de5d58e19 ("drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory"). We raise the minimal supported binutils version from time to time. The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum required binutils version to 2.21"). I confirmed the code in $(call as-instr,...) can be assembled by the binutils 2.21 assembler and also by LLVM integrated assembler. Remove CONFIG_AS_MOVNTDQA, which is always defined. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com>
2019-12-12drm/i915: Align start for memcpy_from_wcChris Wilson1-4/+71
The movntqda requires 16-byte alignment for the source pointer. Avoid falling back to clflush if the source pointer is misaligned by doing the doing a small uncached memcpy to fixup the alignments. v2: Turn the unaligned copy into a genuine helper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211110437.4082687-5-chris@chris-wilson.co.uk
2019-08-09drm/i915: extract i915_memcpy.h from i915_drv.hJani Nikula1-1/+1
It used to be handy that we only had a couple of headers, but over time i915_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f2b887002150acdf218385ea846f7aa617aa5f15.1565271681.git.jani.nikula@intel.com
2017-12-22drm/i915: Do not enable movntdqa optimization in hypervisor guestChangbin Du1-1/+6
Our QA reported a problem caused by movntdqa instructions. Currently, the KVM hypervisor doesn't support VEX-prefix instructions emulation. If users passthrough a GPU to guest with vfio option 'x-no-mmap=on', then all access to the BARs will be trapped and emulated. The KVM hypervisor would raise an inertal error to qemu which cause the guest killed. (Since 'movntdqa' ins is not supported.) This patch try not to enable movntdqa optimization if the driver is running in hypervisor guest. Signed-off-by: Changbin Du <changbin.du@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1513924309-3113-1-git-send-email-changbin.du@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-08-17drm/i915: Mark the static key for movntqda as staticChris Wilson1-1/+1
Silence sparse who warns that the global variable is not declared static. Fixes: 0b1de5d58e19 ("drm/i915: Use SSE4.1 movntdqa to ...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471432146-5196-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-12drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memoryChris Wilson1-0/+101
This patch provides the infrastructure for performing a 16-byte aligned read from WC memory using non-temporal instructions introduced with sse4.1. Using movntdqa we can bypass the CPU caches and read directly from memory and ignoring the page attributes set on the CPU PTE i.e. negating the impact of an otherwise UC access. Copying using movntdqa from WC is almost as fast as reading from WB memory, modulo the possibility of both hitting the CPU cache or leaving the data in the CPU cache for the next consumer. (The CPU cache itself my be flushed for the region of the movntdqa and on later access the movntdqa reads from a separate internal buffer for the cacheline.) The write back to the memory is however cached. This will be used in later patches to accelerate accessing WC memory. v2: Report whether the accelerated copy is successful/possible. v3: Function alignment override was only necessary when using the function target("sse4.1") - which is not necessary for emitting movntdqa from __asm__. v4: Improve notes on CPU cache behaviour vs non-temporal stores. v5: Fix byte offsets for unrolled moves. v6: Find all remaining typos of "movntqda", use kernel_fpu_begin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk