Age | Commit message (Collapse) | Author | Files | Lines |
|
In order to re-write Jcc.d32 instructions text_poke_bp() needs to be
taught about them.
The biggest hurdle is that the whole machinery is currently made for 5
byte instructions and extending this would grow struct text_poke_loc
which is currently a nice 16 bytes and used in an array.
However, since text_poke_loc contains a full copy of the (s32)
displacement, it is possible to map the Jcc.d32 2 byte opcodes to
Jcc.d8 1 byte opcode for the int3 emulation.
This then leaves the replacement bytes; fudge that by only storing the
last 5 bytes and adding the rule that 'length == 6' instruction will
be prefixed with a 0x0f byte.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
|
|
Add a struct alt_instr.flags field which will contain different flags
controlling alternatives patching behavior.
The initial idea was to be able to specify it as a separate macro
parameter but that would mean touching all possible invocations of the
alternatives macros and thus a lot of churn.
What is more, as PeterZ suggested, being able to say ALT_NOT(feature) is
very readable and explains exactly what is meant.
So make the feature field a u32 where the patching flags are the upper
u16 part of the dword quantity while the lower u16 word is the feature.
The highest feature number currently is 0x26a (i.e., word 19) so there
is plenty of space. If that becomes insufficient, the field can be
extended to u64 which will then make struct alt_instr of the nice size
of 16 bytes (14 bytes currently).
There should be no functional changes resulting from this.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Y6RCoJEtxxZWwotd@zn.tnic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Dave Hansen:
"New Feature:
- Randomize the per-cpu entry areas
Cleanups:
- Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it
- Move to "native" set_memory_rox() helper
- Clean up pmd_get_atomic() and i386-PAE
- Remove some unused page table size macros"
* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/mm: Ensure forced page table splitting
x86/kasan: Populate shadow for shared chunk of the CPU entry area
x86/kasan: Add helpers to align shadow addresses up and down
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
x86/mm: Recompute physical address for every page of per-CPU CEA mapping
x86/mm: Rename __change_page_attr_set_clr(.checkalias)
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
x86/mm: Add a few comments
x86/mm: Fix CR3_ADDR_MASK
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
mm: Convert __HAVE_ARCH_P..P_GET to the new style
mm: Remove pointless barrier() after pmdp_get_lockless()
x86/mm/pae: Get rid of set_64bit()
x86_64: Remove pointless set_64bit() usage
x86/mm/pae: Be consistent with pXXp_get_and_clear()
x86/mm/pae: Use WRITE_ONCE()
x86/mm/pae: Don't (ab)use atomic64
mm/gup: Fix the lockless PMD access
...
|
|
Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.
Specifically, this cures a W+X case during boot.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Add the call depth tracking mitigation for Retbleed which has been
long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets
applied, it patches a call accounting thunk which is used to track
the call depth of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific
value for the Return Stack Buffer, the code stuffs that RSB and
avoids its underflow which could otherwise lead to the Intel variant
of Retbleed.
This software-only solution brings a lot of the lost performance
back, as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT
support where present to annotate and track indirect branches using a
hash to validate them
- Other misc fixes and cleanups
* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
x86/paravirt: Use common macro for creating simple asm paravirt functions
x86/paravirt: Remove clobber bitmask from .parainstructions
x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
x86/Kconfig: Enable kernel IBT by default
x86,pm: Force out-of-line memcpy()
objtool: Fix weak hole vs prefix symbol
objtool: Optimize elf_dirty_reloc_sym()
x86/cfi: Add boot time hash randomization
x86/cfi: Boot time selection of CFI scheme
x86/ibt: Implement FineIBT
objtool: Add --cfi to generate the .cfi_sites section
x86: Add prefix symbols for function padding
objtool: Add option to generate prefix symbols
objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
objtool: Slice up elf_create_section_symbol()
kallsyms: Revert "Take callthunks into account"
x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
x86/retpoline: Fix crash printing warning
x86/paravirt: Fix a !PARAVIRT build warning
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit.
The original implementation tried to minimize the allocation size
with quite some hard to understand and fragile tricks. Make it
robust and straight forward by separating the register enumerations
for 32 and 64 bit completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings"
* tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/32: Remove setup_once()
x86/kaslr: Fix process_mem_region()'s return value
x86: Fix misc small issues
x86/boot: Repair kernel-doc for boot_kstrtoul()
x86: Improve formatting of user_regset arrays
x86: Separate out x86_regset for 32 and 64 bit
x86/i8259: Make default_legacy_pic static
x86/tsc: Make art_related_clocksource static
|
|
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the
argument setup of:
apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end);
even when X86_KERNEL_IBT=n and the function is an empty stub, which leads
to link errors due to missing __ibt_endbr_seal* symbols:
ld: arch/x86/kernel/alternative.o: in function `alternative_instructions':
alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end'
ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal'
Remove the explicit 'noinline' to help gcc optimize them away.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221011113803.956808-1-linmiaohe@huawei.com
|
|
Fix:
./arch/x86/kernel/traps.c: asm/proto.h is included more than once.
./arch/x86/kernel/alternative.c:1610:2-3: Unneeded semicolon.
[ bp: Merge into a single patch. ]
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/1620902768-53822-1-git-send-email-jiapeng.chong@linux.alibaba.com
Link: https://lore.kernel.org/r/20220926054628.116957-1-jiapeng.chong@linux.alibaba.com
|
|
In order to avoid known hashes (from knowing the boot image),
randomize the CFI hashes with a per-boot random seed.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.765195516@infradead.org
|
|
Add the "cfi=" boot parameter to allow people to select a CFI scheme
at boot time. Mostly useful for development / debugging.
Requested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.699804264@infradead.org
|
|
Implement an alternative CFI scheme that merges both the fine-grained
nature of kCFI but also takes full advantage of the coarse grained
hardware CFI as provided by IBT.
To contrast:
kCFI is a pure software CFI scheme and relies on being able to read
text -- specifically the instruction *before* the target symbol, and
does the hash validation *before* doing the call (otherwise control
flow is compromised already).
FineIBT is a software and hardware hybrid scheme; by ensuring every
branch target starts with a hash validation it is possible to place
the hash validation after the branch. This has several advantages:
o the (hash) load is avoided; no memop; no RX requirement.
o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
the hash validation in the immediate instruction after
the branch target there is a minimal speculation window
and the whole is a viable defence against SpectreBHB.
o Kees feels obliged to mention it is slightly more vulnerable
when the attacker can write code.
Obviously this patch relies on kCFI, but additionally it also relies
on the padding from the call-depth-tracking patches. It uses this
padding to place the hash-validation while the call-sites are
re-written to modify the indirect target to be 16 bytes in front of
the original target, thus hitting this new preamble.
Notably, there is no hardware that needs call-depth-tracking (Skylake)
and supports IBT (Tigerlake and onwards).
Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.634714496@infradead.org
|
|
The first argument of WARN() is a condition, so this will use "addr"
as the format string and possibly crash.
Fixes: 3b6c1747da48 ("x86/retpoline: Add SKL retthunk retpolines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/Y1gBoUZrRK5N%2FlCB@kili/
|
|
Ensure that retpolines do the proper call accounting so that the return
accounting works correctly.
Specifically; retpolines are used to replace both 'jmp *%reg' and
'call *%reg', however these two cases do not have the same accounting
requirements. Therefore split things up and provide two different
retpoline arrays for SKL.
The 'jmp *%reg' case needs no accounting, the
__x86_indirect_jump_thunk_array[] covers this. The retpoline is
changed to not use the return thunk; it's a simple call;ret construct.
[ strictly speaking it should do:
andq $(~0x1f), PER_CPU_VAR(__x86_call_depth)
but we can argue this can be covered by the fuzz we already have
in the accounting depth (12) vs the RSB depth (16) ]
The 'call *%reg' case does need accounting, the
__x86_indirect_call_thunk_array[] covers this. Again, this retpoline
avoids the use of the return-thunk, in this case to avoid double
accounting.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.996634749@infradead.org
|
|
In preparation for call depth tracking on Intel SKL CPUs, make it possible
to patch in a SKL specific return thunk.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.680469665@infradead.org
|
|
Mitigating the Intel SKL RSB underflow issue in software requires to
track the call depth. That is every CALL and every RET need to be
intercepted and additional code injected.
The existing retbleed mitigations already include means of redirecting
RET to __x86_return_thunk; this can be re-purposed and RET can be
redirected to another function doing RET accounting.
CALL accounting will use the function padding introduced in prior
patches. For each CALL instruction, the destination symbol's padding
is rewritten to do the accounting and the CALL instruction is adjusted
to call into the padding.
This ensures only affected CPUs pay the overhead of this accounting.
Unaffected CPUs will leave the padding unused and have their 'JMP
__x86_return_thunk' replaced with an actual 'RET' instruction.
Objtool has been modified to supply a .call_sites section that lists
all the 'CALL' instructions. Additionally the paravirt instruction
sites are iterated since they will have been patched from an indirect
call to direct calls (or direct instructions in which case it'll be
ignored).
Module handling and the actual thunk code for SKL will be added in
subsequent steps.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.470877038@infradead.org
|
|
The upcoming call thunk patching must hold text_mutex and needs access to
text_poke_copy(), which takes text_mutex.
Provide a _locked postfixed variant to expose the inner workings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.159977224@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core fixes from Borislav Petkov:
- Make sure an INT3 is slapped after every unconditional retpoline JMP
as both vendors suggest
- Clean up pciserial a bit
* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,retpoline: Be sure to emit INT3 after JMP *%\reg
x86/earlyprintk: Clean up pciserial
|
|
I encountered some occasional crashes of poke_int3_handler() when
kprobes are set, while accessing desc->vec.
The text poke mechanism claims to have an RCU-like behavior, but it
does not appear that there is any quiescent state to ensure that
nobody holds reference to desc. As a result, the following race
appears to be possible, which can lead to memory corruption.
CPU0 CPU1
---- ----
text_poke_bp_batch()
-> smp_store_release(&bp_desc, &desc)
[ notice that desc is on
the stack ]
poke_int3_handler()
[ int3 might be kprobe's
so sync events are do not
help ]
-> try_get_desc(descp=&bp_desc)
desc = __READ_ONCE(bp_desc)
if (!desc) [false, success]
WRITE_ONCE(bp_desc, NULL);
atomic_dec_and_test(&desc.refs)
[ success, desc space on the stack
is being reused and might have
non-zero value. ]
arch_atomic_inc_not_zero(&desc->refs)
[ might succeed since desc points to
stack memory that was freed and might
be reused. ]
Fix this issue with small backportable patch. Instead of trying to
make RCU-like behavior for bp_desc, just eliminate the unnecessary
level of indirection of bp_desc, and hold the whole descriptor as a
global. Anyhow, there is only a single descriptor at any given
moment.
Fixes: 1f676247f36a4 ("x86/alternatives: Implement a better poke_int3_handler() completion scheme")
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/20220920224743.3089-1-namit@vmware.com
|
|
Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
to emit one when rewriting the retpoline JMP irrespective of compiler
SLS options or even CONFIG_SLS.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Link: https://lkml.kernel.org/r/Yxm+QkFPOhrVSH6q@hirez.programming.kicks-ass.net
|
|
Debugging missing return thunks is easier if we can see where they're
happening.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
|
|
Do fine-grained Kconfig for all the various retbleed parts.
NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.
In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.
This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.
This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives shouldn't be plenty, but aren't a big concern.
OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:
push %rbx
rex.XB push %r12
Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.
As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:
ud1 %esp, %ecx
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Introduce X86_FEATURE_RETHUNK for those afflicted with needing this.
[ bp: Do only INT3 padding - simpler. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding to
very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space to
remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections that
have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting. This
makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core from
Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection"
* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
ptp: ocp: Add firmware header checks
ptp: ocp: fix PPS source selector debugfs reporting
ptp: ocp: add .init function for sma_op vector
ptp: ocp: vectorize the sma accessor functions
ptp: ocp: constify selectors
ptp: ocp: parameterize input/output sma selectors
ptp: ocp: revise firmware display
ptp: ocp: add Celestica timecard PCI ids
ptp: ocp: Remove #ifdefs around PCI IDs
ptp: ocp: 32-bit fixups for pci start address
Revert "net/smc: fix listen processing for SMC-Rv2"
ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
selftests/bpf: Dynptr tests
bpf: Add dynptr data slices
bpf: Add bpf_dynptr_read and bpf_dynptr_write
bpf: Dynptr support for ring buffers
bpf: Add bpf_dynptr_from_mem for local dynptrs
bpf: Add verifier support for dynptrs
bpf: Suppress 'passing zero to PTR_ERR' warning
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
...
|
|
Introduce a memset like API for text_poke. This will be used to fill the
unused RX memory with illegal instructions.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220520235758.1858153-3-song@kernel.org
|
|
Now that stack validation is an optional feature of objtool, add
CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with
it.
CONFIG_STACK_VALIDATION can now be considered to be frame-pointer
specific. CONFIG_UNWINDER_ORC is already inherently valid for live
patching, so no need to "validate" it.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
"Add support for Intel CET-IBT, available since Tigerlake (11th gen),
which is a coarse grained, hardware based, forward edge
Control-Flow-Integrity mechanism where any indirect CALL/JMP must
target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
is limited to 2 instructions (and typically fewer) on branch targets
not starting with ENDBR. CET-IBT also limits speculation of the next
sequential instruction after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides,
as described above, speculation limits itself"
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
kvm/emulate: Fix SETcc emulation for ENDBR
x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
kbuild: Fixup the IBT kbuild changes
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
x86: Remove toolchain check for X32 ABI capability
x86/alternative: Use .ibt_endbr_seal to seal indirect calls
objtool: Find unused ENDBR instructions
objtool: Validate IBT assumptions
objtool: Add IBT/ENDBR decoding
objtool: Read the NOENDBR annotation
x86: Annotate idtentry_df()
x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
x86: Annotate call_on_stack()
objtool: Rework ASM_REACHABLE
x86: Mark __invalid_creds() __noreturn
exit: Mark do_group_exit() __noreturn
x86: Mark stop_this_cpu() __noreturn
objtool: Ignore extra-symbol code
objtool: Rename --duplicate to --lto
...
|
|
Objtool's --ibt option generates .ibt_endbr_seal which lists
superfluous ENDBR instructions. That is those instructions for which
the function is never indirectly called.
Overwrite these ENDBR instructions with a NOP4 such that these
function can never be indirect called, reducing the number of viable
ENDBR targets in the kernel.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.822545231@infradead.org
|
|
Annotate away some of the generic code references. This is things
where we take the address of a symbol for exception handling or return
addresses (eg. context switch).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org
|
|
Similar to ibt_selftest_ip, apply the same pattern.
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.700456643@infradead.org
|
|
net/dsa/dsa2.c
commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/
drivers/net/ethernet/intel/ice/ice.h
commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU")
commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/
drivers/staging/gdm724x/gdm_lte.c
commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
commit 4bcc4249b4cf ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD only, in fact Hygon also uses it. Furthermore it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.
Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.
[ bp: Fix typos, massage. ]
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This will be used by BPF jit compiler to dump JITed binary to a RX huge
page, and thus allow multiple BPF programs sharing the a huge (2MB) page.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-6-song@kernel.org
|
|
Currently, text_poke_bp() is very strict to only allow patching a
single instruction; however with straight-line-speculation it will be
required to patch: ret; int3, which is two instructions.
As such, relax the constraints a little to allow int3 padding for all
instructions that do not imply the execution of the next instruction,
ie: RET, JMP.d8 and JMP.d32.
While there, rename the text_poke_loc::rel32 field to ::disp.
Note: this fills up the text_poke_loc structure which is now a round
16 bytes big.
[ bp: Put comments ontop instead of on the side. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134908.082342723@infradead.org
|
|
Replace all ret/retq instructions with ASM_RET in preparation of
making it more than a single instruction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.org
|
|
Make sure we can see the text changes when booting with
'debug-alternative'.
Example output:
[ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
[ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
[ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
[ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.422273830@infradead.org
|
|
Try and replace retpoline thunk calls with:
LFENCE
CALL *%\reg
for spectre_v2=retpoline,amd.
Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.
Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.
Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:
Jncc.d8 1f
LFENCE
JMP *%\reg
1:
is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.359986601@infradead.org
|
|
Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:
Jcc __x86_indirect_thunk_\reg
For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:
Jncc.d8 1f
JMP *%\reg
NOP
1:
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.296470217@infradead.org
|
|
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.
This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.
One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.
Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
|
|
Pick up dependent changes in order to base further cleanups ontop.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Up until now the assumption was that an alternative patching site would
have some instructions at the beginning and trailing single-byte NOPs
(0x90) padding. Therefore, the patching machinery would go and optimize
those single-byte NOPs into longer ones.
However, this assumption is broken on 32-bit when code like
hv_do_hypercall() in hyperv_init() would use the ratpoline speculation
killer CALL_NOSPEC. The 32-bit version of that macro would align certain
insns to 16 bytes, leading to the compiler issuing a one or more
single-byte NOPs, depending on the holes it needs to fill for alignment.
That would lead to the warning in optimize_nops() to fire:
------------[ cut here ]------------
Not a NOP at 0xc27fb598
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13
due to that function verifying whether all of the following bytes really
are single-byte NOPs.
Therefore, carve out the NOP padding into a separate function and call
it for each NOP range beginning with a single-byte NOP.
Fixes: 23c1ad538f4f ("x86/alternatives: Optimize optimize_nops()")
Reported-by: Richard Narron <richard@aaazen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301
Link: https://lkml.kernel.org/r/20210601212125.17145-1-bp@alien8.de
|
|
For easier inspection which bytes have changed.
For example:
feat: 7*32+12, old: (__x86_indirect_thunk_r10+0x0/0x20 (ffffffff81c02480) len: 17), repl: (ffffffff897813aa, len: 17)
ffffffff81c02480: old_insn: 41 ff e2 90 90 90 90 90 90 90 90 90 90 90 90 90 90
ffffffff897813aa: rpl_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3
ffffffff81c02480: final_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210601193713.16190-1-bp@alien8.de
|
|
Sparse says:
arch/x86/kernel/alternative.c:78:21: warning: symbol 'x86nops' was not declared. Should it be static?
Since x86nops[] is not used outside this file, Sparse is right and it can be made static.
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506190726.15575-1-paskripkin@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tlb updates from Ingo Molnar:
"The x86 MM changes in this cycle were:
- Implement concurrent TLB flushes, which overlaps the local TLB
flush with the remote TLB flush.
In testing this improved sysbench performance measurably by a
couple of percentage points, especially if TLB-heavy security
mitigations are active.
- Further micro-optimizations to improve the performance of TLB
flushes"
* tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Micro-optimize smp_call_function_many_cond()
smp: Inline on_each_cpu_cond() and on_each_cpu()
x86/mm/tlb: Remove unnecessary uses of the inline keyword
cpumask: Mark functions as pure
x86/mm/tlb: Do not make is_lazy dirty for no reason
x86/mm/tlb: Privatize cpu_tlbstate
x86/mm/tlb: Flush remote and local TLBs concurrently
x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
smp: Run functions concurrently in smp_call_function_many_cond()
|
|
Currently, optimize_nops() scans to see if the alternative starts with
NOPs. However, the emit pattern is:
141: \oldinstr
142: .skip (len-(142b-141b)), 0x90
That is, when 'oldinstr' is short, the tail is padded with NOPs. This case
never gets optimized.
Rewrite optimize_nops() to replace any trailing string of NOPs inside
the alternative to larger NOPs. Also run it irrespective of patching,
replacing NOPs in both the original and replaced code.
A direct consequence is that 'padlen' becomes superfluous, so remove it.
[ bp:
- Adjust commit message
- remove a stale comment about needing to pad
- add a comment in optimize_nops()
- exit early if the NOP verif. loop catches a mismatch - function
should not not add NOPs in that case
- fix the "optimized NOPs" offsets output ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210326151259.442992235@infradead.org
|
|
a semantic conflict
Conflict-merge this main commit in essence:
a89dfde3dc3c: ("x86: Remove dynamic NOP selection")
With this upstream commit:
b90829704780: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG")
Semantic merge conflict:
arch/x86/net/bpf_jit_comp.c
- memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE);
+ memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pick up dependent changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Add a helper to decode kernel instructions; there's no point in
endlessly repeating those last two arguments.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org
|
|
This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).
Differentiating on NOPs is not a feature.
This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.
Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.
[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]
[ Also, much doubt you can actually measure any of this on normal
workloads. ]
After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.
[ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf
Acked-by: Linus Torvalds <torvalds@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org
|
|
No functional changes, just simplification.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-10-bp@alien8.de
|