From 4130a61cebb1843517d68017a967f27fb602b885 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:23 +0100 Subject: lkdtm/stackleak: avoid spurious failure The lkdtm_STACKLEAK_ERASING() test scans for a contiguous block of poison values between the low stack bound and the stack pointer, and fails if it does not find a sufficiently large block. This can happen legitimately if the scan the low stack bound, which could occur if functions called prior to lkdtm_STACKLEAK_ERASING() used a large amount of stack. If this were to occur, it means that the erased portion of the stack is smaller than the size used by the scan, but does not cause a functional problem In practice this is unlikely to happen, but as this is legitimate and would not result in a functional problem, the test should not fail in this case. Remove the spurious failure case. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-9-mark.rutland@arm.com --- drivers/misc/lkdtm/stackleak.c | 7 ------- 1 file changed, 7 deletions(-) (limited to 'drivers/misc') diff --git a/drivers/misc/lkdtm/stackleak.c b/drivers/misc/lkdtm/stackleak.c index 00db21ff115e..707d530d509b 100644 --- a/drivers/misc/lkdtm/stackleak.c +++ b/drivers/misc/lkdtm/stackleak.c @@ -53,13 +53,6 @@ void lkdtm_STACKLEAK_ERASING(void) found = 0; } - if (found <= check_depth) { - pr_err("FAIL: the erased part is not found (checked %lu bytes)\n", - i * sizeof(unsigned long)); - test_failed = true; - goto end; - } - pr_info("the erased part begins after %lu not poisoned bytes\n", (i - found) * sizeof(unsigned long)); -- cgit v1.2.3 From 72b61896f2b47fa4b98e86184bc0e6ddbd1a8db1 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:24 +0100 Subject: lkdtm/stackleak: rework boundary management There are a few problems with the way the LKDTM STACKLEAK_ERASING test manipulates the stack pointer and boundary values: * It uses the address of a local variable to determine the current stack pointer, rather than using current_stack_pointer directly. As the local variable could be placed anywhere within the stack frame, this can be an over-estimate of the true stack pointer value. * Is uses an estimate of the current stack pointer as the upper boundary when scanning for poison, even though prior functions could have used more stack (and may have updated current->lowest stack accordingly). * A pr_info() call is made in the middle of the test. As the printk() code is out-of-line and will make use of the stack, this could clobber poison and/or adjust current->lowest_stack. It would be better to log the metadata after the body of the test to avoid such problems. These have been observed to result in spurious test failures on arm64. In addition to this there are a couple of things which are sub-optimal: * To avoid the STACK_END_MAGIC value, it conditionally modifies 'left' if this contains more than a single element, when it could instead calculate the bound unconditionally using stackleak_task_low_bound(). * It open-codes the poison scanning. It would be better if this used the same helper code as used by erasing function so that the two cannot diverge. This patch reworks the test to avoid these issues, making use of the recently introduced helpers to ensure this is aligned with the regular stackleak code. As the new code tests stack boundaries before accessing the stack, there is no need to fail early when the tracked or untracked portions of the stack extend all the way to the low stack boundary. As stackleak_find_top_of_poison() is now used to find the top of the poisoned region of the stack, the subsequent poison checking starts at this boundary and verifies that stackleak_find_top_of_poison() is working correctly. The pr_info() which logged the untracked portion of stack is now moved to the end of the function, and logs the size of all the portions of the stack relevant to the test, including the portions at the top and bottom of the stack which are not erased or scanned, and the current / lowest recorded stack usage. Tested on x86_64: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: stackleak stack usage: | high offset: 168 bytes | current: 336 bytes | lowest: 656 bytes | tracked: 656 bytes | untracked: 400 bytes | poisoned: 15152 bytes | low offset: 8 bytes | lkdtm: OK: the rest of the thread stack is properly erased Tested on arm64: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: stackleak stack usage: | high offset: 336 bytes | current: 656 bytes | lowest: 1232 bytes | tracked: 1232 bytes | untracked: 672 bytes | poisoned: 14136 bytes | low offset: 8 bytes | lkdtm: OK: the rest of the thread stack is properly erased Tested on arm64 with deliberate breakage to the starting stack value and poison scanning: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: FAIL: non-poison value 24 bytes below poison boundary: 0x0 | lkdtm: FAIL: non-poison value 32 bytes below poison boundary: 0xffff8000083dbc00 ... | lkdtm: FAIL: non-poison value 1912 bytes below poison boundary: 0x78b4b9999e8cb15 | lkdtm: FAIL: non-poison value 1920 bytes below poison boundary: 0xffff8000083db400 | lkdtm: stackleak stack usage: | high offset: 336 bytes | current: 688 bytes | lowest: 1232 bytes | tracked: 576 bytes | untracked: 288 bytes | poisoned: 15176 bytes | low offset: 8 bytes | lkdtm: FAIL: the thread stack is NOT properly erased! | lkdtm: Unexpected! This kernel (5.18.0-rc1-00013-g1f7b1f1e29e0-dirty aarch64) was built with CONFIG_GCC_PLUGIN_STACKLEAK=y Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-10-mark.rutland@arm.com --- drivers/misc/lkdtm/stackleak.c | 86 +++++++++++++++++++++++------------------- 1 file changed, 47 insertions(+), 39 deletions(-) (limited to 'drivers/misc') diff --git a/drivers/misc/lkdtm/stackleak.c b/drivers/misc/lkdtm/stackleak.c index 707d530d509b..0aafa46ced7d 100644 --- a/drivers/misc/lkdtm/stackleak.c +++ b/drivers/misc/lkdtm/stackleak.c @@ -13,59 +13,67 @@ void lkdtm_STACKLEAK_ERASING(void) { - unsigned long *sp, left, found, i; - const unsigned long check_depth = - STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long); + const unsigned long task_stack_base = (unsigned long)task_stack_page(current); + const unsigned long task_stack_low = stackleak_task_low_bound(current); + const unsigned long task_stack_high = stackleak_task_high_bound(current); + const unsigned long current_sp = current_stack_pointer; + const unsigned long lowest_sp = current->lowest_stack; + unsigned long untracked_high; + unsigned long poison_high, poison_low; bool test_failed = false; /* - * For the details about the alignment of the poison values, see - * the comment in stackleak_track_stack(). + * Depending on what has run prior to this test, the lowest recorded + * stack pointer could be above or below the current stack pointer. + * Start from the lowest of the two. + * + * Poison values are naturally-aligned unsigned longs. As the current + * stack pointer might not be sufficiently aligned, we must align + * downwards to find the lowest known stack pointer value. This is the + * high boundary for a portion of the stack which may have been used + * without being tracked, and has to be scanned for poison. */ - sp = PTR_ALIGN(&i, sizeof(unsigned long)); - - left = ((unsigned long)sp & (THREAD_SIZE - 1)) / sizeof(unsigned long); - sp--; + untracked_high = min(current_sp, lowest_sp); + untracked_high = ALIGN_DOWN(untracked_high, sizeof(unsigned long)); /* - * One 'long int' at the bottom of the thread stack is reserved - * and not poisoned. + * Find the top of the poison in the same way as the erasing code. */ - if (left > 1) { - left--; - } else { - pr_err("FAIL: not enough stack space for the test\n"); - test_failed = true; - goto end; - } - - pr_info("checking unused part of the thread stack (%lu bytes)...\n", - left * sizeof(unsigned long)); + poison_high = stackleak_find_top_of_poison(task_stack_low, untracked_high); /* - * Search for 'check_depth' poison values in a row (just like - * stackleak_erase() does). + * Check whether the poisoned portion of the stack (if any) consists + * entirely of poison. This verifies the entries that + * stackleak_find_top_of_poison() should have checked. */ - for (i = 0, found = 0; i < left && found <= check_depth; i++) { - if (*(sp - i) == STACKLEAK_POISON) - found++; - else - found = 0; - } + poison_low = poison_high; + while (poison_low > task_stack_low) { + poison_low -= sizeof(unsigned long); - pr_info("the erased part begins after %lu not poisoned bytes\n", - (i - found) * sizeof(unsigned long)); + if (*(unsigned long *)poison_low == STACKLEAK_POISON) + continue; - /* The rest of thread stack should be erased */ - for (; i < left; i++) { - if (*(sp - i) != STACKLEAK_POISON) { - pr_err("FAIL: bad value number %lu in the erased part: 0x%lx\n", - i, *(sp - i)); - test_failed = true; - } + pr_err("FAIL: non-poison value %lu bytes below poison boundary: 0x%lx\n", + poison_high - poison_low, *(unsigned long *)poison_low); + test_failed = true; } -end: + pr_info("stackleak stack usage:\n" + " high offset: %lu bytes\n" + " current: %lu bytes\n" + " lowest: %lu bytes\n" + " tracked: %lu bytes\n" + " untracked: %lu bytes\n" + " poisoned: %lu bytes\n" + " low offset: %lu bytes\n", + task_stack_base + THREAD_SIZE - task_stack_high, + task_stack_high - current_sp, + task_stack_high - lowest_sp, + task_stack_high - untracked_high, + untracked_high - poison_high, + poison_high - task_stack_low, + task_stack_low - task_stack_base); + if (test_failed) { pr_err("FAIL: the thread stack is NOT properly erased!\n"); pr_expected_config(CONFIG_GCC_PLUGIN_STACKLEAK); -- cgit v1.2.3 From f03a50938decaa601f02294e4d057fb0048ca9ca Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:25 +0100 Subject: lkdtm/stackleak: prevent unexpected stack usage The lkdtm_STACKLEAK_ERASING() test is instrumentable and runs with IRQs unmasked, so it's possible for unrelated code to clobber the task stack and/or manipulate current->lowest_stack while the test is running, resulting in spurious failures. The regular stackleak erasing code is non-instrumentable and runs with IRQs masked, preventing similar issues. Make the body of the test non-instrumentable, and run it with IRQs masked, avoiding such spurious failures. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Catalin Marinas Cc: Kees Cook Cc: Will Deacon Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-11-mark.rutland@arm.com --- drivers/misc/lkdtm/stackleak.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'drivers/misc') diff --git a/drivers/misc/lkdtm/stackleak.c b/drivers/misc/lkdtm/stackleak.c index 0aafa46ced7d..46c60761a05e 100644 --- a/drivers/misc/lkdtm/stackleak.c +++ b/drivers/misc/lkdtm/stackleak.c @@ -11,7 +11,20 @@ #include "lkdtm.h" #include -void lkdtm_STACKLEAK_ERASING(void) +/* + * Check that stackleak tracks the lowest stack pointer and erases the stack + * below this as expected. + * + * To prevent the lowest stack pointer changing during the test, IRQs are + * masked and instrumentation of this function is disabled. We assume that the + * compiler will create a fixed-size stack frame for this function. + * + * Any non-inlined function may make further use of the stack, altering the + * lowest stack pointer and/or clobbering poison values. To avoid spurious + * failures we must avoid printing until the end of the test or have already + * encountered a failure condition. + */ +static void noinstr check_stackleak_irqoff(void) { const unsigned long task_stack_base = (unsigned long)task_stack_page(current); const unsigned long task_stack_low = stackleak_task_low_bound(current); @@ -81,3 +94,12 @@ void lkdtm_STACKLEAK_ERASING(void) pr_info("OK: the rest of the thread stack is properly erased\n"); } } + +void lkdtm_STACKLEAK_ERASING(void) +{ + unsigned long flags; + + local_irq_save(flags); + check_stackleak_irqoff(); + local_irq_restore(flags); +} -- cgit v1.2.3 From f171d695f3adfd8093272fa5776ef4a2c6b9cf08 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:26 +0100 Subject: lkdtm/stackleak: check stack boundaries The stackleak code relies upon the current SP and lowest recorded SP falling within expected task stack boundaries. Check this at the start of the test. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Catalin Marinas Cc: Kees Cook Cc: Will Deacon Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-12-mark.rutland@arm.com --- drivers/misc/lkdtm/stackleak.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'drivers/misc') diff --git a/drivers/misc/lkdtm/stackleak.c b/drivers/misc/lkdtm/stackleak.c index 46c60761a05e..52800583fd05 100644 --- a/drivers/misc/lkdtm/stackleak.c +++ b/drivers/misc/lkdtm/stackleak.c @@ -35,6 +35,25 @@ static void noinstr check_stackleak_irqoff(void) unsigned long poison_high, poison_low; bool test_failed = false; + /* + * Check that the current and lowest recorded stack pointer values fall + * within the expected task stack boundaries. These tests should never + * fail unless the boundaries are incorrect or we're clobbering the + * STACK_END_MAGIC, and in either casee something is seriously wrong. + */ + if (current_sp < task_stack_low || current_sp >= task_stack_high) { + pr_err("FAIL: current_stack_pointer (0x%lx) outside of task stack bounds [0x%lx..0x%lx]\n", + current_sp, task_stack_low, task_stack_high - 1); + test_failed = true; + goto out; + } + if (lowest_sp < task_stack_low || lowest_sp >= task_stack_high) { + pr_err("FAIL: current->lowest_stack (0x%lx) outside of task stack bounds [0x%lx..0x%lx]\n", + lowest_sp, task_stack_low, task_stack_high - 1); + test_failed = true; + goto out; + } + /* * Depending on what has run prior to this test, the lowest recorded * stack pointer could be above or below the current stack pointer. @@ -87,6 +106,7 @@ static void noinstr check_stackleak_irqoff(void) poison_high - task_stack_low, task_stack_low - task_stack_base); +out: if (test_failed) { pr_err("FAIL: the thread stack is NOT properly erased!\n"); pr_expected_config(CONFIG_GCC_PLUGIN_STACKLEAK); -- cgit v1.2.3 From 8c6a490e404fe82b53bbafe0f0eeef10605617f7 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 6 May 2022 13:11:45 +0100 Subject: lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n Recent rework broke building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n. This patch fixes that breakage. Prior to recent stackleak rework, the LKDTM STACKLEAK_ERASING code could be built when the kernel was not built with stackleak support, and would run a test that would almost certainly fail (or pass by sheer cosmic coincidence), e.g. | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: checking unused part of the thread stack (15560 bytes)... | lkdtm: FAIL: the erased part is not found (checked 15560 bytes) | lkdtm: FAIL: the thread stack is NOT properly erased! | lkdtm: This is probably expected, since this kernel (5.18.0-rc2 aarch64) was built *without* CONFIG_GCC_PLUGIN_STACKLEAK=y The recent rework to the test made it more accurate by using helpers which are only defined when CONFIG_GCC_PLUGIN_STACKLEAK=y, and so when building LKDTM when CONFIG_GCC_PLUGIN_STACKLEAK=n, we get a build failure: | drivers/misc/lkdtm/stackleak.c: In function 'check_stackleak_irqoff': | drivers/misc/lkdtm/stackleak.c:30:46: error: implicit declaration of function 'stackleak_task_low_bound' [-Werror=implicit-function-declaration] | 30 | const unsigned long task_stack_low = stackleak_task_low_bound(current); | | ^~~~~~~~~~~~~~~~~~~~~~~~ | drivers/misc/lkdtm/stackleak.c:31:47: error: implicit declaration of function 'stackleak_task_high_bound'; did you mean 'stackleak_task_init'? [-Werror=implicit-function-declaration] | 31 | const unsigned long task_stack_high = stackleak_task_high_bound(current); | | ^~~~~~~~~~~~~~~~~~~~~~~~~ | | stackleak_task_init | drivers/misc/lkdtm/stackleak.c:33:48: error: 'struct task_struct' has no member named 'lowest_stack' | 33 | const unsigned long lowest_sp = current->lowest_stack; | | ^~ | drivers/misc/lkdtm/stackleak.c:74:23: error: implicit declaration of function 'stackleak_find_top_of_poison' [-Werror=implicit-function-declaration] | 74 | poison_high = stackleak_find_top_of_poison(task_stack_low, untracked_high); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ This patch fixes the issue by not compiling the body of the test when CONFIG_GCC_PLUGIN_STACKLEAK=n, and replacing this with an unconditional XFAIL message. This means the pr_expected_config() in check_stackleak_irqoff() is redundant, and so it is removed. Where an architecture does not support stackleak, the test will log: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: XFAIL: stackleak is not supported on this arch (HAVE_ARCH_STACKLEAK=n) Where an architectures does support stackleak, but this has not been compiled in, the test will log: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: XFAIL: stackleak is not enabled (CONFIG_GCC_PLUGIN_STACKLEAK=n) Where stackleak has been compiled in, the test behaves as usual: | # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT | lkdtm: Performing direct entry STACKLEAK_ERASING | lkdtm: stackleak stack usage: | high offset: 336 bytes | current: 688 bytes | lowest: 1232 bytes | tracked: 1232 bytes | untracked: 672 bytes | poisoned: 14136 bytes | low offset: 8 bytes | lkdtm: OK: the rest of the thread stack is properly erased Fixes: f4cfacd92972cc44 ("lkdtm/stackleak: rework boundary management") Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220506121145.1162908-1-mark.rutland@arm.com --- drivers/misc/lkdtm/stackleak.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'drivers/misc') diff --git a/drivers/misc/lkdtm/stackleak.c b/drivers/misc/lkdtm/stackleak.c index 52800583fd05..82369c6f889e 100644 --- a/drivers/misc/lkdtm/stackleak.c +++ b/drivers/misc/lkdtm/stackleak.c @@ -11,6 +11,7 @@ #include "lkdtm.h" #include +#if defined(CONFIG_GCC_PLUGIN_STACKLEAK) /* * Check that stackleak tracks the lowest stack pointer and erases the stack * below this as expected. @@ -109,7 +110,6 @@ static void noinstr check_stackleak_irqoff(void) out: if (test_failed) { pr_err("FAIL: the thread stack is NOT properly erased!\n"); - pr_expected_config(CONFIG_GCC_PLUGIN_STACKLEAK); } else { pr_info("OK: the rest of the thread stack is properly erased\n"); } @@ -123,3 +123,13 @@ void lkdtm_STACKLEAK_ERASING(void) check_stackleak_irqoff(); local_irq_restore(flags); } +#else /* defined(CONFIG_GCC_PLUGIN_STACKLEAK) */ +void lkdtm_STACKLEAK_ERASING(void) +{ + if (IS_ENABLED(CONFIG_HAVE_ARCH_STACKLEAK)) { + pr_err("XFAIL: stackleak is not enabled (CONFIG_GCC_PLUGIN_STACKLEAK=n)\n"); + } else { + pr_err("XFAIL: stackleak is not supported on this arch (HAVE_ARCH_STACKLEAK=n)\n"); + } +} +#endif /* defined(CONFIG_GCC_PLUGIN_STACKLEAK) */ -- cgit v1.2.3