summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorHou Tao <houtao1@huawei.com>2023-09-01 14:19:52 +0300
committerAlexei Starovoitov <ast@kernel.org>2023-09-08 18:42:18 +0300
commit566f6de3cea3482d75d836a2398792a8be32ec26 (patch)
tree7ef90b239bd292763a4dd2985aa03e3f81d00122
parent1e4a6d975e5cd114509aa447750d68d295a501a7 (diff)
downloadlinux-566f6de3cea3482d75d836a2398792a8be32ec26.tar.xz
bpf: Enable IRQ after irq_work_raise() completes in unit_alloc()
When doing stress test for qp-trie, bpf_mem_alloc() returned NULL unexpectedly because all qp-trie operations were initiated from bpf syscalls and there was still available free memory. bpf_obj_new() has the same problem as shown by the following selftest. The failure is due to the preemption. irq_work_raise() will invoke irq_work_claim() first to mark the irq work as pending and then inovke __irq_work_queue_local() to raise an IPI. So when the current task which is invoking irq_work_raise() is preempted by other task, unit_alloc() may return NULL for preemption task as shown below: task A task B unit_alloc() // low_watermark = 32 // free_cnt = 31 after alloc irq_work_raise() // mark irq work as IRQ_WORK_PENDING irq_work_claim() // task B preempts task A unit_alloc() // free_cnt = 30 after alloc // irq work is already PENDING, // so just return irq_work_raise() // does unit_alloc() 30-times ...... unit_alloc() // free_cnt = 0 before alloc return NULL Fix it by enabling IRQ after irq_work_raise() completes. An alternative fix is using preempt_{disable|enable}_notrace() pair, but it may have extra overhead. Another feasible fix is to only disable preemption or IRQ before invoking irq_work_queue() and enable preemption or IRQ after the invocation completes, but it can't handle the case when c->low_watermark is 1. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230901111954.1804721-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-rw-r--r--kernel/bpf/memalloc.c7
1 files changed, 6 insertions, 1 deletions
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index cb60445de98a..c5d822d7cfaa 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -732,12 +732,17 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
}
}
local_dec(&c->active);
- local_irq_restore(flags);
WARN_ON(cnt < 0);
if (cnt < c->low_watermark)
irq_work_raise(c);
+ /* Enable IRQ after the enqueue of irq work completes, so irq work
+ * will run after IRQ is enabled and free_llist may be refilled by
+ * irq work before other task preempts current task.
+ */
+ local_irq_restore(flags);
+
return llnode;
}