summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2014-09-26 23:05:40 +0400
committerDavid S. Miller <davem@davemloft.net>2014-09-26 23:05:40 +0400
commitb4fc1a460f3017e958e6a8ea560ea0afd91bf6fe (patch)
treead7927653cfca896fd60e2e7b2fb12750e46fd2e /include
parent4a8e320c929991c9480a7b936512c57ea02d87b2 (diff)
parent3c731eba48e1b0650decfc91a839b80f0e05ce8f (diff)
downloadlinux-b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe.tar.xz
Merge branch 'bpf-next'
Alexei Starovoitov says: ==================== eBPF syscall, verifier, testsuite v14 -> v15: - got rid of macros with hidden control flow (suggested by David) replaced macro with explicit goto or return and simplified where possible (affected patches #9 and #10) - rebased, retested v13 -> v14: - small change to 1st patch to ease 'new userspace with old kernel' problem (done similar to perf_copy_attr()) (suggested by Daniel) - the rest unchanged v12 -> v13: - replaced 'foo __user *' pointers with __aligned_u64 (suggested by David) - added __attribute__((aligned(8)) to 'union bpf_attr' to keep constant alignment between patches - updated manpage and syscall wrappers due to __aligned_u64 - rebased, retested on x64 with 32-bit and 64-bit userspace and on i386, build tested on arm32,sparc64 v11 -> v12: - dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel) - replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy) - implemented and tested compat support (not part of this set) (.. Daniel) - changed 'void *log_buf' to 'char *' (.. Daniel) - combined struct bpf_work_struct and bpf_prog_info (.. Daniel) - added better return value explanation to manpage (.. Andy) - added log_buf/log_size explanation to manpage (.. Andy & Daniel) - added a lot more info about prog_type and map_type to manpage (.. Andy) - rebased, tweaked test_stubs Patches 1-4 establish BPF syscall shell for maps and programs. Patches 5-10 add verifier step by step Patch 11 adds test stubs for 'unspec' program type and verifier testsuite from user space Note that patches 1,3,4,7 add commands and attributes to the syscall while being backwards compatible from each other, which should demonstrate how other commands can be added in the future. After this set the programs can be loaded for testing only. They cannot be attached to any events. Though manpage talks about tracing and sockets, it will be a subject of future patches. Please take a look at manpage: BPF(2) Linux Programmer's Manual BPF(2) NAME bpf - perform a command on eBPF map or program SYNOPSIS #include <linux/bpf.h> int bpf(int cmd, union bpf_attr *attr, unsigned int size); DESCRIPTION bpf() syscall is a multiplexor for a range of different operations on eBPF which can be characterized as "universal in-kernel virtual machine". eBPF is similar to original Berkeley Packet Filter (or "classic BPF") used to filter network packets. Both statically analyze the programs before loading them into the kernel to ensure that programs cannot harm the running system. eBPF extends classic BPF in multiple ways including ability to call in- kernel helper functions and access shared data structures like eBPF maps. The programs can be written in a restricted C that is compiled into eBPF bytecode and executed on the eBPF virtual machine or JITed into native instruction set. eBPF Design/Architecture eBPF maps is a generic storage of different types. User process can create multiple maps (with key/value being opaque bytes of data) and access them via file descriptor. In parallel eBPF programs can access maps from inside the kernel. It's up to user process and eBPF program to decide what they store inside maps. eBPF programs are similar to kernel modules. They are loaded by the user process and automatically unloaded when process exits. Each eBPF program is a safe run-to-completion set of instructions. eBPF verifier statically determines that the program terminates and is safe to execute. During verification the program takes a hold of maps that it intends to use, so selected maps cannot be removed until the program is unloaded. The program can be attached to different events. These events can be packets, tracepoint events and other types in the future. A new event triggers execution of the program which may store information about the event in the maps. Beyond storing data the programs may call into in-kernel helper functions which may, for example, dump stack, do trace_printk or other forms of live kernel debugging. The same program can be attached to multiple events. Different programs can access the same map: tracepoint tracepoint tracepoint sk_buff sk_buff event A event B event C on eth0 on eth1 | | | | | | | | | | --> tracing <-- tracing socket socket prog_1 prog_2 prog_3 prog_4 | | | | |--- -----| |-------| map_3 map_1 map_2 Syscall Arguments bpf() syscall operation is determined by cmd which can be one of the following: BPF_MAP_CREATE Create a map with given type and attributes and return map FD BPF_MAP_LOOKUP_ELEM Lookup element by key in a given map and return its value BPF_MAP_UPDATE_ELEM Create or update element (key/value pair) in a given map BPF_MAP_DELETE_ELEM Lookup and delete element by key in a given map BPF_MAP_GET_NEXT_KEY Lookup element by key in a given map and return key of next element BPF_PROG_LOAD Verify and load eBPF program attr is a pointer to a union of type bpf_attr as defined below. size is the size of the union. union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; __u32 key_size; /* size of key in bytes */ __u32 value_size; /* size of value in bytes */ __u32 max_entries; /* max number of entries in a map */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ __u32 map_fd; __aligned_u64 key; union { __aligned_u64 value; __aligned_u64 next_key; }; }; struct { /* anonymous struct used by BPF_PROG_LOAD command */ __u32 prog_type; __u32 insn_cnt; __aligned_u64 insns; /* 'const struct bpf_insn *' */ __aligned_u64 license; /* 'const char *' */ __u32 log_level; /* verbosity level of eBPF verifier */ __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf; /* user supplied 'char *' buffer */ }; } __attribute__((aligned(8))); eBPF maps maps is a generic storage of different types for sharing data between kernel and userspace. Any map type has the following attributes: . type . max number of elements . key size in bytes . value size in bytes The following wrapper functions demonstrate how this syscall can be used to access the maps. The functions use the cmd argument to invoke different operations. BPF_MAP_CREATE int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = map_type, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; return bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } bpf() syscall creates a map of map_type type and given attributes key_size, value_size, max_entries. On success it returns process-local file descriptor. On error, -1 is returned and errno is set to EINVAL or EPERM or ENOMEM. The attributes key_size and value_size will be used by verifier during program loading to check that program is calling bpf_map_*_elem() helper functions with correctly initialized key and that program doesn't access map element value beyond specified value_size. For example, when map is created with key_size = 8 and program does: bpf_map_lookup_elem(map_fd, fp - 4) such program will be rejected, since in-kernel helper function bpf_map_lookup_elem(map_fd, void *key) expects to read 8 bytes from 'key' pointer, but 'fp - 4' starting address will cause out of bounds stack access. Similarly, when map is created with value_size = 1 and program does: value = bpf_map_lookup_elem(...); *(u32 *)value = 1; such program will be rejected, since it accesses value pointer beyond specified 1 byte value_size limit. Currently only hash table map_type is supported: enum bpf_map_type { BPF_MAP_TYPE_UNSPEC, BPF_MAP_TYPE_HASH, }; map_type selects one of the available map implementations in kernel. For all map_types eBPF programs access maps with the same bpf_map_lookup_elem()/bpf_map_update_elem() helper functions. BPF_MAP_LOOKUP_ELEM int bpf_lookup_elem(int fd, void *key, void *value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)); } bpf() syscall looks up an element with given key in a map fd. If element is found it returns zero and stores element's value into value. If element is not found it returns -1 and sets errno to ENOENT. BPF_MAP_UPDATE_ELEM int bpf_update_elem(int fd, void *key, void *value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } The call creates or updates element with given key/value in a map fd. On success it returns zero. On error, -1 is returned and errno is set to EINVAL or EPERM or ENOMEM or E2BIG. E2BIG indicates that number of elements in the map reached max_entries limit specified at map creation time. BPF_MAP_DELETE_ELEM int bpf_delete_elem(int fd, void *key) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), }; return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr)); } The call deletes an element in a map fd with given key. Returns zero on success. If element is not found it returns -1 and sets errno to ENOENT. BPF_MAP_GET_NEXT_KEY int bpf_get_next_key(int fd, void *key, void *next_key) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .next_key = ptr_to_u64(next_key), }; return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr)); } The call looks up an element by key in a given map fd and returns key of the next element into next_key pointer. If key is not found, it return zero and returns key of the first element into next_key. If key is the last element, it returns -1 and sets errno to ENOENT. Other possible errno values are ENOMEM, EFAULT, EPERM, EINVAL. This method can be used to iterate over all elements of the map. close(map_fd) will delete the map map_fd. Exiting process will delete all maps automatically. eBPF programs BPF_PROG_LOAD This cmd is used to load eBPF program into the kernel. char bpf_log_buf[LOG_BUF_SIZE]; int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, int insn_cnt, const char *license) { union bpf_attr attr = { .prog_type = prog_type, .insns = ptr_to_u64(insns), .insn_cnt = insn_cnt, .license = ptr_to_u64(license), .log_buf = ptr_to_u64(bpf_log_buf), .log_size = LOG_BUF_SIZE, .log_level = 1, }; return bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); } prog_type is one of the available program types: enum bpf_prog_type { BPF_PROG_TYPE_UNSPEC, BPF_PROG_TYPE_SOCKET, BPF_PROG_TYPE_TRACING, }; By picking prog_type program author selects a set of helper functions callable from eBPF program and corresponding format of struct bpf_context (which is the data blob passed into the program as the first argument). For example, the programs loaded with prog_type = TYPE_TRACING may call bpf_printk() helper, whereas TYPE_SOCKET programs may not. The set of functions available to the programs under given type may increase in the future. Currently the set of functions for TYPE_TRACING is: bpf_map_lookup_elem(map_fd, void *key) // lookup key in a map_fd bpf_map_update_elem(map_fd, void *key, void *value) // update key/value bpf_map_delete_elem(map_fd, void *key) // delete key in a map_fd bpf_ktime_get_ns(void) // returns current ktime bpf_printk(char *fmt, int fmt_size, ...) // prints into trace buffer bpf_memcmp(void *ptr1, void *ptr2, int size) // non-faulting memcmp bpf_fetch_ptr(void *ptr) // non-faulting load pointer from any address bpf_fetch_u8(void *ptr) // non-faulting 1 byte load bpf_fetch_u16(void *ptr) // other non-faulting loads bpf_fetch_u32(void *ptr) bpf_fetch_u64(void *ptr) and bpf_context is defined as: struct bpf_context { /* argN fields match one to one to arguments passed to trace events */ u64 arg1, arg2, arg3, arg4, arg5, arg6; /* return value from kretprobe event or from syscall_exit event */ u64 ret; }; The set of helper functions for TYPE_SOCKET is TBD. More program types may be added in the future. Like BPF_PROG_TYPE_USER_TRACING for unprivileged programs. BPF_PROG_TYPE_UNSPEC is used for testing only. Such programs cannot be attached to events. insns array of "struct bpf_insn" instructions insn_cnt number of instructions in the program license license string, which must be GPL compatible to call helper functions marked gpl_only log_buf user supplied buffer that in-kernel verifier is using to store verification log. Log is a multi-line string that should be used by program author to understand how verifier came to conclusion that program is unsafe. The format of the output can change at any time as verifier evolves. log_size size of user buffer. If size of the buffer is not large enough to store all verifier messages, -1 is returned and errno is set to ENOSPC. log_level verbosity level of eBPF verifier, where zero means no logs provided close(prog_fd) will unload eBPF program The maps are accesible from programs and generally tie the two together. Programs process various events (like tracepoint, kprobe, packets) and store the data into maps. User space fetches data from maps. Either the same or a different map may be used by user space as configuration space to alter program behavior on the fly. Events Once an eBPF program is loaded, it can be attached to an event. Various kernel subsystems have different ways to do so. For example: setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)); will attach the program prog_fd to socket sock which was received by prior call to socket(). ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); will attach the program prog_fd to perf event event_fd which was received by prior call to perf_event_open(). Another way to attach the program to a tracing event is: event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter"); write(event_fd, "bpf-123"); /* where 123 is eBPF program FD */ /* here program is attached and will be triggered by events */ close(event_fd); /* to detach from event */ EXAMPLES /* eBPF+sockets example: * 1. create map with maximum of 2 elements * 2. set map[6] = 0 and map[17] = 0 * 3. load eBPF program that counts number of TCP and UDP packets received * via map[skb->ip->proto]++ * 4. attach prog_fd to raw socket via setsockopt() * 5. print number of received TCP/UDP packets every second */ int main(int ac, char **av) { int sock, map_fd, prog_fd, key; long long value = 0, tcp_cnt, udp_cnt; map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2); if (map_fd < 0) { printf("failed to create map '%s'\n", strerror(errno)); /* likely not run as root */ return 1; } key = 6; /* ip->proto == tcp */ assert(bpf_update_elem(map_fd, &key, &value) == 0); key = 17; /* ip->proto == udp */ assert(bpf_update_elem(map_fd, &key, &value) == 0); struct bpf_insn prog[] = { BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */ BPF_LD_ABS(BPF_B, 14 + 9), /* r0 = ip->proto */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/* *(u32 *)(fp - 4) = r0 */ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */ BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */ BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), /* r0 = map_lookup(r1, r2) */ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* if (r0 == 0) goto pc+2 */ BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */ BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */ BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */ BPF_EXIT_INSN(), /* return r0 */ }; prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog), "GPL"); assert(prog_fd >= 0); sock = open_raw_sock("lo"); assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)) == 0); for (;;) { key = 6; assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0); key = 17; assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0); printf("TCP %lld UDP %lld packets0, tcp_cnt, udp_cnt); sleep(1); } return 0; } RETURN VALUE For a successful call, the return value depends on the operation: BPF_MAP_CREATE The new file descriptor associated with eBPF map. BPF_PROG_LOAD The new file descriptor associated with eBPF program. All other commands Zero. On error, -1 is returned, and errno is set appropriately. ERRORS EPERM bpf() syscall was made without sufficient privilege (without the CAP_SYS_ADMIN capability). ENOMEM Cannot allocate sufficient memory. EBADF fd is not an open file descriptor EFAULT One of the pointers ( key or value or log_buf or insns ) is outside accessible address space. EINVAL The value specified in cmd is not recognized by this kernel. EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid. EINVAL For BPF_MAP_*_ELEM commands, some of the fields of "union bpf_attr" unused by this command are not set to zero. EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized instruction or uses reserved fields or jumps out of range or loop detected or calls unknown function). EACCES For BPF_PROG_LOAD, though program has valid instructions, it was rejected, since it was deemed unsafe (may access disallowed memory region or uninitialized stack/register or function constraints don't match actual types or misaligned access). In such case it is recommended to call bpf() again with log_level = 1 and examine log_buf for specific reason provided by verifier. ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that element with given key was not found. E2BIG program is too large or a map reached max_entries limit (max number of elements). NOTES These commands may be used only by a privileged process (one having the CAP_SYS_ADMIN capability). SEE ALSO eBPF architecture and instruction set is explained in Documentation/networking/filter.txt Linux 2014-09-16 BPF(2) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include')
-rw-r--r--include/linux/bpf.h136
-rw-r--r--include/linux/filter.h14
-rw-r--r--include/linux/syscalls.h3
-rw-r--r--include/uapi/asm-generic/unistd.h4
-rw-r--r--include/uapi/linux/bpf.h90
5 files changed, 239 insertions, 8 deletions
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
new file mode 100644
index 000000000000..3cf91754a957
--- /dev/null
+++ b/include/linux/bpf.h
@@ -0,0 +1,136 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_BPF_H
+#define _LINUX_BPF_H 1
+
+#include <uapi/linux/bpf.h>
+#include <linux/workqueue.h>
+#include <linux/file.h>
+
+struct bpf_map;
+
+/* map is generic key/value storage optionally accesible by eBPF programs */
+struct bpf_map_ops {
+ /* funcs callable from userspace (via syscall) */
+ struct bpf_map *(*map_alloc)(union bpf_attr *attr);
+ void (*map_free)(struct bpf_map *);
+ int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
+
+ /* funcs callable from userspace and from eBPF programs */
+ void *(*map_lookup_elem)(struct bpf_map *map, void *key);
+ int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+ int (*map_delete_elem)(struct bpf_map *map, void *key);
+};
+
+struct bpf_map {
+ atomic_t refcnt;
+ enum bpf_map_type map_type;
+ u32 key_size;
+ u32 value_size;
+ u32 max_entries;
+ struct bpf_map_ops *ops;
+ struct work_struct work;
+};
+
+struct bpf_map_type_list {
+ struct list_head list_node;
+ struct bpf_map_ops *ops;
+ enum bpf_map_type type;
+};
+
+void bpf_register_map_type(struct bpf_map_type_list *tl);
+void bpf_map_put(struct bpf_map *map);
+struct bpf_map *bpf_map_get(struct fd f);
+
+/* function argument constraints */
+enum bpf_arg_type {
+ ARG_ANYTHING = 0, /* any argument is ok */
+
+ /* the following constraints used to prototype
+ * bpf_map_lookup/update/delete_elem() functions
+ */
+ ARG_CONST_MAP_PTR, /* const argument used as pointer to bpf_map */
+ ARG_PTR_TO_MAP_KEY, /* pointer to stack used as map key */
+ ARG_PTR_TO_MAP_VALUE, /* pointer to stack used as map value */
+
+ /* the following constraints used to prototype bpf_memcmp() and other
+ * functions that access data on eBPF program stack
+ */
+ ARG_PTR_TO_STACK, /* any pointer to eBPF program stack */
+ ARG_CONST_STACK_SIZE, /* number of bytes accessed from stack */
+};
+
+/* type of values returned from helper functions */
+enum bpf_return_type {
+ RET_INTEGER, /* function returns integer */
+ RET_VOID, /* function doesn't return anything */
+ RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
+};
+
+/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
+ * to in-kernel helper functions and for adjusting imm32 field in BPF_CALL
+ * instructions after verifying
+ */
+struct bpf_func_proto {
+ u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+ bool gpl_only;
+ enum bpf_return_type ret_type;
+ enum bpf_arg_type arg1_type;
+ enum bpf_arg_type arg2_type;
+ enum bpf_arg_type arg3_type;
+ enum bpf_arg_type arg4_type;
+ enum bpf_arg_type arg5_type;
+};
+
+/* bpf_context is intentionally undefined structure. Pointer to bpf_context is
+ * the first argument to eBPF programs.
+ * For socket filters: 'struct bpf_context *' == 'struct sk_buff *'
+ */
+struct bpf_context;
+
+enum bpf_access_type {
+ BPF_READ = 1,
+ BPF_WRITE = 2
+};
+
+struct bpf_verifier_ops {
+ /* return eBPF function prototype for verification */
+ const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+
+ /* return true if 'size' wide access at offset 'off' within bpf_context
+ * with 'type' (read or write) is allowed
+ */
+ bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
+};
+
+struct bpf_prog_type_list {
+ struct list_head list_node;
+ struct bpf_verifier_ops *ops;
+ enum bpf_prog_type type;
+};
+
+void bpf_register_prog_type(struct bpf_prog_type_list *tl);
+
+struct bpf_prog;
+
+struct bpf_prog_aux {
+ atomic_t refcnt;
+ bool is_gpl_compatible;
+ enum bpf_prog_type prog_type;
+ struct bpf_verifier_ops *ops;
+ struct bpf_map **used_maps;
+ u32 used_map_cnt;
+ struct bpf_prog *prog;
+ struct work_struct work;
+};
+
+void bpf_prog_put(struct bpf_prog *prog);
+struct bpf_prog *bpf_prog_get(u32 ufd);
+/* verify correctness of eBPF program */
+int bpf_check(struct bpf_prog *fp, union bpf_attr *attr);
+
+#endif /* _LINUX_BPF_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1a0bc6d134d7..ca95abd2bed1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -21,6 +21,7 @@
struct sk_buff;
struct sock;
struct seccomp_data;
+struct bpf_prog_aux;
/* ArgX, context and stack frame pointer register positions. Note,
* Arg1, Arg2, Arg3, etc are used as argument mappings of function
@@ -144,6 +145,12 @@ struct seccomp_data;
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
+#define BPF_PSEUDO_MAP_FD 1
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD) \
+ BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
@@ -300,17 +307,12 @@ struct bpf_binary_header {
u8 image[];
};
-struct bpf_work_struct {
- struct bpf_prog *prog;
- struct work_struct work;
-};
-
struct bpf_prog {
u16 pages; /* Number of allocated pages */
bool jited; /* Is our filter JIT'ed? */
u32 len; /* Number of filter blocks */
struct sock_fprog_kern *orig_prog; /* Original BPF program */
- struct bpf_work_struct *work; /* Deferred free work struct */
+ struct bpf_prog_aux *aux; /* Auxiliary fields */
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
/* Instructions for interpreter */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0f86d85a9ce4..bda9b81357cc 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -65,6 +65,7 @@ struct old_linux_dirent;
struct perf_event_attr;
struct file_handle;
struct sigaltstack;
+union bpf_attr;
#include <linux/types.h>
#include <linux/aio_abi.h>
@@ -875,5 +876,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
const char __user *uargs);
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
-
+asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 11d11bc5c78f..22749c134117 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
__SYSCALL(__NR_getrandom, sys_getrandom)
#define __NR_memfd_create 279
__SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_bpf 280
+__SYSCALL(__NR_bpf, sys_bpf)
#undef __NR_syscalls
-#define __NR_syscalls 280
+#define __NR_syscalls 281
/*
* All syscalls below here should go away really,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 479ed0b6be16..31b0ac208a52 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -62,4 +62,94 @@ struct bpf_insn {
__s32 imm; /* signed immediate constant */
};
+/* BPF syscall commands */
+enum bpf_cmd {
+ /* create a map with given type and attributes
+ * fd = bpf(BPF_MAP_CREATE, union bpf_attr *, u32 size)
+ * returns fd or negative error
+ * map is deleted when fd is closed
+ */
+ BPF_MAP_CREATE,
+
+ /* lookup key in a given map
+ * err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
+ * Using attr->map_fd, attr->key, attr->value
+ * returns zero and stores found elem into value
+ * or negative error
+ */
+ BPF_MAP_LOOKUP_ELEM,
+
+ /* create or update key/value pair in a given map
+ * err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
+ * Using attr->map_fd, attr->key, attr->value
+ * returns zero or negative error
+ */
+ BPF_MAP_UPDATE_ELEM,
+
+ /* find and delete elem by key in a given map
+ * err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
+ * Using attr->map_fd, attr->key
+ * returns zero or negative error
+ */
+ BPF_MAP_DELETE_ELEM,
+
+ /* lookup key in a given map and return next key
+ * err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
+ * Using attr->map_fd, attr->key, attr->next_key
+ * returns zero and stores next key or negative error
+ */
+ BPF_MAP_GET_NEXT_KEY,
+
+ /* verify and load eBPF program
+ * prog_fd = bpf(BPF_PROG_LOAD, union bpf_attr *attr, u32 size)
+ * Using attr->prog_type, attr->insns, attr->license
+ * returns fd or negative error
+ */
+ BPF_PROG_LOAD,
+};
+
+enum bpf_map_type {
+ BPF_MAP_TYPE_UNSPEC,
+};
+
+enum bpf_prog_type {
+ BPF_PROG_TYPE_UNSPEC,
+};
+
+union bpf_attr {
+ struct { /* anonymous struct used by BPF_MAP_CREATE command */
+ __u32 map_type; /* one of enum bpf_map_type */
+ __u32 key_size; /* size of key in bytes */
+ __u32 value_size; /* size of value in bytes */
+ __u32 max_entries; /* max number of entries in a map */
+ };
+
+ struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
+ __u32 map_fd;
+ __aligned_u64 key;
+ union {
+ __aligned_u64 value;
+ __aligned_u64 next_key;
+ };
+ };
+
+ struct { /* anonymous struct used by BPF_PROG_LOAD command */
+ __u32 prog_type; /* one of enum bpf_prog_type */
+ __u32 insn_cnt;
+ __aligned_u64 insns;
+ __aligned_u64 license;
+ __u32 log_level; /* verbosity level of verifier */
+ __u32 log_size; /* size of user buffer */
+ __aligned_u64 log_buf; /* user supplied buffer */
+ };
+} __attribute__((aligned(8)));
+
+/* integer value in 'imm' field of BPF_CALL instruction selects which helper
+ * function eBPF program intends to call
+ */
+enum bpf_func_id {
+ BPF_FUNC_unspec,
+ __BPF_FUNC_MAX_ID,
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */