author:    David S. Miller <davem@davemloft.net>  2018-06-05 19:42:19 +0300
committer: David S. Miller <davem@davemloft.net>  2018-06-05 19:42:19 +0300
commit:    fd129f8941cf2309def29b5c8a23b62faff0c9d0 (patch)
tree:      6ad8afbb59eaf14cfa9f0c4bad498254e6ff1e66 /samples
parent:    a6fa9087fc280bba8a045d11d9b5d86cbf9a3a83 (diff)
parent:    9fa06104a235f64d6a2bf3012cc9966e8e4be5eb (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-06-05
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a new BPF hook for sendmsg, similar to the existing hooks for bind and
connect: "This allows overriding the source IP (including the case when it's
set via cmsg(3)) and the destination IP:port for unconnected UDP (slow path).
TCP and connected UDP (fast path) are not affected. This makes UDP support
complete, that is, connected UDP is handled by the connect hooks, unconnected
UDP by the sendmsg ones.", from Andrey.
2) Rework of the AF_XDP API to allow extending it in the future to a
type-writer model if necessary. In that mode a memory window is passed to
hardware and multiple frames might be filled into that window, instead of
just one as in the current fixed frame-size model. With the new changes this
can be supported without having to add a new descriptor format. Also, the
core bits of zero-copy support for AF_XDP have been merged as agreed upon,
where the i40e bits will be routed via Jeff later on. Various improvements
to documentation and sample programs are included as well, all from Björn
and Magnus.
3) Given BPF's flexibility, a new program type has been added to implement
infrared decoders. Quote: "The kernel IR decoders support the most
widely used IR protocols, but there are many protocols which are not
supported. [...] There is a 'long tail' of unsupported IR protocols,
for which lircd is need[ed] to decode the IR. IR encoding is done in such
a way that some simple circuit can decode it; therefore, BPF is ideal.
[...] user-space can define a decoder in BPF and attach it to the rc
device through the lirc chardev.", from Sean.
4) Several improvements and fixes to the BPF core. Among others: dumping map
and prog IDs into fdinfo, which is a straightforward way to correlate
BPF objects used by applications; removing an indirect call, and therefore
the retpoline, in all map lookup/update/delete calls by invoking the callback
directly on 64 bit archs; and adding a new bpf_skb_cgroup_id() BPF helper
for tc BPF programs to have an efficient way of looking up the cgroup v2 id
for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
state that hasn't been filled, to allow context access wrt pt_regs on
32 bit archs for tracing, and, last but not least, various test cases
for fixes that landed in bpf earlier, from Daniel.
5) Get rid of the ndo_xdp_flush API and extend ndo_xdp_xmit with
an XDP_XMIT_FLUSH flag instead, which avoids one indirect
call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
6) Add a new bpf_get_current_cgroup_id() helper that can be used in
tracing to retrieve the cgroup id from the current process in order
to allow for e.g. aggregation of container-level events, from Yonghong.
7) Two follow-up fixes for BTF to reject invalid input values and,
related to that, two test cases for BPF kselftests, from Martin.
8) Various API improvements to the bpf_fib_lookup() helper, that is,
dropping MPLS bits which are not fully hashed out yet, rejecting
invalid helper flags, returning error for unsupported address
families as well as renaming flowlabel to flowinfo, from David.
9) Various fixes and improvements to sockmap BPF kselftests in particular
in proper error detection and data verification, from Prashant.
10) Two arm32 BPF JIT improvements. One fixes the imm range check with
regard to whether the immediate fits into 24 bits, plus a naming cleanup
to make the functions related to rsh handling consistent with those
handling lsh, from Wang.
11) Two compile warning fixes in BPF, one for BTF and one silencing a gcc
false positive in stack_map_get_build_id_offset(), from Arnd.
12) Add missing seg6.h header into tools include infrastructure in order
to fix compilation of BPF kselftests, from Mathieu.
13) Several formatting cleanups in the BPF UAPI helper description that
also fix an error during rst2man compilation, from Quentin.
14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
not built into the kernel, from Yue.
15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'samples')
 samples/bpf/xdp_fwd_kern.c |  2
 samples/bpf/xdpsock_user.c | 97
 2 files changed, 47 insertions, 52 deletions
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 4a6be0f87505..6673cdb9f55c 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -88,7 +88,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 		return XDP_PASS;
 
 	fib_params.family	= AF_INET6;
-	fib_params.flowlabel	= *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
+	fib_params.flowinfo	= *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
 	fib_params.l4_protocol	= ip6h->nexthdr;
 	fib_params.sport	= 0;
 	fib_params.dport	= 0;
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index e379eac034ac..d69c8d78d3fd 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -46,6 +46,7 @@
 
 #define NUM_FRAMES 131072
 #define FRAME_HEADROOM 0
+#define FRAME_SHIFT 11
 #define FRAME_SIZE 2048
 #define NUM_DESCS 1024
 #define BATCH_SIZE 16
@@ -55,6 +56,7 @@
 
 #define DEBUG_HEXDUMP 0
 
+typedef __u64 u64;
 typedef __u32 u32;
 
 static unsigned long prev_time;
@@ -73,6 +75,7 @@ static int opt_queue;
 static int opt_poll;
 static int opt_shared_packet_buffer;
 static int opt_interval = 1;
+static u32 opt_xdp_bind_flags;
 
 struct xdp_umem_uqueue {
 	u32 cached_prod;
@@ -81,12 +84,12 @@ struct xdp_umem_uqueue {
 	u32 size;
 	u32 *producer;
 	u32 *consumer;
-	u32 *ring;
+	u64 *ring;
 	void *map;
 };
 
 struct xdp_umem {
-	char (*frames)[FRAME_SIZE];
+	char *frames;
 	struct xdp_umem_uqueue fq;
 	struct xdp_umem_uqueue cq;
 	int fd;
@@ -155,15 +158,15 @@ static const char pkt_data[] =
 
 static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
 {
-	u32 free_entries = q->size - (q->cached_prod - q->cached_cons);
+	u32 free_entries = q->cached_cons - q->cached_prod;
 
 	if (free_entries >= nb)
 		return free_entries;
 
 	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer;
+	q->cached_cons = *q->consumer + q->size;
 
-	return q->size - (q->cached_prod - q->cached_cons);
+	return q->cached_cons - q->cached_prod;
 }
 
 static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
@@ -214,7 +217,7 @@ static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
 	for (i = 0; i < nb; i++) {
 		u32 idx = fq->cached_prod++ & fq->mask;
 
-		fq->ring[idx] = d[i].idx;
+		fq->ring[idx] = d[i].addr;
 	}
 
 	u_smp_wmb();
@@ -224,7 +227,7 @@ static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
 	return 0;
 }
 
-static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u32 *d,
+static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u64 *d,
 				      size_t nb)
 {
 	u32 i;
@@ -246,7 +249,7 @@ static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u32 *d,
 }
 
 static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
-					       u32 *d, size_t nb)
+					       u64 *d, size_t nb)
 {
 	u32 idx, i, entries = umem_nb_avail(cq, nb);
@@ -266,10 +269,9 @@ static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
 	return entries;
 }
 
-static inline void *xq_get_data(struct xdpsock *xsk, __u32 idx, __u32 off)
+static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
 {
-	lassert(idx < NUM_FRAMES);
-	return &xsk->umem->frames[idx][off];
+	return &xsk->umem->frames[addr];
 }
 
 static inline int xq_enq(struct xdp_uqueue *uq,
@@ -285,9 +287,8 @@ static inline int xq_enq(struct xdp_uqueue *uq,
 	for (i = 0; i < ndescs; i++) {
 		u32 idx = uq->cached_prod++ & uq->mask;
 
-		r[idx].idx = descs[i].idx;
+		r[idx].addr = descs[i].addr;
 		r[idx].len = descs[i].len;
-		r[idx].offset = descs[i].offset;
 	}
 
 	u_smp_wmb();
@@ -297,7 +298,7 @@ static inline int xq_enq(struct xdp_uqueue *uq,
 }
 
 static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
-				 __u32 idx, unsigned int ndescs)
+				 unsigned int id, unsigned int ndescs)
 {
 	struct xdp_desc *r = uq->ring;
 	unsigned int i;
@@ -308,9 +309,8 @@ static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
 	for (i = 0; i < ndescs; i++) {
 		u32 idx = uq->cached_prod++ & uq->mask;
 
-		r[idx].idx = idx + i;
+		r[idx].addr = (id + i) << FRAME_SHIFT;
 		r[idx].len = sizeof(pkt_data) - 1;
-		r[idx].offset = 0;
 	}
 
 	u_smp_wmb();
@@ -357,17 +357,21 @@ static void swap_mac_addresses(void *data)
 	*dst_addr = tmp;
 }
 
-#if DEBUG_HEXDUMP
-static void hex_dump(void *pkt, size_t length, const char *prefix)
+static void hex_dump(void *pkt, size_t length, u64 addr)
 {
-	int i = 0;
 	const unsigned char *address = (unsigned char *)pkt;
 	const unsigned char *line = address;
 	size_t line_size = 32;
 	unsigned char c;
+	char buf[32];
+	int i = 0;
+
+	if (!DEBUG_HEXDUMP)
+		return;
 
+	sprintf(buf, "addr=%llu", addr);
 	printf("length = %zu\n", length);
-	printf("%s | ", prefix);
+	printf("%s | ", buf);
 	while (length-- > 0) {
 		printf("%02X ", *address++);
 		if (!(++i % line_size) || (length == 0 && i % line_size)) {
@@ -382,12 +386,11 @@ static void hex_dump(void *pkt, size_t length, const char *prefix)
 			}
 			printf("\n");
 			if (length > 0)
-				printf("%s | ", prefix);
+				printf("%s | ", buf);
 		}
 	}
 	printf("\n");
 }
-#endif
 
 static size_t gen_eth_frame(char *frame)
 {
@@ -412,8 +415,8 @@ static struct xdp_umem *xdp_umem_configure(int sfd)
 
 	mr.addr = (__u64)bufs;
 	mr.len = NUM_FRAMES * FRAME_SIZE;
-	mr.frame_size = FRAME_SIZE;
-	mr.frame_headroom = FRAME_HEADROOM;
+	mr.chunk_size = FRAME_SIZE;
+	mr.headroom = FRAME_HEADROOM;
 
 	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
 	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
@@ -426,7 +429,7 @@ static struct xdp_umem *xdp_umem_configure(int sfd)
 			   &optlen) == 0);
 
 	umem->fq.map = mmap(0, off.fr.desc +
-			    FQ_NUM_DESCS * sizeof(u32),
+			    FQ_NUM_DESCS * sizeof(u64),
 			    PROT_READ | PROT_WRITE,
 			    MAP_SHARED | MAP_POPULATE, sfd,
 			    XDP_UMEM_PGOFF_FILL_RING);
@@ -437,9 +440,10 @@ static struct xdp_umem *xdp_umem_configure(int sfd)
 	umem->fq.producer = umem->fq.map + off.fr.producer;
 	umem->fq.consumer = umem->fq.map + off.fr.consumer;
 	umem->fq.ring = umem->fq.map + off.fr.desc;
+	umem->fq.cached_cons = FQ_NUM_DESCS;
 
 	umem->cq.map = mmap(0, off.cr.desc +
-			    CQ_NUM_DESCS * sizeof(u32),
+			    CQ_NUM_DESCS * sizeof(u64),
 			    PROT_READ | PROT_WRITE,
 			    MAP_SHARED | MAP_POPULATE, sfd,
 			    XDP_UMEM_PGOFF_COMPLETION_RING);
@@ -451,14 +455,14 @@ static struct xdp_umem *xdp_umem_configure(int sfd)
 	umem->cq.consumer = umem->cq.map + off.cr.consumer;
 	umem->cq.ring = umem->cq.map + off.cr.desc;
 
-	umem->frames = (char (*)[FRAME_SIZE])bufs;
+	umem->frames = bufs;
 	umem->fd = sfd;
 
 	if (opt_bench == BENCH_TXONLY) {
 		int i;
 
-		for (i = 0; i < NUM_FRAMES; i++)
-			(void)gen_eth_frame(&umem->frames[i][0]);
+		for (i = 0; i < NUM_FRAMES * FRAME_SIZE; i += FRAME_SIZE)
+			(void)gen_eth_frame(&umem->frames[i]);
 	}
 
 	return umem;
@@ -472,7 +476,7 @@ static struct xdpsock *xsk_configure(struct xdp_umem *umem)
 	struct xdpsock *xsk;
 	bool shared = true;
 	socklen_t optlen;
-	u32 i;
+	u64 i;
 
 	sfd = socket(PF_XDP, SOCK_RAW, 0);
 	lassert(sfd >= 0);
@@ -508,7 +512,7 @@ static struct xdpsock *xsk_configure(struct xdp_umem *umem)
 	lassert(xsk->rx.map != MAP_FAILED);
 
 	if (!shared) {
-		for (i = 0; i < NUM_DESCS / 2; i++)
+		for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
 			lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
 				== 0);
 	}
@@ -533,13 +537,17 @@ static struct xdpsock *xsk_configure(struct xdp_umem *umem)
 	xsk->tx.producer = xsk->tx.map + off.tx.producer;
 	xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
 	xsk->tx.ring = xsk->tx.map + off.tx.desc;
+	xsk->tx.cached_cons = NUM_DESCS;
 
 	sxdp.sxdp_family = PF_XDP;
 	sxdp.sxdp_ifindex = opt_ifindex;
 	sxdp.sxdp_queue_id = opt_queue;
+
 	if (shared) {
 		sxdp.sxdp_flags = XDP_SHARED_UMEM;
 		sxdp.sxdp_shared_umem_fd = umem->fd;
+	} else {
+		sxdp.sxdp_flags = opt_xdp_bind_flags;
 	}
 
 	lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
@@ -695,6 +703,7 @@ static void parse_command_line(int argc, char **argv)
 			break;
 		case 'S':
 			opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
+			opt_xdp_bind_flags |= XDP_COPY;
 			break;
 		case 'N':
 			opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
@@ -727,7 +736,7 @@ static void kick_tx(int fd)
 
 static inline void complete_tx_l2fwd(struct xdpsock *xsk)
 {
-	u32 descs[BATCH_SIZE];
+	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
 	size_t ndescs;
@@ -749,7 +758,7 @@ static inline void complete_tx_l2fwd(struct xdpsock *xsk)
 
 static inline void complete_tx_only(struct xdpsock *xsk)
 {
-	u32 descs[BATCH_SIZE];
+	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
 
 	if (!xsk->outstanding_tx)
@@ -774,17 +783,9 @@ static void rx_drop(struct xdpsock *xsk)
 		return;
 
 	for (i = 0; i < rcvd; i++) {
-		u32 idx = descs[i].idx;
+		char *pkt = xq_get_data(xsk, descs[i].addr);
 
-		lassert(idx < NUM_FRAMES);
-#if DEBUG_HEXDUMP
-		char *pkt;
-		char buf[32];
-
-		pkt = xq_get_data(xsk, idx, descs[i].offset);
-		sprintf(buf, "idx=%d", idx);
-		hex_dump(pkt, descs[i].len, buf);
-#endif
+		hex_dump(pkt, descs[i].len, descs[i].addr);
 	}
 
 	xsk->rx_npkts += rcvd;
@@ -867,17 +868,11 @@ static void l2fwd(struct xdpsock *xsk)
 		}
 
 		for (i = 0; i < rcvd; i++) {
-			char *pkt = xq_get_data(xsk, descs[i].idx,
-						descs[i].offset);
+			char *pkt = xq_get_data(xsk, descs[i].addr);
 
 			swap_mac_addresses(pkt);
-#if DEBUG_HEXDUMP
-			char buf[32];
-			u32 idx = descs[i].idx;
-
-			sprintf(buf, "idx=%d", idx);
-			hex_dump(pkt, descs[i].len, buf);
-#endif
+			hex_dump(pkt, descs[i].len, descs[i].addr);
 		}
 
 		xsk->rx_npkts += rcvd;