summaryrefslogtreecommitdiff
path: root/samples/bpf/bpf_helpers.h
AgeCommit message (Collapse)Author
2016-04-06samples/bpf: Enable powerpc supportNaveen N. Rao
Add the necessary definitions for building bpf samples on ppc. Since ppc doesn't store function return address on the stack, modify how PT_REGS_RET() and PT_REGS_FP() work. Also, introduce PT_REGS_IP() to access the instruction pointer. Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08samples/bpf: add map_flags to bpf loaderAlexei Starovoitov
note old loader is compatible with new kernel. map_flags are optional Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20samples/bpf: offwaketime exampleAlexei Starovoitov
This is simplified version of Brendan Gregg's offwaketime: This program shows kernel stack traces and task names that were blocked and "off-CPU", along with the stack traces and task names for the threads that woke them, and the total elapsed time from when they blocked to when they were woken up. The combined stacks, task names, and total time is summarized in kernel context for efficiency. Example: $ sudo ./offwaketime | flamegraph.pl > demo.svg Open demo.svg in the browser as FlameGraph visualization. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-10-28bpf: sample: define aarch64 specific registersYang Shi
Define aarch64 specific registers for building bpf samples correctly. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22samples: bpf: add bpf_perf_event_output exampleAlexei Starovoitov
Performance test and example of bpf_perf_event_output(). kprobe is attached to sys_write() and trivial bpf program streams pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event. Usage: $ sudo ./bld_x64/samples/bpf/trace_output recv 2968913 events per sec Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18bpf: add bpf_redirect() helperAlexei Starovoitov
Existing bpf_clone_redirect() helper clones skb before redirecting it to RX or TX of destination netdev. Introduce bpf_redirect() helper that does that without cloning. Benchmarked with two hosts using 10G ixgbe NICs. One host is doing line rate pktgen. Another host is configured as: $ tc qdisc add dev $dev ingress $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop so it receives the packet on $dev and immediately xmits it on $dev + 1 The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program that does bpf_clone_redirect() and performance is 2.0 Mpps $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section redirect_xmit drop which is using bpf_redirect() - 2.4 Mpps and using cls_bpf with integrated actions as: $ tc filter add dev $dev root pref 10 \ bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1 performance is 2.5 Mpps To summarize: u32+act_bpf using clone_redirect - 2.0 Mpps u32+act_bpf using redirect - 2.4 Mpps cls_bpf using redirect - 2.5 Mpps For comparison linux bridge in this setup is doing 2.1 Mpps and ixgbe rx + drop in ip_rcv - 7.8 Mpps Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10samples/bpf: example of get selected PMU counter valueKaixu Xia
This is a simple example and shows how to use the new ability to get the selected Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08samples: bpf: enable trace samples for s390xMichael Holzheu
The trace bpf samples do not compile on s390x because they use x86 specific fields from the "pt_regs" structure. Fix this and access the fields via new PT_REGS macros. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15bpf: introduce current->pid, tgid, uid, gid, comm accessorsAlexei Starovoitov
eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current->tgid << 32 | current->pid u64 bpf_get_current_uid_gid(void) Return: current_gid << 32 | current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current->comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21samples/bpf: bpf_tail_call example for networkingAlexei Starovoitov
Usage: $ sudo ./sockex3 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 11422636 173070 127.0.0.1.33778 -> 127.0.0.1.59526 11260224828 341974 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 23198092 351486 127.0.0.1.33778 -> 127.0.0.1.59526 22972698518 698616 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 this example is similar to sockex2 in a way that it accumulates per-flow statistics, but it does packet parsing differently. sockex2 inlines full packet parser routine into single bpf program. This sockex3 example have 4 independent programs that parse vlan, mpls, ip, ipv6 and one main program that starts the process. bpf_tail_call() mechanism allows each program to be small and be called on demand potentially multiple times, so that many vlan, mpls, ip in ip, gre encapsulations can be parsed. These and other protocol parsers can be added or removed at runtime. TLVs can be parsed in similar manner. Note, tail_call_cnt dynamic check limits the number of tail calls to 32. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21samples/bpf: bpf_tail_call example for tracingAlexei Starovoitov
kprobe example that demonstrates how future seccomp programs may look like. It attaches to seccomp_phase1() function and tail-calls other BPF programs depending on syscall number. Existing optimized classic BPF seccomp programs generated by Chrome look like: if (sd.nr < 121) { if (sd.nr < 57) { if (sd.nr < 22) { if (sd.nr < 7) { if (sd.nr < 4) { if (sd.nr < 1) { check sys_read } else { if (sd.nr < 3) { check sys_write and sys_open } else { check sys_close } } } else { } else { } else { } else { } else { } the future seccomp using native eBPF may look like: bpf_tail_call(&sd, &syscall_jmp_table, sd.nr); which is simpler, faster and leaves more room for per-syscall checks. Usage: $ sudo ./tracex5 <...>-366 [001] d... 4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771) <...>-369 [003] d... 4.870066: : mmap <...>-369 [003] d... 4.870077: : syscall=110 (one of get/set uid/pid/gid) <...>-369 [003] d... 4.870089: : syscall=107 (one of get/set uid/pid/gid) sh-369 [000] d... 4.891740: : read(fd=0, buf=00000000023d1000, size=512) sh-369 [000] d... 4.891747: : write(fd=1, buf=00000000023d3000, size=512) sh-369 [000] d... 4.891747: : read(fd=1, buf=00000000023d3000, size=512) Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...
2015-04-06tc: bpf: add checksum helpersAlexei Starovoitov
Commit 608cd71a9c7c ("tc: bpf: generalize pedit action") has added the possibility to mangle packet data to BPF programs in the tc pipeline. This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace() for fixing up the protocol checksums after the packet mangling. It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid unnecessary checksum recomputations when BPF programs adjusting l3/l4 checksums and documents all three helpers in uapi header. Moreover, a sample program is added to show how BPF programs can make use of the mangle and csum helpers. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02samples/bpf: Add simple non-portable kprobe filter exampleAlexei Starovoitov
tracex1_kern.c - C program compiled into BPF. It attaches to kprobe:netif_receive_skb() When skb->dev->name == "lo", it prints sample debug message into trace_pipe via bpf_trace_printk() helper function. tracex1_user.c - corresponding user space component that: - loads BPF program via bpf() syscall - opens kprobes:netif_receive_skb event via perf_event_open() syscall - attaches the program to event via ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); - prints from trace_pipe Note, this BPF program is non-portable. It must be recompiled with current kernel headers. kprobe is not a stable ABI and BPF+kprobe scripts may no longer be meaningful when kernel internals change. No matter in what way the kernel changes, neither the kprobe, nor the BPF program can ever crash or corrupt the kernel, assuming the kprobes, perf and BPF subsystem has no bugs. The verifier will detect that the program is using bpf_trace_printk() and the kernel will print 'this is a DEBUG kernel' warning banner, which means that bpf_trace_printk() should be used for debugging of the BPF program only. Usage: $ sudo tracex1 ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84 ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-06samples: bpf: elf_bpf file loaderAlexei Starovoitov
simple .o parser and loader using BPF syscall. .o is a standard ELF generated by LLVM backend It parses elf file compiled by llvm .c->.o - parses 'maps' section and creates maps via BPF syscall - parses 'license' section and passes it to syscall - parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD - loads eBPF programs via BPF syscall One ELF file can contain multiple BPF programs. int load_bpf_file(char *path); populates prog_fd[] and map_fd[] with FDs received from bpf syscall bpf_helpers.h - helper functions available to eBPF programs written in C Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>