summaryrefslogtreecommitdiff
path: root/net/openvswitch
AgeCommit message (Collapse)Author
2017-12-12Merge Linaro linux 4.9.62 into linux-4.9Xie Xiaobo
Signed-off-by: Xiaobo Xie <xiaobo.xie@nxp.com>
2017-08-30openvswitch: fix skb_panic due to the incorrect actions attrlenLiping Zhang
[ Upstream commit 494bea39f3201776cdfddc232705f54a0bd210c4 ] For sw_flow_actions, the actions_len only represents the kernel part's size, and when we dump the actions to the userspace, we will do the convertions, so it's true size may become bigger than the actions_len. But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len to alloc the skbuff, so the user_skb's size may become insufficient and oops will happen like this: skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head: ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8148be82>] skb_put+0x43/0x44 [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4 [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch] [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch] [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch] [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6] [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch] [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch] [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch] [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch] [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch] [...] Also we can find that the actions_len is much little than the orig_len: crash> struct sw_flow_actions 0xffff8812f539d000 struct sw_flow_actions { rcu = { next = 0xffff8812f5398800, func = 0xffffe3b00035db32 }, orig_len = 1384, actions_len = 592, actions = 0xffff8812f539d01c } So as a quick fix, use the orig_len instead of the actions_len to alloc the user_skb. Last, this oops happened on our system running a relative old kernel, but the same risk still exists on the mainline, since we use the wrong actions_len from the beginning. Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace") Cc: Neil McKee <neil.mckee@inmon.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11openvswitch: fix potential out of bound access in parse_ctLiping Zhang
[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ] Before the 'type' is validated, we shouldn't use it to fetch the ovs_ct_attr_lens's minlen and maxlen, else, out of bound access may happen. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-14net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PADKris Murphy
[ Upstream commit 8f3dbfd79ed9ef9770305a7cc4e13dfd31ad2cd0 ] Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement in ip_tun_from_nlattr in order to prevent the default case returning an error. Fixes: b46f6ded906e ("libnl: nla_put_be64(): align on a 64-bit area") Signed-off-by: Kris Murphy <kriskend@linux.vnet.ibm.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30net/openvswitch: Set the ipv6 source tunnel key address attribute correctlyOr Gerlitz
[ Upstream commit 3d20f1f7bd575d147ffa75621fa560eea0aec690 ] When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22ipv6: orphan skbs in reassembly unitEric Dumazet
[ Upstream commit 48cac18ecf1de82f76259a54402c3adb7839ad01 ] Andrey reported a use-after-free in IPv6 stack. Issue here is that we free the socket while it still has skb in TX path and in some queues. It happens here because IPv6 reassembly unit messes skb->truesize, breaking skb_set_owner_w() badly. We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") Acked-by: Joe Stringer <joe@ovn.org> ================================================================== BUG: KASAN: use-after-free in sock_wfree+0x118/0x120 Read of size 8 at addr ffff880062da0060 by task a.out/4140 page:ffffea00018b6800 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013 raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000 page dumped because: kasan: bad access detected CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 dump_stack+0x292/0x398 lib/dump_stack.c:51 describe_address mm/kasan/report.c:262 kasan_report_error+0x121/0x560 mm/kasan/report.c:370 kasan_report mm/kasan/report.c:392 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413 sock_flag ./arch/x86/include/asm/bitops.h:324 sock_wfree+0x118/0x120 net/core/sock.c:1631 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put ./include/net/inet_frag.h:133 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn ./include/linux/netfilter.h:102 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook ./include/linux/netfilter.h:212 __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x620 net/socket.c:848 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x187/0x530 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 RIP: 0033:0x7ff26e6f5b79 RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79 RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003 RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003 The buggy address belongs to the object at ffff880062da0000 which belongs to the cache RAWv6 of size 1504 The buggy address ffff880062da0060 is located 96 bytes inside of 1504-byte region [ffff880062da0000, ffff880062da05e0) Freed by task 4113: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1352 slab_free_freelist_hook mm/slub.c:1374 slab_free mm/slub.c:2951 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973 sk_prot_free net/core/sock.c:1377 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sk_free+0x23/0x30 net/core/sock.c:1479 sock_put ./include/net/sock.h:1638 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431 sock_release+0x8d/0x1e0 net/socket.c:599 sock_close+0x16/0x20 net/socket.c:1063 __fput+0x332/0x7f0 fs/file_table.c:208 ____fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x19b/0x270 kernel/task_work.c:116 exit_task_work ./include/linux/task_work.h:21 do_exit+0x186b/0x2800 kernel/exit.c:839 do_group_exit+0x149/0x420 kernel/exit.c:943 SYSC_exit_group kernel/exit.c:954 SyS_exit_group+0x1d/0x20 kernel/exit.c:952 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Allocated by task 4115: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 slab_post_alloc_hook mm/slab.h:432 slab_alloc_node mm/slub.c:2708 slab_alloc mm/slub.c:2716 kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721 sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334 sk_alloc+0x105/0x1010 net/core/sock.c:1396 inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183 __sock_create+0x4f6/0x880 net/socket.c:1199 sock_create net/socket.c:1239 SYSC_socket net/socket.c:1269 SyS_socket+0xf9/0x230 net/socket.c:1249 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Memory state around the buggy address: ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-04openvswitch: maintain correct checksum state in conntrack actionsLance Richardson
[ Upstream commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee ] When executing conntrack actions on skbuffs with checksum mode CHECKSUM_COMPLETE, the checksum must be updated to account for header pushes and pulls. Otherwise we get "hw csum failure" logs similar to this (ICMP packet received on geneve tunnel via ixgbe NIC): [ 405.740065] genev_sys_6081: hw csum failure [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1 [ 405.740108] Call Trace: [ 405.740110] <IRQ> [ 405.740113] dump_stack+0x63/0x87 [ 405.740116] netdev_rx_csum_fault+0x3a/0x40 [ 405.740118] __skb_checksum_complete+0xcf/0xe0 [ 405.740120] nf_ip_checksum+0xc8/0xf0 [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4] [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack] [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch] [ 405.740145] ? netif_rx_internal+0x44/0x110 [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch] [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch] [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch] [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch] [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch] [ 405.740168] ? udp_rcv+0x1a/0x20 [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0 [ 405.740172] ? ip_local_deliver+0x6f/0xe0 [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0 [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0 [ 405.740177] ? ip_rcv+0x2a7/0x400 [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00 [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch] [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00 [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe] [ 405.740197] __netif_receive_skb+0x18/0x60 [ 405.740199] netif_receive_skb_internal+0x40/0xb0 [ 405.740201] napi_gro_receive+0xcd/0x120 [ 405.740204] gro_cell_poll+0x57/0x80 [geneve] [ 405.740206] net_rx_action+0x260/0x3c0 [ 405.740209] __do_softirq+0xc9/0x28c [ 405.740211] irq_exit+0xd9/0xf0 [ 405.740213] do_IRQ+0x51/0xd0 [ 405.740215] common_interrupt+0x93/0x93 Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-30openvswitch: Fix skb leak in IPv6 reassembly.Daniele Di Proietto
If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it means that we still have a reference to the skb. We should free it before returning from handle_fragments, as stated in the comment above. Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion") CC: Florian Westphal <fw@strlen.de> CC: Pravin B Shelar <pshelar@ovn.org> CC: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal devJiri Benc
The internal device does support 802.1AD offloading since 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: fix vlan subtraction from packet lengthJiri Benc
When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN header is not counted in skb->len. It doesn't make sense to subtract it. Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: vlan: remove wrong likely statementJiri Benc
This code is called whenever flow key is being extracted from the packet. The packet may be as likely vlan tagged as not. Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03openvswitch: use mpls_hdrJiri Benc
skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03openvswitch: mpls: set network header correctly on key extractJiri Benc
After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in openvswitch was changed to have network header pointing to the start of the MPLS headers and inner_network_header pointing after the MPLS headers. However, key_extract was missed by the mentioned commit, causing incorrect headers to be set when a MPLS packet just enters the bridge or after it is recirculated. Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21openvswitch: avoid resetting flow key while installing new flow.pravin shelar
since commit commit db74a3335e0f6 ("openvswitch: use percpu flow stats") flow alloc resets flow-key. So there is no need to reset the flow-key again if OVS is using newly allocated flow-key. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21openvswitch: Fix Frame-size larger than 1024 bytes warning.pravin shelar
There is no need to declare separate key on stack, we can just use sw_flow->key to store the key directly. This commit fixes following warning: net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’: net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19openvswitch: use percpu flow statsThadeu Lima de Souza Cascardo
Instead of using flow stats per NUMA node, use it per CPU. When using megaflows, the stats lock can be a bottleneck in scalability. On a E5-2690 12-core system, usual throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath. This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there were no corruption on the slab cache. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Cc: pravin shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19openvswitch: fix flow stats accounting when node 0 is not possibleThadeu Lima de Souza Cascardo
On a system with only node 1 as possible, all statistics is going to be accounted on node 0 as it will have a single writer. However, when getting and clearing the statistics, node 0 is not going to be considered, as it's not a possible node. Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16openvswitch: avoid deferred execution of recirc actionsLance Richardson
The ovs kernel data path currently defers the execution of all recirc actions until stack utilization is at a minimum. This is too limiting for some packet forwarding scenarios due to the small size of the deferred action FIFO (10 entries). For example, broadcast traffic sent out more than 10 ports with recirculation results in packet drops when the deferred action FIFO becomes full, as reported here: http://openvswitch.org/pipermail/dev/2016-March/067672.html Since the current recursion depth is available (it is already tracked by the exec_actions_level pcpu variable), we can use it to determine whether to execute recirculation actions immediately (safe when recursion depth is low) or defer execution until more stack space is available. With this change, the deferred action fifo size becomes a non-issue for currently failing scenarios because it is no longer used when there are three or fewer recursions through ovs_execute_actions(). Suggested-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11openvswitch: use alias for genetlink family namesThadeu Lima de Souza Cascardo
When userspace tries to create datapaths and the module is not loaded, it will simply fail. With this patch, the module will be automatically loaded. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributesEric Garver
Add support for 802.1ad including the ability to push and pop double tagged vlans. Add support for 802.1ad to netlink parsing and flow conversion. Uses double nested encap attributes to represent double tagged vlan. Inner TPID encoded along with ctci in nested attributes. This is based on Thomas F Herbert's original v20 patch. I made some small clean ups and bug fixes. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04openvswitch: Free tmpl with tmpl_free.Joe Stringer
When an error occurs during conntrack template creation as part of actions validation, we need to free the template. Previously we've been using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate. Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-31net: mpls: Fixups for GSODavid Ahern
As reported by Lennert the MPLS GSO code is failing to properly segment large packets. There are a couple of problems: 1. the inner protocol is not set so the gso segment functions for inner protocol layers are not getting run, and 2 MPLS labels for packets that use the "native" (non-OVS) MPLS code are not properly accounted for in mpls_gso_segment. The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment to call the gso segment functions for the higher layer protocols. That means skb_mac_gso_segment is called twice -- once with the network protocol set to MPLS and again with the network protocol set to the inner protocol. This patch sets the inner skb protocol addressing item 1 above and sets the network_header and inner_network_header to mark where the MPLS labels start and end. The MPLS code in OVS is also updated to set the two network markers. >From there the MPLS GSO code uses the difference between the network header and the inner network header to know the size of the MPLS header that was pushed. It then pulls the MPLS header, resets the mac_len and protocol for the inner protocol and then calls skb_mac_gso_segment to segment the skb. Afterward the inner protocol segmentation is done the skb protocol is set to mpls for each segment and the network and mac headers restored. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-11openvswitch: do not ignore netdev errors when creating tunnel vportsMartynas Pumputis
The creation of a tunnel vport (geneve, gre, vxlan) brings up a corresponding netdev, a multi-step operation which can fail. For example, changing a vxlan vport's netdev state to 'up' binds the vport's socket to a UDP port - if the binding fails (e.g. due to the port being in use), the error is currently ignored giving the appearance that the tunnel vport creation completed successfully. Signed-off-by: Martynas Pumputis <martynas@weave.works> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06OVS: Ignore negative headroom valueIan Wienand
net_device->ndo_set_rx_headroom (introduced in 871b642adebe300be2e50aa5f65a418510f636ec) says "Setting a negtaive value reset the rx headroom to the default value". It seems that the OVS implementation in 3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets dev->needed_headroom unconditionally. This doesn't have an immediate effect, but can mess up later LL_RESERVED_SPACE calculations, such as done in net/ipv6/mcast.c:mld_newpack. For reference, this issue was found from a skb_panic raised there after the length calculations had given the wrong result. Note the other current users of this interface (drivers/net/tun.c:tun_set_headroom and drivers/net/veth.c:veth_set_rx_headroom) are both checking this correctly thus need no modification. Thanks to Ben for some pointers from the crash dumps! Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-03openvswitch: Remove incorrect WARN_ONCE().Jarno Rajahalme
ovs_ct_find_existing() issues a warning if an existing conntrack entry classified as IP_CT_NEW is found, with the premise that this should not happen. However, a newly confirmed, non-expected conntrack entry remains IP_CT_NEW as long as no reply direction traffic is seen. This has resulted into somewhat confusing kernel log messages. This patch removes this check and warning. Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22netfilter: conntrack: support a fixed size of 128 distinct labelsFlorian Westphal
The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ long unsigned bits[2]; /* 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29openvswitch: fix conntrack netlink event deliverySamuel Gauthier
Only the first and last netlink message for a particular conntrack are actually sent. The first message is sent through nf_conntrack_confirm when the conntrack is committed. The last one is sent when the conntrack is destroyed on timeout. The other conntrack state change messages are not advertised. When the conntrack subsystem is used from netfilter, nf_conntrack_confirm is called for each packet, from the postrouting hook, which in turn calls nf_ct_deliver_cached_events to send the state change netlink messages. This commit fixes the problem by calling nf_ct_deliver_cached_events in the non-commit case as well. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") CC: Joe Stringer <joestringer@nicira.com> CC: Justin Pettit <jpettit@nicira.com> CC: Andy Zhou <azhou@nicira.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25openvswitch: Only set mark and labels with a commit flag.Jarno Rajahalme
Only set conntrack mark or labels when the commit flag is specified. This makes sure we can not set them before the connection has been persisted, as in that case the mark and labels would be lost in an event of an userspace upcall. OVS userspace already requires the commit flag to accept setting ct_mark and/or ct_labels. Validate for this in the kernel API. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25openvswitch: Set mark and labels before confirming.Jarno Rajahalme
Set conntrack mark and labels right before committing so that the initial conntrack NEW event has the mark and labels. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22openvswitch: Add packet len info to upcall.William Tu
The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.") introduces packet truncation before sending to userspace upcall receiver. This patch passes up the skb->len before truncation so that the upcall receiver knows the original packet size. Potentially this will be used by sFlow, where OVS translates sFlow config header=N to a sample action, truncating packet to N byte in kernel datapath. Thus, only N bytes instead of full-packet size is copied from kernel to userspace, saving the kernel-to-userspace bandwidth. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11openvswitch: Add packet truncation support.William Tu
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to truncate packets. A 'max_len' is added for setting up the maximum packet size, and a 'cutlen' field is to record the number of bytes to trim the packet when the packet is outputting to a port, or when the packet is sent to userspace. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02ovs: set name assign type of internal portZhang Shengju
Set name_assign_type of internal port to NET_NAME_USER. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31openvswitch: update checksum in {push,pop}_mplsSimon Horman
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in {push,pop}_mpls() as they the type in the ethernet header. As suggested by Pravin Shelar. Cc: Pravin Shelar <pshelar@nicira.com> Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next' because we no longer have a per-netns conntrack hash. The ip_gre.c conflict as well as the iwlwifi ones were cases of overlapping changes. Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/tx.c net/ipv4/ip_gre.c net/netfilter/nf_conntrack_core.c Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11openvswitch: Fix cached ct with helper.Joe Stringer
When using conntrack helpers from OVS, a common configuration is to perform a lookup without specifying a helper, then go through a firewalling policy, only to decide to attach a helper afterwards. In this case, the initial lookup will cause a ct entry to be attached to the skb, then the later commit with helper should attach the helper and confirm the connection. However, the helper attachment has been missing. If the user has enabled automatic helper attachment, then this issue will be masked as it will be applied in init_conntrack(). It is also masked if the action is executed from ovs_packet_cmd_execute() as that will construct a fresh skb. This patch fixes the issue by making an explicit call to try to assign the helper if there is a discrepancy between the action's helper and the current skb->nfct. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following large patchset contains Netfilter updates for your net-next tree. My initial intention was to send you this in two goes but when I looked back twice I already had this burden on top of me. Several updates for IPVS from Marco Angaroni: 1) Allow SIP connections originating from real-servers to be load balanced by the SIP persistence engine as is already implemented in the other direction. 2) Release connections immediately for One-packet-scheduling (OPS) in IPVS, instead of making it via timer and rcu callback. 3) Skip deleting conntracks for each one packet in OPS, and don't call nf_conntrack_alter_reply() since no reply is expected. 4) Enable drop on exhaustion for OPS + SIP persistence. Miscelaneous conntrack updates from Florian Westphal, including fix for hash resize: 5) Move conntrack generation counter out of conntrack pernet structure since this is only used by the init_ns to allow hash resizing. 6) Use get_random_once() from packet path to collect hash random seed instead of our compound. 7) Don't disable BH from ____nf_conntrack_find() for statistics, use NF_CT_STAT_INC_ATOMIC() instead. 8) Fix lookup race during conntrack hash resizing. 9) Introduce clash resolution on conntrack insertion for connectionless protocol. Then, Florian's netns rework to get rid of per-netns conntrack table, thus we use one single table for them all. There was consensus on this change during the NFWS 2015 and, on top of that, it has recently been pointed as a source of multiple problems from unpriviledged netns: 11) Use a single conntrack hashtable for all namespaces. Include netns in object comparisons and make it part of the hash calculation. Adapt early_drop() to consider netns. 12) Use single expectation and NAT hashtable for all namespaces. 13) Use a single slab cache for all namespaces for conntrack objects. 14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet conntrack counter tells us the table is empty (ie. equals zero). Fixes for nf_tables interval set element handling, support to set conntrack connlabels and allow set names up to 32 bytes. 15) Parse element flags from element deletion path and pass it up to the backend set implementation. 16) Allow adjacent intervals in the rbtree set type for dynamic interval updates. 17) Add support to set connlabel from nf_tables, from Florian Westphal. 18) Allow set names up to 32 bytes in nf_tables. Several x_tables fixes and updates: 19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch from Andrzej Hajda. And finally, miscelaneous netfilter updates such as: 20) Disable automatic helper assignment by default. Note this proc knob was introduced by a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") 4 years ago to start moving towards explicit conntrack helper configuration via iptables CT target. 21) Get rid of obsolete and inconsistent debugging instrumentation in x_tables. 22) Remove unnecessary check for null after ip6_route_output(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-05openvswitch: __nf_ct_l{3,4}proto_find() always return a valid pointerPablo Neira Ayuso
If the protocol is not natively supported, this assigns generic protocol tracker so we can always assume a valid pointer after these calls. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Joe Stringer <joe@ovn.org>
2016-04-26ovs: align nlattr properly when neededNicolas Dichtel
I also fix commit 8b32ab9e6ef1: use nla_total_size_64bit() for OVS_FLOW_ATTR_USED in ovs_flow_cmd_msg_size(). Fixes: 8b32ab9e6ef1 ("ovs: use nla_put_u64_64bit()") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25ovs: use nla_put_u64_64bit()Nicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, mostly from Florian Westphal to sort out the lack of sufficient validation in x_tables and connlabel preparation patches to add nf_tables support. They are: 1) Ensure we don't go over the ruleset blob boundaries in mark_source_chains(). 2) Validate that target jumps land on an existing xt_entry. This extra sanitization comes with a performance penalty when loading the ruleset. 3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables. 4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables. 5) Make sure the minimal possible target size in x_tables. 6) Similar to #3, add xt_compat_check_entry_offsets() for compat code. 7) Check that standard target size is valid. 8) More sanitization to ensure that the target_offset field is correct. 9) Add xt_check_entry_match() to validate that matches are well-formed. 10-12) Three patch to reduce the number of parameters in translate_compat_table() for {arp,ip,ip6}tables by using a container structure. 13) No need to return value from xt_compat_match_from_user(), so make it void. 14) Consolidate translate_table() so it can be used by compat code too. 15) Remove obsolete check for compat code, so we keep consistent with what was already removed in the native layout code (back in 2007). 16) Get rid of target jump validation from mark_source_chains(), obsoleted by #2. 17) Introduce xt_copy_counters_from_user() to consolidate counter copying, and use it from {arp,ip,ip6}tables. 18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump functions. 19) Move nf_connlabel_match() to xt_connlabel. 20) Skip event notification if connlabel did not change. 21) Update of nf_connlabels_get() to make the upcoming nft connlabel support easier. 23) Remove spinlock to read protocol state field in conntrack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24libnl: nla_put_be64(): align on a 64-bit areaNicolas Dichtel
nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21openvswitch: use flow protocol when recalculating ipv6 checksumsSimon Horman
When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the flow key rather than that in the set field action. Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.") Cc: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21openvswitch: Orphan skbs before IPv6 defragJoe Stringer
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()"). Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be cloned (implicitly orphaning) prior to queueing for reassembly. As such, when the IPv6 message is eventually reassembled, the skb->sk for all fragments would be NULL. After that commit was introduced, rather than cloning, the original skbs were queued directly without orphaning. The end result is that all frags except for the first and last may have a socket attached. This commit explicitly orphans such skbs during nf_ct_frag6_gather() to prevent BUG_ON(skb->sk) during a later call to ip6_fragment(). kernel BUG at net/ipv6/ip6_output.c:631! [...] Call Trace: <IRQ> [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch] [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70 [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch] [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20 [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80 [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch] [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch] [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch] [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch] [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch] [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch] [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch] [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660 [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0 [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950 [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff81690990>] dev_queue_xmit+0x10/0x20 [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220 [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0 [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0 [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0 [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500 [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580 [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690 [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690 [<ffffffff817567a0>] ip6_input+0x30/0xa0 [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0 [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0 [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0 [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0 [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0 [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230 [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70 [<ffffffff8168c088>] process_backlog+0x78/0x230 [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230 [<ffffffff8168db43>] net_rx_action+0x203/0x480 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817c156e>] __do_softirq+0xde/0x49f [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0 [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40 [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0 [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0 [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30 [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0 [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8166d078>] sock_sendmsg+0x38/0x50 [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170 [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d [<ffffffff8166e38e>] SyS_sendto+0xe/0x10 [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8 Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8# RIP [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50 RSP <ffff880072803120> Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18netfilter: connlabels: change nf_connlabels_get bit arg to 'highest used'Florian Westphal
nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-17openvswitch: Convert to using IFF_NO_QUEUEPhil Sutter
Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, they are: 1) There was a race condition between parallel save/swap and delete, which resulted a kernel crash due to the increase ref for save, swap, wrong ref decrease operations. Reported and fixed by Vishwanath Pai. 2) OVS should call into CT NAT for packets of new expected connections only when the conntrack state is persisted with the 'commit' option to the OVS CT action. From Jarno Rajahalme. 3) Resolve kconfig dependencies with new OVS NAT support. From Arnd Bergmann. 4) Early validation of entry->target_offset to make sure it doesn't take us out from the blob, from Florian Westphal. 5) Again early validation of entry->next_offset to make sure it doesn't take out from the blob, also from Florian. 6) Check that entry->target_offset is always of of sizeof(struct xt_entry) for unconditional entries, when checking both from check_underflow() and when checking for loops in mark_source_chains(), again from Florian. 7) Fix inconsistent behaviour in nfnetlink_queue when NFQA_CFG_F_FAIL_OPEN is set and netlink_unicast() fails due to buffer overrun, we have to reinject the packet as the user expects. 8) Enforce nul-terminated table names from getsockopt GET_ENTRIES requests. 9) Don't assume skb->sk is set from nft_bridge_reject and synproxy, this fixes a recent update of the code to namespaceify ip_default_ttl, patch from Liping Zhang. This batch comes with four patches to validate x_tables blobs coming from userspace. CONFIG_USERNS exposes the x_tables interface to unpriviledged users and to be honest this interface never received the attention for this move away from the CAP_NET_ADMIN domain. Florian is working on another round with more patches with more sanity checks, so expect a bit more Netfilter fixes in this development cycle than usual. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-28openvswitch: call only into reachable nf-nat codeArnd Bergmann
The openvswitch code has gained support for calling into the nf-nat-ipv4/ipv6 modules, however those can be loadable modules in a configuration in which openvswitch is built-in, leading to link errors: net/built-in.o: In function `__ovs_ct_lookup': :(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation' :(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation' The dependency on (!NF_NAT || NF_NAT) prevents similar issues, but NF_NAT is set to 'y' if any of the symbols selecting it are built-in, but the link error happens when any of them are modular. A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in, CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely to be useful in practice, but the driver currently only handles IPv6 being optional. This patch improves the Kconfig dependency so that openvswitch cannot be built-in if either of the two other symbols are set to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute() with two "if (IS_ENABLED())" checks that should catch all corner cases also make the code more readable. The same #ifdef exists ovs_ct_nat_to_attr(), where it does not cause a link error, but for consistency I'm changing it the same way. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>