summaryrefslogtreecommitdiff
path: root/drivers/net/ipvlan
AgeCommit message (Collapse)Author
2015-10-22ipvlan: read direct ifindex instead of iflinkBrenden Blanco
In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is being grabbed from dev_get_iflink. This works for the physical device case, since as the documentation of that function notes: "Physical interfaces have the same 'ifindex' and 'iflink' values.". However, if the master device is a veth, and the pairs are in separate net namespaces, the route lookup will fail with -ENODEV due to outer veth pair being in a separate namespace from the ipvlan master/routing namespace. ns0 | ns1 | ns2 veth0a--|--veth0b--|--ipvl0 In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above configuration will pass fl.flowi4_oif == veth0a to ip_route_output_flow(), but *net == ns1. Notice also that ipv6 processing is not using iflink. Since there is a discrepancy in usage, fixup both v4 and v6 case to use local dev variable. Tested this with l3 ipvlan on top of veth, as well as with single physical interface in the top namespace. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4, ipv6: Pass net into ip_local_out and ip6_local_outEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipvlan: Cache net in ipvlan_process_v4_outbound and ipvlan_process_v6_outboundEric W. Biederman
Compute net once in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound and store it in a variable so that net does not need to be recomputed next time it is used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv6: Merge ip6_local_out and ip6_local_out_skEric W. Biederman
Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08ipv4: Merge ip_local_out and ip_local_out_skEric W. Biederman
It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18net: ipvlan: convert to using IFF_NO_QUEUEPhil Sutter
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16ipvlan: ignore addresses from ipv6 autoconfigurationKonstantin Khlebnikov
Inet6addr notifier is atomic and runs in bh context without RTNL when ipv6 receives router advertisement packet and performs autoconfiguration. Proper fix still in discussion. Let's at least plug the bug. v1: http://lkml.kernel.org/r/20150514135618.14062.1969.stgit@buzz v2: http://lkml.kernel.org/r/20150703125840.24121.91556.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16ipvlan: use rcu_deference_bh() in ipvlan_queue_xmit()WANG Cong
In tx path rcu_read_lock_bh() is held, so we need rcu_deference_bh(). This fixes the following warning: =============================== [ INFO: suspicious RCU usage. ] 4.1.0-rc1+ #1007 Not tainted ------------------------------- drivers/net/ipvlan/ipvlan.h:106 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by dhclient/1076: #0: (rcu_read_lock_bh){......}, at: [<ffffffff817e8d84>] rcu_lock_acquire+0x0/0x26 stack backtrace: CPU: 2 PID: 1076 Comm: dhclient Not tainted 4.1.0-rc1+ #1007 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000001 ffff8800d381bac8 ffffffff81a4154f 000000003c1a3c19 ffff8800d4d0a690 ffff8800d381baf8 ffffffff810b849f ffff880117d41148 ffff880117d40000 ffff880117d40068 0000000000000156 ffff8800d381bb18 Call Trace: [<ffffffff81a4154f>] dump_stack+0x4c/0x65 [<ffffffff810b849f>] lockdep_rcu_suspicious+0x107/0x110 [<ffffffff8165a522>] ipvlan_port_get_rcu+0x47/0x4e [<ffffffff8165ad14>] ipvlan_queue_xmit+0x35/0x450 [<ffffffff817ea45d>] ? rcu_read_unlock+0x3e/0x5f [<ffffffff810a20bf>] ? local_clock+0x19/0x22 [<ffffffff810b4781>] ? __lock_is_held+0x39/0x52 [<ffffffff8165b64c>] ipvlan_start_xmit+0x1b/0x44 [<ffffffff817edf7f>] dev_hard_start_xmit+0x2ae/0x467 [<ffffffff817ee642>] __dev_queue_xmit+0x50a/0x60c [<ffffffff817ee7a7>] dev_queue_xmit_sk+0x13/0x15 [<ffffffff81997596>] dev_queue_xmit+0x10/0x12 [<ffffffff8199b41c>] packet_sendmsg+0xb6b/0xbdf [<ffffffff810b5ea7>] ? mark_lock+0x2e/0x226 [<ffffffff810a1fcc>] ? sched_clock_cpu+0x9e/0xb7 [<ffffffff817d56f9>] sock_sendmsg_nosec+0x12/0x1d [<ffffffff817d7257>] sock_sendmsg+0x29/0x2e [<ffffffff817d72cc>] sock_write_iter+0x70/0x91 [<ffffffff81199563>] __vfs_write+0x7e/0xa7 [<ffffffff811996bc>] vfs_write+0x92/0xe8 [<ffffffff811997d7>] SyS_write+0x47/0x7e [<ffffffff81a4d517>] system_call_fastpath+0x12/0x6f Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16ipvlan: unhash addresses without synchronize_rcuKonstantin Khlebnikov
All structures used in traffic forwarding are rcu-protected: ipvl_addr, ipvl_dev and ipvl_port. Thus we can unhash addresses without synchronization. We'll anyway hash it back into the same bucket: in worst case lockless lookup will scan hash once again. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16ipvlan: plug memory leak in ipvlan_link_deleteKonstantin Khlebnikov
Add missing kfree_rcu(addr, rcu); Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16ipvlan: remove counters of ipv4 and ipv6 addressesKonstantin Khlebnikov
They are unused after commit f631c44bbe15 ("ipvlan: Always set broadcast bit in multicast filter"). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ipvlan: Always set broadcast bit in multicast filterMahesh Bandewar
Earlier tricks of setting broadcast bit only when IPv4 address is added onto interface are not good enough especially when autoconf comes in play. Setting them on always is performance drag but now that multicast / broadcast is not processed in fast-path; enabling broadcast will let autoconf work correctly without affecting performance characteristics of the device. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05ipvlan: Defer multicast / broadcast processing to a work-queueMahesh Bandewar
Processing multicast / broadcast in fast path is performance draining and having more links means more cloning and bringing performance down further. Broadcast; in particular, need to be given to all the virtual links. Earlier tricks of enabling broadcast bit for IPv4 only interfaces are not really working since it fails autoconf. Which means enabling broadcast for all the links if protocol specific hacks do not have to be added into the driver. This patch defers all (incoming as well as outgoing) multicast traffic to a work-queue leaving only the unicast traffic in the fast-path. Now if we need to apply any additional tricks to further reduce the impact of this (multicast / broadcast) type of traffic, it can be implemented while processing this work without affecting the fast-path. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/asix_common.c drivers/net/usb/sr9800.c drivers/net/usb/usbnet.c include/linux/usb/usbnet.h net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled. With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02ipvlan: implement ndo_get_iflinkNicolas Dichtel
Don't use dev->iflink anymore. CC: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02dev: introduce dev_get_iflink()Nicolas Dichtel
The goal of this patch is to prepare the removal of the iflink field. It introduces a new ndo function, which will be implemented by virtual interfaces. There is no functional change into this patch. All readers of iflink field now call dev_get_iflink(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: fix check for IP addresses in control pathJiri Benc
When an ipvlan interface is down, its addresses are not on the hash list. Fix checks for existence of addresses not to depend on the hash list, walk through all interface addresses instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: do not use rcu operations for address listJiri Benc
All accesses to ipvlan->addrs are under rtnl. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: protect against concurrent link removalJiri Benc
Adding and removing to the 'ipvlans' list is already done using _rcu list operations. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipvlan: fix addr hash list corruptionJiri Benc
When ipvlan interface with IP addresses attached is brought down and then deleted, the assigned addresses are deleted twice from the address hash list, first on the interface down and second on the link deletion. Similarly, when an address is added while the interface is down, it is added second time once the interface is brought up. When the interface is down, the addresses should be kept off the hash list for performance reasons. Ensure this is true, which also fixes the double add problem. To fix the double free, check whether the address is hashed before removing it. Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: Kill dev_rebuild_headerEric W. Biederman
Now that there are no more users kill dev_rebuild_header and all of it's implementations. This is long overdue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-12ipvlan: add a missing __percpu pcpu_statsEric Dumazet
Cosmetic patch to add __percpu qualifier to pcpu_stats Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31net: mark some potential candidates __read_mostlyDaniel Borkmann
They are all either written once or extremly rarely (e.g. from init code), so we can move them to the .data..read_mostly section. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25ipvlan: fix incorrect usage of IS_ERR() macro in IPv6 code path.Mahesh Bandewar
The ip6_route_output() always returns a valid dst pointer unlike in IPv4 case. So the validation has to be different from the IPv4 path. Correcting that error in this patch. This was picked up by a static checker with a following warning - drivers/net/ipvlan/ipvlan_core.c:380 ipvlan_process_v6_outbound() warn: 'dst' isn't an ERR_PTR Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09ipvlan: move the device check function into netdevice.hMahesh Bandewar
Move the port check [ipvlan_dev_master()] and device check [ipvlan_dev_slave()] functions to netdevice.h and rename them netif_is_ipvlan_port() and netif_is_ipvlan() resp. to be consistent with macvlan api naming. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09ipvlan: play well with macvlan deviceMahesh Bandewar
If a device is already a macvlan port then refuse to use it as an ipvlan port in the early stage of port creation. thost1:~# ip link add link eth0 mvl0 type macvlan thost1:~# echo $? 0 thost1:~# ip link add link eth0 ipvl0 type ipvlan RTNETLINK answers: Device or resource busy thost1:~# echo $? 2 thost1:~# Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-06net-ipvlan: Deletion of an unnecessary check before the function call ↵Markus Elfring
"free_percpu" The free_percpu() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-30ipvlan: ipvlan depends on INET and IPV6Mahesh Bandewar
This driver uses ip_out_local() and ip6_route_output() which are defined only if CONFIG_INET and CONFIG_IPV6 are enabled respectively. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26ipvlan: fix sparse warningsMahesh Bandewar
Fix sparse warnings reported by kbuild robot drivers/net/ipvlan/ipvlan_main.c:172:13: warning: symbol 'ipvlan_start_xmit' was not declared. Should it be static? drivers/net/ipvlan/ipvlan_main.c:256:33: warning: incorrect type in initializer (different address spaces) drivers/net/ipvlan/ipvlan_main.c:256:33: expected void const [noderef] <asn:3>*__vpp_verify drivers/net/ipvlan/ipvlan_main.c:256:33: got struct ipvl_pcpu_stats *<noident> drivers/net/ipvlan/ipvlan_main.c:544:5: warning: symbol 'ipvlan_link_register' was not declared. Should it be static Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24ipvlan: Initial check-in of the IPVLAN driver.Mahesh Bandewar
This driver is very similar to the macvlan driver except that it uses L3 on the frame to determine the logical interface while functioning as packet dispatcher. It inherits L2 of the master device hence the packets on wire will have the same L2 for all the packets originating from all virtual devices off of the same master device. This driver was developed keeping the namespace use-case in mind. Hence most of the examples given here take that as the base setup where main-device belongs to the default-ns and virtual devices are assigned to the additional namespaces. The device operates in two different modes and the difference in these two modes in primarily in the TX side. (a) L2 mode : In this mode, the device behaves as a L2 device. TX processing upto L2 happens on the stack of the virtual device associated with (namespace). Packets are switched after that into the main device (default-ns) and queued for xmit. RX processing is simple and all multicast, broadcast (if applicable), and unicast belonging to the address(es) are delivered to the virtual devices. (b) L3 mode : In this mode, the device behaves like a L3 device. TX processing upto L3 happens on the stack of the virtual device associated with (namespace). Packets are switched to the main-device (default-ns) for the L2 processing. Hence the routing table of the default-ns will be used in this mode. RX processins is somewhat similar to the L2 mode except that in this mode only Unicast packets are delivered to the virtual device while main-dev will handle all other packets. The devices can be added using the "ip" command from the iproute2 package - ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ] Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Laurent Chavey <chavey@google.com> Cc: Tim Hockin <thockin@google.com> Cc: Brandon Philips <brandon.philips@coreos.com> Cc: Pavel Emelianov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>