summaryrefslogtreecommitdiff
path: root/drivers/net/bonding/bond_main.c
AgeCommit message (Collapse)Author
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04bonding: Fix bonding crashMahesh Bandewar
Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01bonding: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues multiple work items viz &bond->mcast_work, &nnw->work, &bond->mii_work, &bond->arp_work, &bond->alb_work, &bond->mii_work, &bond->ad_work, &bond->slave_arr_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. Since, it is a network driver, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09bonding: fix the typoZhu Yanjun
The message "803.ad" should be "802.3ad". Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25net/bonding: Enforce active-backup policy for IPoIB bondsMark Bloch
When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05bonding: fix enslavement slave link notificationsAviv Heller
Currently, link notifications are not sent by bond_set_slave_link_state() upon enslavement if the slave is enslaved when up. This happens because slave->link default init value is 0, which is the same as BOND_LINK_UP, resulting in bond_set_slave_link_state() ignoring this transition. This patch sets the default value of slave->link to BOND_LINK_NOCHANGE, assuring it will count as a state transition and thus trigger notification logic. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05net: introduce default neigh_construct/destroy ndo calls for L2 upper devicesJiri Pirko
L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: add netdev_lockdep_set_classes() helperEric Dumazet
It is time to add netdev_lockdep_set_classes() helper so that lockdep annotations per device type are easier to manage. This removes a lot of copies and missing annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07net_sched: transform qdisc running bit into a seqcountEric Dumazet
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19bonding: fix bond_get_stats()Eric Dumazet
bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held) The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19bonding: remove duplicate set of flag IFF_MULTICASTZhang Shengju
Remove unnecessary set of flag IFF_MULTICAST, since ether_setup already does this. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-26net: bonding: use __ethtool_get_ksettingsDavid Decotigny
Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16bonding: don't use stale speed and duplex informationJay Vosburgh
There is presently a race condition between the bonding periodic link monitor and the updating of a slave's speed and duplex. The former occurs on a periodic basis, and the latter in response to a driver's calling of netif_carrier_on. It is possible for the periodic monitor to run between the driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE event that causes bonding to update the slave's speed and duplex. This manifests most notably as a report that a slave is up and "0 Mbps full duplex" after enslavement, but in principle could report an incorrect speed and duplex after any link up event if the device comes up with a different speed or duplex. This affects the 802.3ad aggregator selection, as the speed and duplex are selection criteria. This is fixed by updating the speed and duplex in the periodic monitor, prior to using that information. This was done historically in bonding, but the call to bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks"), as it might sleep under lock. Later, the locking was changed to only hold RTNL, and so after commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") this call is again safe. Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: dingtianhong <dingtianhong@huawei.com> Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-13bonding: Fix ARP monitor validationJay Vosburgh
The current logic in bond_arp_rcv will accept an incoming ARP for validation if (a) the receiving slave is either "active" (which includes the currently active slave, or the current ARP slave) or, (b) there is a currently active slave, and it has received an ARP since it became active. For case (b), the receiving slave isn't the currently active slave, and is receiving the original broadcast ARP request, not an ARP reply from the target. This logic can fail if there is no currently active slave. In this situation, the ARP probe logic cycles through all slaves, assigning each in turn as the "current_arp_slave" for one arp_interval, then setting that one as "active," and sending an ARP probe from that slave. The current logic expects the ARP reply to arrive on the sending current_arp_slave, however, due to switch FDB updating delays, the reply may be directed to another slave. This can arise if the bonding slaves and switch are working, but the ARP target is not responding. When the ARP target recovers, a condition may result wherein the ARP target host replies faster than the switch can update its forwarding table, causing each ARP reply to be sent to the previous current_arp_slave. This will never pass the logic in bond_arp_rcv, as neither of the above conditions (a) or (b) are met. Some experimentation on a LAN shows ARP reply round trips in the 200 usec range, but my available switches never update their FDB in less than 4000 usec. This patch changes the logic in bond_arp_rcv to additionally accept an ARP reply for validation on any slave if there is a current ARP slave and it sent an ARP probe during the previous arp_interval. Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works") Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11bonding: use return instead of gotoZhang Shengju
Replace 'goto' with 'return' to remove unnecessary check at label: err_undo_flags. The reason is that 'err_undo_flags' do two things for the first slave device: 1.revert bond mac address if it is set by the slave device. 2.revert bond device type if it's not ARPHRD_ETHER. It's not necessary for the following three places, they changed neither bond mac address nor type. It's straightforward to return directly. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08bonding: trivial: style fixesZhang Shengju
remove some redudant brackets, use sizeof(*) instead of sizeof(struct x). Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06bonding: add slave device name for debugZhang Shengju
netdev_dbg() will add bond device name, it will be helpful if we print slave device name. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06bond: track sum of rx_nohandler for all slavesJarod Wilson
Sample output with this set applied for an active-backup bond: $ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler 16568 $ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler 16583 $ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler 33151 CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/bonding/bond_main.c drivers/net/ethernet/mellanox/mlxsw/spectrum.h drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c The bond_main.c and mellanox switch conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11bonding: Prevent IPv6 link local address on enslaved devicesKarl Heiss
Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") undoes the fix provided by commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master") by effectively setting the slave flag after the slave has been opened. If the slave comes up quickly enough, it will go through the IPv6 addrconf before the slave flag has been set and will get a link local IPv6 address. In order to ensure that addrconf knows to ignore the slave devices on state change, set IFF_SLAVE before dev_open() during bonding enslavement. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASKTom Herbert
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04net: bonding: remove redudant bracketsyzhu1
It is not necessary to use two brackets. As such, the redudant brackets are removed. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03bonding: set inactive flags on releaseJiri Pirko
Be correct and symmetric to enslave and set inactive flags during release. That gives LAG offload drivers - lower state change listeners - possibility to do proper cleanup. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03bonding: implement lower state change propagationJiri Pirko
Let netdev notifier listeners know about link and slave state change. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03bonding: allow notifications for bond_set_slave_link_stateJiri Pirko
Similar to state notifications. We allow caller to indicate if the notification should happen now or later, depending on if he holds rtnl mutex or not. Introduce bond_slave_link_notify function (similar to bond_slave_state_notify) which is later on called with rtnl mutex and goes over slaves and executes delayed notification. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03bonding: fill-up LAG changeupper info struct and pass it alongJiri Pirko
Initialize netdev_lag_upper_info structure by TX type according to current bonding mode and pass it along via netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03net: add possibility to pass information about upper device via notifierJiri Pirko
Sometimes the drivers and other code would find it handy to know some internal information about upper device being changed. So allow upper-code to pass information down to notifier listeners during linking. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03net: propagate upper priv via netdev_master_upper_dev_linkJiri Pirko
Eliminate netdev_master_upper_dev_link_private and pass priv directly as a parameter of netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07bonding: fix panic on non-ARPHRD_ETHER enslave failureJay Vosburgh
Since commit 7d5cd2ce529b, when bond_enslave fails on devices that are not ARPHRD_ETHER, if needed, it resets the bonding device back to ARPHRD_ETHER by calling ether_setup. Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP if the bond device is up, leaving it in a quasi-down state without having actually gone through dev_close. For bonding, if any periodic work queue items are active (miimon, arp_interval, etc), those will remain running, as they are stopped by bond_close. At this point, if the bonding module is unloaded or the bond is deleted, the system will panic when the work function is called. This panic is resolved by calling dev_close on the bond itself prior to calling ether_setup. Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03bonding: simplify / unify event handling code for 3ad mode.Mahesh Bandewar
Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16bonding: support encapsulated ipv6 TSOEric Dumazet
If using a sixtofour device on top of a bonding device, skb segmentation of TCP traffic is done right before calling bonding xmit, because bonding only enables TSO for IPv4. This patch improves single flow performance by about 120 % on my hosts, because segmentation is deferred right before calling slave xmit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18bonding: use l4 hash if availableEric Dumazet
If skb carries a l4 hash, no need to perform a flow dissection. Performance is slightly better : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39012e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39393e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39988e+06 After patch : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.43579e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44304e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44312e+06 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01flow_dissector: Add flags argument to skb_flow_dissector functionsTom Herbert
The flags argument will allow control of the dissection process (for instance whether to parse beyond L3). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-08-29bonding: fix bond_poll_controller bh_enable warningNikolay Aleksandrov
The problem is rcu_read_unlock_bh() which triggers a warning when irqs are disabled. ndo_poll_controller should run with irqs disabled always so we can drop the rcu_read_lock_bh. [ 98.502922] bond0: making interface eth1 the new active one [ 98.503039] ------------[ cut here ]------------ [ 98.503039] WARNING: CPU: 0 PID: 1744 at kernel/softirq.c:150 __local_bh_enable_ip+0x96/0xc0() [ 98.503039] Modules linked in: bonding(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole ppdev joydev parport_pc serio_raw parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net e1000 ata_generic pcnet32 mii virtio_pci virtio_ring virtio pata_acpi [ 98.503039] CPU: 0 PID: 1744 Comm: ifenslave Tainted: G OE 4.2.0-rc7+ #56 [ 98.503039] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 98.503039] 0000000000000000 00000000e96ba230 ffff880020c236b8 ffffffff8183f105 [ 98.503039] 0000000000000000 0000000000000000 ffff880020c236f8 ffffffff810a9496 [ 98.503039] ffff88002ea99e08 0000000000000200 ffffffffa02a8e06 ffff88002ea99e08 [ 98.503039] Call Trace: [ 98.503039] [<ffffffff8183f105>] dump_stack+0x4c/0x65 [ 98.503039] [<ffffffff810a9496>] warn_slowpath_common+0x86/0xc0 [ 98.503039] [<ffffffffa02a8e06>] ? bond_poll_controller+0x146/0x250 [bonding] [ 98.503039] [<ffffffff810a95ca>] warn_slowpath_null+0x1a/0x20 [ 98.503039] [<ffffffff810ae376>] __local_bh_enable_ip+0x96/0xc0 [ 98.503039] [<ffffffffa02a8e2f>] bond_poll_controller+0x16f/0x250 [bonding] [ 98.503039] [<ffffffffa02a8cf3>] ? bond_poll_controller+0x33/0x250 [bonding] [ 98.503039] [<ffffffff810feaed>] ? trace_hardirqs_off+0xd/0x10 [ 98.503039] [<ffffffff81848afb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60 [ 98.503039] [<ffffffff816ec48e>] netpoll_poll_dev+0x6e/0x350 [ 98.503039] [<ffffffff816eb977>] ? netpoll_start_xmit+0x137/0x1d0 [ 98.503039] [<ffffffff816b2e8b>] ? __alloc_skb+0x5b/0x210 [ 98.503039] [<ffffffff816ec89d>] netpoll_send_skb_on_dev+0x12d/0x2a0 [ 98.503039] [<ffffffff816eccde>] netpoll_send_udp+0x2ce/0x430 [ 98.503039] [<ffffffffa0190850>] write_msg+0xb0/0xf0 [netconsole] [ 98.503039] [<ffffffff81116b63>] call_console_drivers.constprop.25+0x133/0x260 [ 98.503039] [<ffffffff81117934>] console_unlock+0x2f4/0x580 [ 98.503039] [<ffffffff81117ea5>] ? vprintk_emit+0x2e5/0x630 [ 98.503039] [<ffffffff81117ee5>] vprintk_emit+0x325/0x630 [ 98.503039] [<ffffffff81118379>] vprintk_default+0x29/0x40 [ 98.503039] [<ffffffff8183de4f>] printk+0x55/0x6b [ 98.503039] [<ffffffff816c754c>] __netdev_printk+0x16c/0x260 [ 98.503039] [<ffffffff816c7a12>] netdev_info+0x62/0x80 [ 98.503039] [<ffffffffa02ab464>] bond_change_active_slave+0x134/0x6a0 [bonding] [ 98.503039] [<ffffffffa02aba95>] bond_select_active_slave+0xc5/0x310 [bonding] [ 98.503039] [<ffffffffa02aeb78>] bond_enslave+0x1088/0x10c0 [bonding] [ 98.503039] [<ffffffffa02af46b>] bond_do_ioctl+0x37b/0x400 [bonding] [ 98.503039] [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10 [ 98.503039] [<ffffffff816dc437>] ? rtnl_lock+0x17/0x20 [ 98.503039] [<ffffffff816e5fd1>] dev_ifsioc+0x331/0x3e0 [ 98.503039] [<ffffffff816e62dc>] dev_ioctl+0xec/0x6c0 [ 98.503039] [<ffffffff816a6c6a>] sock_do_ioctl+0x4a/0x60 [ 98.503039] [<ffffffff816a7300>] sock_ioctl+0x1c0/0x250 [ 98.503039] [<ffffffff81271bfe>] do_vfs_ioctl+0x2ee/0x540 [ 98.503039] [<ffffffff810fd943>] ? up_read+0x23/0x40 [ 98.503039] [<ffffffff81070993>] ? __do_page_fault+0x1d3/0x420 [ 98.503039] [<ffffffff8127e246>] ? __fget_light+0x66/0x90 [ 98.503039] [<ffffffff81271ec9>] SyS_ioctl+0x79/0x90 [ 98.503039] [<ffffffff8184936e>] entry_SYSCALL_64_fastpath+0x12/0x76 [ 98.503039] ---[ end trace 00cfa804b0670051 ]--- Fixes: 616f45416ca0 ("bonding: implement bond_poll_controller()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18net: bonding: convert to using IFF_NO_QUEUEPhil Sutter
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/cavium/Kconfig The cavium conflict was overlapping dependency changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12bonding: Gratuitous ARP gets dropped when first slave addedVenkat Venkatsubra
When the first slave is added (such as during bootup) the first gratuitous ARP gets dropped. We don't see this drop during a failover. The packet gets dropped in qdisc (noop_enqueue). The fix is to delay the sending of gratuitous ARPs till the bond dev's carrier is present. It can also be worked around by setting num_grat_arp to more than 1. Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bridge/br_mdb.c br_mdb.c conflict was a function call being removed to fix a bug in 'net' but whose signature was changed in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21bonding: correct the MAC address for "follow" fail_over_mac policydingtianhong
The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: correctly handle bonding type change on enslave failureNikolay Aleksandrov
If the bond is enslaving a device with different type it will be setup by it, but if after being setup the enslave fails the bond doesn't switch back its type and also keeps pointers to foreign structures that can be long gone. Thus revert back any type changes if the enslave failed and the bond had to change its type. Example: Before patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00 $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff (notice the MASTER flag is gone) After patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: fix destruction of bond with devices different from arphrd_etherNikolay Aleksandrov
When the bonding is being unloaded and the netdevice notifier is unregistered it executes NETDEV_UNREGISTER for each device which should remove the bond's proc entry but if the device enslaved is not of ARPHRD_ETHER type and is in front of the bonding, it may execute bond_release_and_destroy() first which would release the last slave and destroy the bond device leaving the proc entry and thus we will get the following error (with dynamic debug on for bond_netdev_event to see the events order): [ 908.963051] eql: event: 9 [ 908.963052] eql: IFF_SLAVE [ 908.963054] eql: event: 2 [ 908.963056] eql: IFF_SLAVE [ 908.963058] eql: event: 6 [ 908.963059] eql: IFF_SLAVE [ 908.963110] bond0: Releasing active interface eql [ 908.976168] bond0: Destroying bond bond0 [ 908.976266] bond0 (unregistering): Released all slaves [ 908.984097] ------------[ cut here ]------------ [ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160() [ 908.984110] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore usb_common [last unloaded: bonding] [ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O 4.2.0-rc2+ #8 [ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff8800358dfda8 [ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40 ffff88003e3a4280 [ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0 [ 908.984181] Call Trace: [ 908.984188] [<ffffffff81525b34>] ? dump_stack+0x40/0x50 [ 908.984193] [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0 [ 908.984196] [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50 [ 908.984199] [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160 [ 908.984205] [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding] [ 908.984208] [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding] [ 908.984217] [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70 [ 908.984225] [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0 [ 908.984228] [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30 [ 908.984232] [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding] [ 908.984236] [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250 [ 908.984241] [<ffffffff81086f99>] ? task_work_run+0x89/0xc0 [ 908.984244] [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75 [ 908.984247] ---[ end trace 7c006ed4abbef24b ]--- Thus remove the proc entry manually if bond_release_and_destroy() is used. Because of the checks in bond_remove_proc_entry() it's not a problem for a bond device to change namespaces (the bug fixed by the Fixes commit) but since commit f9399814927ad ("bonding: Don't allow bond devices to change network namespaces.") that can't happen anyway. Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from the netdev events") Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: trivial: remove unused variablesNikolay Aleksandrov
Get rid of these: drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’: drivers/net/bonding//bond_main.c:3754:6: warning: variable ‘slaves_in_agg’ set but not used [-Wunused-but-set-variable] int slaves_in_agg; ^ CC [M] drivers/net/bonding//bond_3ad.o drivers/net/bonding//bond_3ad.c: In function ‘ad_marker_response_received’: drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’ set but not used [-Wunused-but-set-parameter] static void ad_marker_response_received(struct bond_marker *marker, ^ drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set but not used [-Wunused-but-set-parameter] struct port *port) ^ Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08bonding: "primary_reselect" with "failure" is not working properlyMazhar Rana
When "primary_reselect" is set to "failure", primary interface should not become active until current active slave is down. But if we set first member of bond device as a "primary" interface and "primary_reselect" is set to "failure" then whenever primary interface's link get back(up) it become active slave even if current active slave is still up. With this patch, "bond_find_best_slave" will not traverse members if primary interface is not candidate for failover/reselection and current active slave is still up. Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04net: Add full IPv6 addresses to flow_keysTom Herbert
This patch adds full IPv6 addresses into flow_keys and uses them as input to the flow hash function. The implementation supports either IPv4 or IPv6 addresses in a union, and selector is used to determine how may words to input to jhash2. We also add flow_get_u32_dst and flow_get_u32_src functions which are used to get a u32 representation of the source and destination addresses. For IPv6, ipv6_addr_hash is called. These functions retain getting the legacy values of src and dst in flow_keys. With this patch, Ethertype and IP protocol are now included in the flow hash input. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-18switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.Samudrala, Sridhar
- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump() - use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops. - add support for fdb obj in switchdev_port_obj_add/del/dump() - switch rocker to implement fdb ops via switchdev_ops v3: updated to sync with named union changes. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13flow_dissect: use programable dissector in skb_flow_dissect and friendsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13net: change name of flow_dissector header to match the .c file nameJiri Pirko
add couple of empty lines on the way. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>