summaryrefslogtreecommitdiff
path: root/drivers/net/bonding/bond_3ad.c
diff options
context:
space:
mode:
authorJay Vosburgh <fubar@us.ibm.com>2011-10-28 15:42:50 (GMT)
committerDavid S. Miller <davem@davemloft.net>2011-10-30 07:13:14 (GMT)
commite6d265e8504ab4a3368b8645d318b344ee88b280 (patch)
tree6fd2bc16819bae491e56be2658fc3298b7efa92c /drivers/net/bonding/bond_3ad.c
parent10ee0faed92c8af4baebd633372136a6608a41ea (diff)
downloadlinux-e6d265e8504ab4a3368b8645d318b344ee88b280.tar.xz
bonding: eliminate bond_close race conditions
This patch resolves two sets of race conditions. Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the first, as follows: The bond_close() calls cancel_delayed_work() to cancel delayed works. It, however, cannot cancel works that were already queued in workqueue. The bond_open() initializes work->data, and proccess_one_work() refers get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when work->data has been initialized. Thus, a panic occurs. He included a patch that converted the cancel_delayed_work calls in bond_close to flush_delayed_work_sync, which eliminated the above problem. His patch is incorporated, at least in principle, into this patch. In this patch, we use cancel_delayed_work_sync in place of flush_delayed_work_sync, and also convert bond_uninit in addition to bond_close. This conversion to _sync, however, opens new races between bond_close and three periodically executing workqueue functions: bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon. The race occurs because bond_close and bond_uninit are always called with RTNL held, and these workqueue functions may acquire RTNL to perform failover-related activities. If bond_close or bond_uninit is waiting in cancel_delayed_work_sync, deadlock occurs. These deadlocks are resolved by having the workqueue functions acquire RTNL conditionally. If the rtnl_trylock() fails, the functions reschedule and return immediately. For the cases that are attempting to perform link failover, a delay of 1 is used; for the other cases, the normal interval is used (as those activities are not as time critical). Additionally, the bond_mii_monitor function now stores the delay in a variable (mimicing the structure of activebackup_arp_mon). Lastly, all of the above renders the kill_timers sentinel moot, and therefore it has been removed. Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers/net/bonding/bond_3ad.c')
-rw-r--r--drivers/net/bonding/bond_3ad.c8
1 files changed, 2 insertions, 6 deletions
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b33c099..0ae0d7c 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2110,9 +2110,6 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
read_lock(&bond->lock);
- if (bond->kill_timers)
- goto out;
-
//check if there are any slaves
if (bond->slave_cnt == 0)
goto re_arm;
@@ -2161,9 +2158,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
}
re_arm:
- if (!bond->kill_timers)
- queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
-out:
+ queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
+
read_unlock(&bond->lock);
}