From 5646f7acc95f14873f1ec715380c1c493b4243ce Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 25 Jul 2014 17:05:24 -0700 Subject: memory-barriers: Fix control-ordering no-transitivity example The control-ordering example demonstrating lack of transitivity had multiple problems. This commit fixes them. Reported-by: Nikolay Samofatov Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index a4de88f..d67c508 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -697,30 +697,36 @@ should do something like the following: } Finally, control dependencies do -not- provide transitivity. This is -demonstrated by two related examples: +demonstrated by two related examples, with the initial values of +x and y both being zero: CPU 0 CPU 1 ===================== ===================== r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y); - if (r1 >= 0) if (r2 >= 0) + if (r1 > 0) if (r2 > 0) ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; assert(!(r1 == 1 && r2 == 1)); The above two-CPU example will never trigger the assert(). However, if control dependencies guaranteed transitivity (which they do not), -then adding the following two CPUs would guarantee a related assertion: +then adding the following CPU would guarantee a related assertion: - CPU 2 CPU 3 - ===================== ===================== - ACCESS_ONCE(x) = 2; ACCESS_ONCE(y) = 2; + CPU 2 + ===================== + ACCESS_ONCE(x) = 2; + + assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ - assert(!(r1 == 2 && r2 == 2 && x == 1 && y == 1)); /* FAILS!!! */ +But because control dependencies do -not- provide transitivity, the above +assertion can fail after the combined three-CPU example completes. If you +need the three-CPU example to provide ordering, you will need smp_mb() +between the loads and stores in the CPU 0 and CPU 1 code fragments, +that is, just before or just after the "if" statements. -But because control dependencies do -not- provide transitivity, the -above assertion can fail after the combined four-CPU example completes. -If you need the four-CPU example to provide ordering, you will need -smp_mb() between the loads and stores in the CPU 0 and CPU 1 code fragments. +These two examples are the LB and WWC litmus tests from this paper: +http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this +site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html. In summary: -- cgit v0.10.2 From efdcd51a4d5bd355796b1a757ff0355bb09ed394 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 4 Aug 2014 11:49:34 -0700 Subject: memory-barriers: Retain barrier() in fold-to-zero example The transformation in the fold-to-zero example incorrectly omits the barrier() directive. This commit therefore adds it back in. Reported-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index d67c508..600b45c 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -679,12 +679,15 @@ equal to zero, in which case the compiler is within its rights to transform the above code into the following: q = ACCESS_ONCE(a); + barrier(); ACCESS_ONCE(b) = p; do_something_else(); -This transformation loses the ordering between the load from variable 'a' -and the store to variable 'b'. If you are relying on this ordering, you -should do something like the following: +This transformation fails to require that the CPU respect the ordering +between the load from variable 'a' and the store to variable 'b'. +Yes, the barrier() is still there, but it affects only the compiler, +not the CPU. Therefore, if you are relying on this ordering, you should +do something like the following: q = ACCESS_ONCE(a); BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ -- cgit v0.10.2 From 2456d2a617de0a37a0f8d1e44f4b270172c4f17a Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 13 Aug 2014 15:40:02 -0700 Subject: memory-barriers: Fix description of 2-legged-if-based control dependencies Sad to say, current compilers really will hoist identical stores from both branches of an "if" statement to precede the conditional. This commit therefore updates the description of control dependencies to reflect this ugly reality. Reported-by: Pranith Kumar Reported-by: Peter Zijlstra Signed-off-by: Paul E. McKenney diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 600b45c..22a969c 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -574,30 +574,14 @@ However, stores are not speculated. This means that ordering -is- provided in the following example: q = ACCESS_ONCE(a); - if (ACCESS_ONCE(q)) { - ACCESS_ONCE(b) = p; - } - -Please note that ACCESS_ONCE() is not optional! Without the ACCESS_ONCE(), -the compiler is within its rights to transform this example: - - q = a; if (q) { - b = p; /* BUG: Compiler can reorder!!! */ - do_something(); - } else { - b = p; /* BUG: Compiler can reorder!!! */ - do_something_else(); + ACCESS_ONCE(b) = p; } -into this, which of course defeats the ordering: - - b = p; - q = a; - if (q) - do_something(); - else - do_something_else(); +Please note that ACCESS_ONCE() is not optional! Without the +ACCESS_ONCE(), might combine the load from 'a' with other loads from +'a', and the store to 'b' with other stores to 'b', with possible highly +counterintuitive effects on ordering. Worse yet, if the compiler is able to prove (say) that the value of variable 'a' is always non-zero, it would be well within its rights @@ -605,11 +589,12 @@ to optimize the original example by eliminating the "if" statement as follows: q = a; - b = p; /* BUG: Compiler can reorder!!! */ - do_something(); + b = p; /* BUG: Compiler and CPU can both reorder!!! */ -The solution is again ACCESS_ONCE() and barrier(), which preserves the -ordering between the load from variable 'a' and the store to variable 'b': +So don't leave out the ACCESS_ONCE(). + +It is tempting to try to enforce ordering on identical stores on both +branches of the "if" statement as follows: q = ACCESS_ONCE(a); if (q) { @@ -622,18 +607,11 @@ ordering between the load from variable 'a' and the store to variable 'b': do_something_else(); } -The initial ACCESS_ONCE() is required to prevent the compiler from -proving the value of 'a', and the pair of barrier() invocations are -required to prevent the compiler from pulling the two identical stores -to 'b' out from the legs of the "if" statement. - -It is important to note that control dependencies absolutely require a -a conditional. For example, the following "optimized" version of -the above example breaks ordering, which is why the barrier() invocations -are absolutely required if you have identical stores in both legs of -the "if" statement: +Unfortunately, current compilers will transform this as follows at high +optimization levels: q = ACCESS_ONCE(a); + barrier(); ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ if (q) { /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ @@ -643,21 +621,36 @@ the "if" statement: do_something_else(); } -It is of course legal for the prior load to be part of the conditional, -for example, as follows: +Now there is no conditional between the load from 'a' and the store to +'b', which means that the CPU is within its rights to reorder them: +The conditional is absolutely required, and must be present in the +assembly code even after all compiler optimizations have been applied. +Therefore, if you need ordering in this example, you need explicit +memory barriers, for example, smp_store_release(): - if (ACCESS_ONCE(a) > 0) { - barrier(); - ACCESS_ONCE(b) = q / 2; + q = ACCESS_ONCE(a); + if (q) { + smp_store_release(&b, p); do_something(); } else { - barrier(); - ACCESS_ONCE(b) = q / 3; + smp_store_release(&b, p); + do_something_else(); + } + +In contrast, without explicit memory barriers, two-legged-if control +ordering is guaranteed only when the stores differ, for example: + + q = ACCESS_ONCE(a); + if (q) { + ACCESS_ONCE(b) = p; + do_something(); + } else { + ACCESS_ONCE(b) = r; do_something_else(); } -This will again ensure that the load from variable 'a' is ordered before the -stores to variable 'b'. +The initial ACCESS_ONCE() is still required to prevent the compiler from +proving the value of 'a'. In addition, you need to be careful what you do with the local variable 'q', otherwise the compiler might be able to guess the value and again remove @@ -665,12 +658,10 @@ the needed conditional. For example: q = ACCESS_ONCE(a); if (q % MAX) { - barrier(); ACCESS_ONCE(b) = p; do_something(); } else { - barrier(); - ACCESS_ONCE(b) = p; + ACCESS_ONCE(b) = r; do_something_else(); } @@ -679,15 +670,15 @@ equal to zero, in which case the compiler is within its rights to transform the above code into the following: q = ACCESS_ONCE(a); - barrier(); ACCESS_ONCE(b) = p; do_something_else(); -This transformation fails to require that the CPU respect the ordering -between the load from variable 'a' and the store to variable 'b'. -Yes, the barrier() is still there, but it affects only the compiler, -not the CPU. Therefore, if you are relying on this ordering, you should -do something like the following: +Given this transformation, the CPU is not required to respect the ordering +between the load from variable 'a' and the store to variable 'b'. It is +tempting to add a barrier(), but this does not help. The conditional +is gone, and the barrier won't bring it back. Therefore, if you are +relying on this ordering, you should make sure that MAX is greater than +one, perhaps as follows: q = ACCESS_ONCE(a); BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ @@ -695,10 +686,14 @@ do something like the following: ACCESS_ONCE(b) = p; do_something(); } else { - ACCESS_ONCE(b) = p; + ACCESS_ONCE(b) = r; do_something_else(); } +Please note once again that the stores to 'b' differ. If they were +identical, as noted earlier, the compiler could pull this store outside +of the 'if' statement. + Finally, control dependencies do -not- provide transitivity. This is demonstrated by two related examples, with the initial values of x and y both being zero: -- cgit v0.10.2 From 4de376a1b14e32f550931274f06b571abc0f3d4b Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 17:46:50 -0400 Subject: rcu: Remove remaining read-modify-write ACCESS_ONCE() calls Change the remaining uses of ACCESS_ONCE() so that each ACCESS_ONCE() either does a load or a store, but not both. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 1b70cb6..4b526ca4 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1684,7 +1684,8 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) { raw_spin_lock_irq(&rnp->lock); smp_mb__after_unlock_lock(); - ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS; + ACCESS_ONCE(rsp->gp_flags) = + ACCESS_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS; raw_spin_unlock_irq(&rnp->lock); } return fqs_state; @@ -2505,7 +2506,8 @@ static void force_quiescent_state(struct rcu_state *rsp) raw_spin_unlock_irqrestore(&rnp_old->lock, flags); return; /* Someone beat us to it. */ } - ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS; + ACCESS_ONCE(rsp->gp_flags) = + ACCESS_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS; raw_spin_unlock_irqrestore(&rnp_old->lock, flags); wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */ } diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index a7997e2..218fae3 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -897,7 +897,8 @@ void synchronize_rcu_expedited(void) /* Clean up and exit. */ smp_mb(); /* ensure expedited GP seen before counter increment. */ - ACCESS_ONCE(sync_rcu_preempt_exp_count)++; + ACCESS_ONCE(sync_rcu_preempt_exp_count) = + sync_rcu_preempt_exp_count + 1; unlock_mb_ret: mutex_unlock(&sync_rcu_preempt_exp_mutex); mb_ret: @@ -2428,8 +2429,9 @@ static int rcu_nocb_kthread(void *arg) list = next; } trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1); - ACCESS_ONCE(rdp->nocb_p_count) -= c; - ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl; + ACCESS_ONCE(rdp->nocb_p_count) = rdp->nocb_p_count - c; + ACCESS_ONCE(rdp->nocb_p_count_lazy) = + rdp->nocb_p_count_lazy - cl; rdp->n_nocbs_invoked += c; } return 0; -- cgit v0.10.2 From bf33eb1aef23e8049cd222471d35b0988c420b18 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:10 -0400 Subject: rcu: Fix sparse warning about rcu_batches_completed_preempt() being non-static fix sparse warning about rcu_batches_completed_preempt() being non-static by marking it as static Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 218fae3..5defa2f 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -134,7 +134,7 @@ static void __init rcu_bootup_announce(void) * Return the number of RCU-preempt batches processed thus far * for debug and statistics. */ -long rcu_batches_completed_preempt(void) +static long rcu_batches_completed_preempt(void) { return rcu_preempt_state.completed; } -- cgit v0.10.2 From f534ed1fd71cea885a59255d9b44c3b17df03eb1 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:11 -0400 Subject: rcu: Use bool type for return value in rcu_is_watching() Use a bool type for return in rcu_is_watching(). Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 4b526ca4..253ea55 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -819,7 +819,7 @@ bool notrace __rcu_is_watching(void) */ bool notrace rcu_is_watching(void) { - int ret; + bool ret; preempt_disable(); ret = __rcu_is_watching(); -- cgit v0.10.2 From d0bc90fd37e50e4ea22c51c26947fd78c2a7a6c2 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:13 -0400 Subject: rcu: Return bool type for rcu_try_advance_all_cbs() Return a bool type instead of 0 in rcu_try_advance_all_cbs(). Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 5defa2f..bb56456 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1626,7 +1626,7 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void) /* Exit early if we advanced recently. */ if (jiffies == rdtp->last_advance_all) - return 0; + return false; rdtp->last_advance_all = jiffies; for_each_rcu_flavor(rsp) { -- cgit v0.10.2 From 521d24ee598bd8a8b71d7ac76ce2c0da0e548406 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:18 -0400 Subject: rcu: Return bool type in rcu_lockdep_current_cpu_online() Return true instead of 1 in rcu_lockdep_current_cpu_online() as this has bool as return type. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index d231aa1..7e47e44 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -349,7 +349,7 @@ bool rcu_lockdep_current_cpu_online(void); #else /* #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */ static inline bool rcu_lockdep_current_cpu_online(void) { - return 1; + return true; } #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */ -- cgit v0.10.2 From e02b2edfa13878c6671d31d5c736f56f89d99bf1 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Wed, 9 Jul 2014 00:08:17 -0400 Subject: rcu: Use true/false instead of 1/0 for a bool type This commit uses true/false instead of 1/0 for bool types in rcu_gp_fqs() and force_qs_rnp(). Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 253ea55..2719978 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1668,7 +1668,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) if (fqs_state == RCU_SAVE_DYNTICK) { /* Collect dyntick-idle snapshots. */ if (is_sysidle_rcu_state(rsp)) { - isidle = 1; + isidle = true; maxj = jiffies - ULONG_MAX / 4; } force_qs_rnp(rsp, dyntick_save_progress_counter, @@ -1677,7 +1677,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) fqs_state = RCU_FORCE_QS; } else { /* Handle dyntick-idle and offline CPUs. */ - isidle = 0; + isidle = false; force_qs_rnp(rsp, rcu_implicit_dynticks_qs, &isidle, &maxj); } /* Clear flag to prevent immediate re-entry. */ @@ -2450,7 +2450,7 @@ static void force_qs_rnp(struct rcu_state *rsp, for (; cpu <= rnp->grphi; cpu++, bit <<= 1) { if ((rnp->qsmask & bit) != 0) { if ((rnp->qsmaskinit & bit) != 0) - *isidle = 0; + *isidle = false; if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj)) mask |= bit; } -- cgit v0.10.2 From 85b39d305bfe809a11ff2770d380be3e2465beec Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Tue, 8 Jul 2014 15:17:59 -0700 Subject: rcu: Uninline rcu_read_lock_held() This commit uninlines rcu_read_lock_held(). According to "size vmlinux" this saves 28549 in .text: - 5541731 3014560 14757888 23314179 + 5513182 3026848 14757888 23297918 Note: it looks as if the data grows by 12288 bytes but this is not true, it does not actually grow. But .data starts with ALIGN(THREAD_SIZE) and since .text shrinks the padding grows, and thus .data grows too as it seen by /bin/size. diff System.map: - ffffffff81510000 D _sdata - ffffffff81510000 D init_thread_union + ffffffff81509000 D _sdata + ffffffff8150c000 D init_thread_union Perhaps we can change vmlinux.lds.S to .data itself, so that /bin/size can't "wrongly" report that .data grows if .text shinks. Signed-off-by: Oleg Nesterov Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 7e47e44..321ed0d 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -371,41 +371,7 @@ extern struct lockdep_map rcu_sched_lock_map; extern struct lockdep_map rcu_callback_map; int debug_lockdep_rcu_enabled(void); -/** - * rcu_read_lock_held() - might we be in RCU read-side critical section? - * - * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU - * read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, - * this assumes we are in an RCU read-side critical section unless it can - * prove otherwise. This is useful for debug checks in functions that - * require that they be called within an RCU read-side critical section. - * - * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot - * and while lockdep is disabled. - * - * Note that rcu_read_lock() and the matching rcu_read_unlock() must - * occur in the same context, for example, it is illegal to invoke - * rcu_read_unlock() in process context if the matching rcu_read_lock() - * was invoked from within an irq handler. - * - * Note that rcu_read_lock() is disallowed if the CPU is either idle or - * offline from an RCU perspective, so check for those as well. - */ -static inline int rcu_read_lock_held(void) -{ - if (!debug_lockdep_rcu_enabled()) - return 1; - if (!rcu_is_watching()) - return 0; - if (!rcu_lockdep_current_cpu_online()) - return 0; - return lock_is_held(&rcu_lock_map); -} - -/* - * rcu_read_lock_bh_held() is defined out of line to avoid #include-file - * hell. - */ +int rcu_read_lock_held(void); int rcu_read_lock_bh_held(void); /** diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 4056d79..ea8ea7b 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -137,6 +137,38 @@ int notrace debug_lockdep_rcu_enabled(void) EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); /** + * rcu_read_lock_held() - might we be in RCU read-side critical section? + * + * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU + * read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, + * this assumes we are in an RCU read-side critical section unless it can + * prove otherwise. This is useful for debug checks in functions that + * require that they be called within an RCU read-side critical section. + * + * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot + * and while lockdep is disabled. + * + * Note that rcu_read_lock() and the matching rcu_read_unlock() must + * occur in the same context, for example, it is illegal to invoke + * rcu_read_unlock() in process context if the matching rcu_read_lock() + * was invoked from within an irq handler. + * + * Note that rcu_read_lock() is disallowed if the CPU is either idle or + * offline from an RCU perspective, so check for those as well. + */ +int rcu_read_lock_held(void) +{ + if (!debug_lockdep_rcu_enabled()) + return 1; + if (!rcu_is_watching()) + return 0; + if (!rcu_lockdep_current_cpu_online()) + return 0; + return lock_is_held(&rcu_lock_map); +} +EXPORT_SYMBOL_GPL(rcu_read_lock_held); + +/** * rcu_read_lock_bh_held() - might we be in RCU-bh read-side critical section? * * Check for bottom half being disabled, which covers both the -- cgit v0.10.2 From a8a29b3b7b18251c4e3ffce501f25ae868302a75 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 12 Jul 2014 19:01:49 +0200 Subject: rcu: Define tracepoint strings only if CONFIG_TRACING is set Commit f7f7bac9cb1c ("rcu: Have the RCU tracepoints use the tracepoint_string infrastructure") unconditionally populates the __tracepoint_str input section, but this section is not assigned an output section if CONFIG_TRACING is not set. This results in the __tracepoint_str turning up in unexpected places, i.e., after _edata. Signed-off-by: Ard Biesheuvel Reviewed-by: Steven Rostedt Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 2719978..dc52dc3 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -79,9 +79,18 @@ static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS]; * the tracing userspace tools to be able to decipher the string * address to the matching string. */ -#define RCU_STATE_INITIALIZER(sname, sabbr, cr) \ +#ifdef CONFIG_TRACING +# define DEFINE_RCU_TPS(sname) \ static char sname##_varname[] = #sname; \ -static const char *tp_##sname##_varname __used __tracepoint_string = sname##_varname; \ +static const char *tp_##sname##_varname __used __tracepoint_string = sname##_varname; +# define RCU_STATE_NAME(sname) sname##_varname +#else +# define DEFINE_RCU_TPS(sname) +# define RCU_STATE_NAME(sname) __stringify(sname) +#endif + +#define RCU_STATE_INITIALIZER(sname, sabbr, cr) \ +DEFINE_RCU_TPS(sname) \ struct rcu_state sname##_state = { \ .level = { &sname##_state.node[0] }, \ .call = cr, \ @@ -93,7 +102,7 @@ struct rcu_state sname##_state = { \ .orphan_donetail = &sname##_state.orphan_donelist, \ .barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \ .onoff_mutex = __MUTEX_INITIALIZER(sname##_state.onoff_mutex), \ - .name = sname##_varname, \ + .name = RCU_STATE_NAME(sname), \ .abbr = sabbr, \ }; \ DEFINE_PER_CPU(struct rcu_data, sname##_data) -- cgit v0.10.2 From fafb6e843f229a6e842a22773f16d93194ca06e4 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 15 Jul 2014 18:31:47 -0400 Subject: rcu: Update tiny.c references to tree.c This commit updates the references to rcutree.c which is now rcu/tree.c Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index d9efcc1..6bd785c 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -51,7 +51,7 @@ static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE; #include "tiny_plugin.h" -/* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */ +/* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcu/tree.c. */ static void rcu_idle_enter_common(long long newval) { if (newval) { @@ -114,7 +114,7 @@ void rcu_irq_exit(void) } EXPORT_SYMBOL_GPL(rcu_irq_exit); -/* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcutree.c. */ +/* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcu/tree.c. */ static void rcu_idle_exit_common(long long oldval) { if (oldval) { -- cgit v0.10.2 From 66d701ea7e148f8ed8b1497c9159fbf6175d462f Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Wed, 16 Jul 2014 22:20:33 -0400 Subject: rcu: Remove stale comment in tree.c This commit removes a stale comment in rcu/tree.c which was left out when some code was moved around previously in commit 2036d94a7b61 ("rcu: Rework detection of use of RCU by offline CPUs") For reference, the following updated comment exists a few lines below this which means the same: /* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */ Signed-off-by: Pranith Kumar Reviewed-by: Josh Triplett Reviewed-by: Lai Jiangshan Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index dc52dc3..dd6c8b5 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2220,8 +2220,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp) /* Adjust any no-longer-needed kthreads. */ rcu_boost_kthread_setaffinity(rnp, -1); - /* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */ - /* Exclude any attempts to start a new grace period. */ mutex_lock(&rsp->onoff_mutex); raw_spin_lock_irqsave(&rsp->orphan_lock, flags); -- cgit v0.10.2 From 9fdd3bc9005824704f9802bec7b3e06f5edae434 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 29 Jul 2014 14:50:47 -0700 Subject: rcu: Break more call_rcu() deadlock involving scheduler and perf Commit 96d3fd0d315a9 (rcu: Break call_rcu() deadlock involving scheduler and perf) covered the case where __call_rcu_nocb_enqueue() needs to wake the rcuo kthread due to the queue being initially empty, but did not do anything for the case where the queue was overflowing. This commit therefore also defers wakeup for the overflow case. Signed-off-by: Paul E. McKenney diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index aca3822..9b56f37 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -180,9 +180,12 @@ TRACE_EVENT(rcu_grace_period_init, * argument is a string as follows: * * "WakeEmpty": Wake rcuo kthread, first CB to empty list. + * "WakeEmptyIsDeferred": Wake rcuo kthread later, first CB to empty list. * "WakeOvf": Wake rcuo kthread, CB list is huge. + * "WakeOvfIsDeferred": Wake rcuo kthread later, CB list is huge. * "WakeNot": Don't wake rcuo kthread. * "WakeNotPoll": Don't wake rcuo kthread because it is polling. + * "DeferredWake": Carried out the "IsDeferred" wakeup. * "Poll": Start of new polling cycle for rcu_nocb_poll. * "Sleep": Sleep waiting for CBs for !rcu_nocb_poll. * "WokeEmpty": rcuo kthread woke to find empty list. diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 6a86eb7..e33562f 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -350,7 +350,7 @@ struct rcu_data { int nocb_p_count_lazy; /* (approximate). */ wait_queue_head_t nocb_wq; /* For nocb kthreads to sleep on. */ struct task_struct *nocb_kthread; - bool nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */ + int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */ /* The following fields are used by the leader, hence own cacheline. */ struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp; @@ -383,6 +383,11 @@ struct rcu_data { #define RCU_FORCE_QS 3 /* Need to force quiescent state. */ #define RCU_SIGNAL_INIT RCU_SAVE_DYNTICK +/* Values for nocb_defer_wakeup field in struct rcu_data. */ +#define RCU_NOGP_WAKE_NOT 0 +#define RCU_NOGP_WAKE 1 +#define RCU_NOGP_WAKE_FORCE 2 + #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500)) /* For jiffies_till_first_fqs and */ /* and jiffies_till_next_fqs. */ @@ -589,7 +594,7 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, static bool rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, struct rcu_data *rdp, unsigned long flags); -static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); +static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); static void do_nocb_deferred_wakeup(struct rcu_data *rdp); static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index bb56456..d67cc5c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2121,16 +2121,23 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeEmpty")); } else { - rdp->nocb_defer_wakeup = true; + rdp->nocb_defer_wakeup = RCU_NOGP_WAKE; trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeEmptyIsDeferred")); } rdp->qlen_last_fqs_check = 0; } else if (len > rdp->qlen_last_fqs_check + qhimark) { /* ... or if many callbacks queued. */ - wake_nocb_leader(rdp, true); + if (!irqs_disabled_flags(flags)) { + wake_nocb_leader(rdp, true); + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WakeOvf")); + } else { + rdp->nocb_defer_wakeup = RCU_NOGP_WAKE_FORCE; + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WakeOvfIsDeferred")); + } rdp->qlen_last_fqs_check = LONG_MAX / 2; - trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeOvf")); } else { trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeNot")); } @@ -2438,7 +2445,7 @@ static int rcu_nocb_kthread(void *arg) } /* Is a deferred wakeup of rcu_nocb_kthread() required? */ -static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) +static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) { return ACCESS_ONCE(rdp->nocb_defer_wakeup); } @@ -2446,11 +2453,14 @@ static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) /* Do a deferred wakeup of rcu_nocb_kthread(). */ static void do_nocb_deferred_wakeup(struct rcu_data *rdp) { + int ndw; + if (!rcu_nocb_need_deferred_wakeup(rdp)) return; - ACCESS_ONCE(rdp->nocb_defer_wakeup) = false; - wake_nocb_leader(rdp, false); - trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty")); + ndw = ACCESS_ONCE(rdp->nocb_defer_wakeup); + ACCESS_ONCE(rdp->nocb_defer_wakeup) = RCU_NOGP_WAKE_NOT; + wake_nocb_leader(rdp, ndw == RCU_NOGP_WAKE_FORCE); + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWake")); } /* Initialize per-rcu_data variables for no-CBs CPUs. */ @@ -2557,7 +2567,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) { } -static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) +static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) { return false; } -- cgit v0.10.2 From ade9862470dd0595d8e292ecea8445ed90b98df5 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 31 Jul 2014 16:02:33 -0700 Subject: rcu: Make TINY_RCU tinier by putting error checks under #ifdef The rcu_idle_enter_common() and rcu_idle_exit_common() functions contain error checks that have to the best of my knowledge have never triggered over the past several years. These are nevertheless valuable when creating new architectures or doing other low-level changes, so the checks should not be deleted. This commit instead places these checks under #ifdef CONFIG_RCU_TRACE so that they are executed only when specifically requested. The savings are significant: Before: text data bss dec hex filename 1749 39 0 1788 6fc /tmp/b/kernel/rcu/tiny.o 632 152 0 784 310 /tmp/b/kernel/rcu/update.o ---- 2572 After: text data bss dec hex filename 1281 37 0 1318 526 /tmp/b/kernel/rcu/tiny.o 632 152 0 784 310 /tmp/b/kernel/rcu/update.o ---- 2102 This amounts to 470 bytes, or 18% of the original. Switched from #ifdef to IS_ENABLED() on Josh Triplett's advice. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index 6bd785c..4a55a24 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -62,7 +62,7 @@ static void rcu_idle_enter_common(long long newval) } RCU_TRACE(trace_rcu_dyntick(TPS("Start"), rcu_dynticks_nesting, newval)); - if (!is_idle_task(current)) { + if (IS_ENABLED(CONFIG_RCU_TRACE) && !is_idle_task(current)) { struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); RCU_TRACE(trace_rcu_dyntick(TPS("Entry error: not idle task"), @@ -123,7 +123,7 @@ static void rcu_idle_exit_common(long long oldval) return; } RCU_TRACE(trace_rcu_dyntick(TPS("End"), oldval, rcu_dynticks_nesting)); - if (!is_idle_task(current)) { + if (IS_ENABLED(CONFIG_RCU_TRACE) && !is_idle_task(current)) { struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); RCU_TRACE(trace_rcu_dyntick(TPS("Exit error: not idle task"), -- cgit v0.10.2 From 2aa792e6faf1a00f5accf1f69e87e11a390ba2cd Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 12 Aug 2014 13:07:47 -0400 Subject: rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads The rcu_gp_kthread_wake() function checks for three conditions before waking up grace period kthreads: * Is the thread we are trying to wake up the current thread? * Are the gp_flags zero? (all threads wait on non-zero gp_flags condition) * Is there no thread created for this flavour, hence nothing to wake up? If any one of these condition is true, we do not call wake_up(). It was found that there are quite a few avoidable wake ups both during idle time and under stress induced by rcutorture. Idle: Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0 Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0 rcutorture: Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0 Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0 Here case{1-3} are the cases listed above. We can avoid these wake ups by using rcu_gp_kthread_wake() to conditionally wake up the grace period kthreads. There is a comment about an implied barrier supplied by the wake_up() logic. This barrier is necessary for the awakened thread to see the updated ->gp_flags. This flag is always being updated with the root node lock held. Also, the awakened thread tries to acquire the root node lock before reading ->gp_flags because of which there is proper ordering. Hence this commit tries to avoid calling wake_up() whenever we can by using rcu_gp_kthread_wake() function. Signed-off-by: Pranith Kumar CC: Mathieu Desnoyers Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index dd6c8b5..9e83cd9 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1938,7 +1938,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags) { WARN_ON_ONCE(!rcu_gp_in_progress(rsp)); raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags); - wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */ + rcu_gp_kthread_wake(rsp); } /* @@ -2516,7 +2516,7 @@ static void force_quiescent_state(struct rcu_state *rsp) ACCESS_ONCE(rsp->gp_flags) = ACCESS_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS; raw_spin_unlock_irqrestore(&rnp_old->lock, flags); - wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */ + rcu_gp_kthread_wake(rsp); } /* -- cgit v0.10.2 From 73a860cd58a1eb258e889b615cebf738ab33aa23 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 14 Aug 2014 10:28:23 -0700 Subject: rcu: Replace flush_signals() with WARN_ON(signal_pending()) Currently, when RCU awakens from a wait_event_interruptible() that might have awakened prematurely, it does a flush_signals(). This is done on the off-chance that someone figured out how to deliver a signal to a kthread, which is supposed to be impossible. Given that this is supposed to be impossible, this commit changes the flush_signals() calls into WARN_ON(signal_pending()). Reported-by: Oleg Nesterov Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9e83cd9..3e002c1 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1796,7 +1796,7 @@ static int __noreturn rcu_gp_kthread(void *arg) if (rcu_gp_init(rsp)) break; cond_resched(); - flush_signals(current); + WARN_ON(signal_pending(current)); trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), TPS("reqwaitsig")); @@ -1842,7 +1842,7 @@ static int __noreturn rcu_gp_kthread(void *arg) } else { /* Deal with stray signal. */ cond_resched(); - flush_signals(current); + WARN_ON(signal_pending(current)); trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), TPS("fqswaitsig")); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index d67cc5c..bbb0a0c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2237,7 +2237,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp) (d = ULONG_CMP_GE(ACCESS_ONCE(rnp->completed), c))); if (likely(d)) break; - flush_signals(current); + WARN_ON(signal_pending(current)); trace_rcu_future_gp(rnp, rdp, c, TPS("ResumeWait")); } trace_rcu_future_gp(rnp, rdp, c, TPS("EndWait")); @@ -2296,7 +2296,7 @@ wait_again: if (!rcu_nocb_poll) trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "WokeEmpty"); - flush_signals(current); + WARN_ON(signal_pending(current)); schedule_timeout_interruptible(1); /* Rescan in case we were a victim of memory ordering. */ @@ -2375,7 +2375,7 @@ static void nocb_follower_wait(struct rcu_data *rdp) if (!rcu_nocb_poll) trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeEmpty"); - flush_signals(current); + WARN_ON(signal_pending(current)); schedule_timeout_interruptible(1); } } -- cgit v0.10.2 From 58ade2dbe9a253635e0835adedfaa822849aa3a3 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Wed, 11 Jun 2014 16:39:43 -0400 Subject: rcutorture: Fix a sparse warning by marking boost_mutex static This commit fixes the following sparse warning by marking boost_mutex static: kernel/rcu/rcutorture.c:185:1: warning: symbol 'boost_mutex' was not declared. Should it be static? Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 948a769..7e67711 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -182,7 +182,7 @@ static u64 notrace rcu_trace_clock_local(void) #endif /* #else #ifdef CONFIG_RCU_TRACE */ static unsigned long boost_starttime; /* jiffies of next boost test start. */ -DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */ +static DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */ /* and boost task create/destroy. */ static atomic_t barrier_cbs_count; /* Barrier callbacks registered. */ static bool barrier_phase; /* Test phase. */ -- cgit v0.10.2 From 1a5e31fbf9199212915095c47ebf22d0715d3389 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Fri, 11 Jul 2014 17:31:27 -0400 Subject: rcutorture: Use bash shell for all the test scripts Some of the scripts encode a default /bin/sh shell. On systems which use dash as default shell, these scripts fail as they are bash scripts. I encountered this while testing the sprintf() changes on a Debian system where dash is the default shell. This commit changes all such uses to use bash explicitly. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/bin/config2frag.sh b/tools/testing/selftests/rcutorture/bin/config2frag.sh index 9f9ffcd..4e394ef 100644 --- a/tools/testing/selftests/rcutorture/bin/config2frag.sh +++ b/tools/testing/selftests/rcutorture/bin/config2frag.sh @@ -1,5 +1,5 @@ -#!/bin/sh -# Usage: sh config2frag.sh < .config > configfrag +#!/bin/bash +# Usage: bash config2frag.sh < .config > configfrag # # Converts the "# CONFIG_XXX is not set" to "CONFIG_XXX=n" so that the # resulting file becomes a legitimate Kconfig fragment. diff --git a/tools/testing/selftests/rcutorture/bin/configcheck.sh b/tools/testing/selftests/rcutorture/bin/configcheck.sh index d686537..6173ed5 100755 --- a/tools/testing/selftests/rcutorture/bin/configcheck.sh +++ b/tools/testing/selftests/rcutorture/bin/configcheck.sh @@ -1,5 +1,5 @@ -#!/bin/sh -# Usage: sh configcheck.sh .config .config-template +#!/bin/bash +# Usage: bash configcheck.sh .config .config-template # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/configinit.sh b/tools/testing/selftests/rcutorture/bin/configinit.sh index 9c3f3d3..d8f7418 100755 --- a/tools/testing/selftests/rcutorture/bin/configinit.sh +++ b/tools/testing/selftests/rcutorture/bin/configinit.sh @@ -1,6 +1,6 @@ -#!/bin/sh +#!/bin/bash # -# sh configinit.sh config-spec-file [ build output dir ] +# bash configinit.sh config-spec-file [ build output dir ] # # Create a .config file from the spec file. Run from the kernel source tree. # Exits with 0 if all went well, with 1 if all went well but the config diff --git a/tools/testing/selftests/rcutorture/bin/kvm-build.sh b/tools/testing/selftests/rcutorture/bin/kvm-build.sh index 7c1e56b..e4bfb91 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-build.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-build.sh @@ -2,7 +2,7 @@ # # Build a kvm-ready Linux kernel from the tree in the current directory. # -# Usage: sh kvm-build.sh config-template build-dir more-configs +# Usage: bash kvm-build.sh config-template build-dir more-configs # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh index 7f1ff1a..30cbb63 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh @@ -2,7 +2,7 @@ # # Analyze a given results directory for locktorture progress. # -# Usage: sh kvm-recheck-lock.sh resdir +# Usage: bash kvm-recheck-lock.sh resdir # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh index 307c4b9..6e94a5e 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh @@ -2,7 +2,7 @@ # # Analyze a given results directory for rcutorture progress. # -# Usage: sh kvm-recheck-rcu.sh resdir +# Usage: bash kvm-recheck-rcu.sh resdir # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh index 3f6c9b7..3482b3f 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh @@ -4,7 +4,7 @@ # check the build and console output for errors. Given a directory # containing results directories, this recursively checks them all. # -# Usage: sh kvm-recheck.sh resdir ... +# Usage: bash kvm-recheck.sh resdir ... # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 0f69dcb..5c265da 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -6,7 +6,7 @@ # Execute this in the source tree. Do not run it as a background task # because qemu does not seem to like that much. # -# Usage: sh kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args +# Usage: bash kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args # # qemu-args defaults to "-nographic", along with arguments specifying the # number of CPUs and other options generated from diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index 589e9c3..ff147ad 100644 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -7,7 +7,7 @@ # Edit the definitions below to set the locations of the various directories, # as well as the test duration. # -# Usage: sh kvm.sh [ options ] +# Usage: bash kvm.sh [ options ] # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-build.sh b/tools/testing/selftests/rcutorture/bin/parse-build.sh index 5432309..41eeeef 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-build.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-build.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # # Check the build output from an rcutorture run for goodness. # The "file" is a pathname on the local system, and "title" is @@ -7,7 +7,7 @@ # The file must contain kernel build output. # # Usage: -# sh parse-build.sh file title +# bash parse-build.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh index 4185d4c..2517eae 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-console.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh @@ -1,11 +1,11 @@ -#!/bin/sh +#!/bin/bash # # Check the console output from an rcutorture run for oopses. # The "file" is a pathname on the local system, and "title" is # a text string for error-message purposes. # # Usage: -# sh parse-console.sh file title +# bash parse-console.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-torture.sh b/tools/testing/selftests/rcutorture/bin/parse-torture.sh index 3455560..bbec40b 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-torture.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-torture.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # # Check the console output from a torture run for goodness. # The "file" is a pathname on the local system, and "title" is @@ -8,7 +8,7 @@ # with other dmesg text, as in console-log output. # # Usage: -# sh parse-torture.sh file title +# bash parse-torture.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by -- cgit v0.10.2 From 3327d924a7fef224754273d70224f130d63997c6 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Fri, 11 Jul 2014 19:47:35 -0400 Subject: rcutorture: Set executable bit and drop bash from Usage This commit sets the executable bit on test scripts config2frag.sh and kvm.sh. Since #!/bin/bash is set in all the scripts, this commit also drops it from all usage lines because the scripts can now all be invoked directly. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/bin/config2frag.sh b/tools/testing/selftests/rcutorture/bin/config2frag.sh old mode 100644 new mode 100755 index 4e394ef..56f51ae --- a/tools/testing/selftests/rcutorture/bin/config2frag.sh +++ b/tools/testing/selftests/rcutorture/bin/config2frag.sh @@ -1,5 +1,5 @@ #!/bin/bash -# Usage: bash config2frag.sh < .config > configfrag +# Usage: config2frag.sh < .config > configfrag # # Converts the "# CONFIG_XXX is not set" to "CONFIG_XXX=n" so that the # resulting file becomes a legitimate Kconfig fragment. diff --git a/tools/testing/selftests/rcutorture/bin/configcheck.sh b/tools/testing/selftests/rcutorture/bin/configcheck.sh index 6173ed5..eee31e2 100755 --- a/tools/testing/selftests/rcutorture/bin/configcheck.sh +++ b/tools/testing/selftests/rcutorture/bin/configcheck.sh @@ -1,5 +1,5 @@ #!/bin/bash -# Usage: bash configcheck.sh .config .config-template +# Usage: configcheck.sh .config .config-template # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/configinit.sh b/tools/testing/selftests/rcutorture/bin/configinit.sh index d8f7418..15f1a17 100755 --- a/tools/testing/selftests/rcutorture/bin/configinit.sh +++ b/tools/testing/selftests/rcutorture/bin/configinit.sh @@ -1,6 +1,6 @@ #!/bin/bash # -# bash configinit.sh config-spec-file [ build output dir ] +# Usage: configinit.sh config-spec-file [ build output dir ] # # Create a .config file from the spec file. Run from the kernel source tree. # Exits with 0 if all went well, with 1 if all went well but the config diff --git a/tools/testing/selftests/rcutorture/bin/kvm-build.sh b/tools/testing/selftests/rcutorture/bin/kvm-build.sh index e4bfb91..00cb0db 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-build.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-build.sh @@ -2,7 +2,7 @@ # # Build a kvm-ready Linux kernel from the tree in the current directory. # -# Usage: bash kvm-build.sh config-template build-dir more-configs +# Usage: kvm-build.sh config-template build-dir more-configs # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh index 30cbb63..43f7640 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh @@ -2,7 +2,7 @@ # # Analyze a given results directory for locktorture progress. # -# Usage: bash kvm-recheck-lock.sh resdir +# Usage: kvm-recheck-lock.sh resdir # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh index 6e94a5e..d6cc07f 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh @@ -2,7 +2,7 @@ # # Analyze a given results directory for rcutorture progress. # -# Usage: bash kvm-recheck-rcu.sh resdir +# Usage: kvm-recheck-rcu.sh resdir # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh index 3482b3f..4f5b20f 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh @@ -4,7 +4,7 @@ # check the build and console output for errors. Given a directory # containing results directories, this recursively checks them all. # -# Usage: bash kvm-recheck.sh resdir ... +# Usage: kvm-recheck.sh resdir ... # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 5c265da..6d2f5fa 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -6,7 +6,7 @@ # Execute this in the source tree. Do not run it as a background task # because qemu does not seem to like that much. # -# Usage: bash kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args +# Usage: kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args # # qemu-args defaults to "-nographic", along with arguments specifying the # number of CPUs and other options generated from diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh old mode 100644 new mode 100755 index ff147ad..36534f9 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -7,7 +7,7 @@ # Edit the definitions below to set the locations of the various directories, # as well as the test duration. # -# Usage: bash kvm.sh [ options ] +# Usage: kvm.sh [ options ] # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-build.sh b/tools/testing/selftests/rcutorture/bin/parse-build.sh index 41eeeef..499d1e5 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-build.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-build.sh @@ -6,8 +6,7 @@ # # The file must contain kernel build output. # -# Usage: -# bash parse-build.sh file title +# Usage: parse-build.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh index 2517eae..a0bde6c 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-console.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh @@ -4,8 +4,7 @@ # The "file" is a pathname on the local system, and "title" is # a text string for error-message purposes. # -# Usage: -# bash parse-console.sh file title +# Usage: parse-console.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/tools/testing/selftests/rcutorture/bin/parse-torture.sh b/tools/testing/selftests/rcutorture/bin/parse-torture.sh index bbec40b..e3c5f07 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-torture.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-torture.sh @@ -7,8 +7,7 @@ # The file must contain torture output, but can be interspersed # with other dmesg text, as in console-log output. # -# Usage: -# bash parse-torture.sh file title +# Usage: parse-torture.sh file title # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by -- cgit v0.10.2 From 616fd166f64df42db7d1bdd12918d9105f3add05 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 18 Jul 2014 12:01:39 -0700 Subject: rcu: Add step to initrd documentation This commit tries to get people into the correct directory before creating the initrd directory. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/doc/initrd.txt b/tools/testing/selftests/rcutorture/doc/initrd.txt index 49d134c..4170e71 100644 --- a/tools/testing/selftests/rcutorture/doc/initrd.txt +++ b/tools/testing/selftests/rcutorture/doc/initrd.txt @@ -6,6 +6,7 @@ this case. There are probably much better ways of doing this. That said, here are the commands: ------------------------------------------------------------------------ +cd tools/testing/selftests/rcutorture zcat /initrd.img > /tmp/initrd.img.zcat mkdir initrd cd initrd -- cgit v0.10.2 From 9e62b0efdcead5b66c0c006df2f19a449b22cf08 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 21 Jul 2014 13:13:35 -0700 Subject: rcutorture: Test partial nohz_full= configuration The current set of tests covers only cases where either all possible CPUs are nohz_full= CPUs or none of them are. Because there have been some recent bug escapes in cases where only some of the CPUs are nohz_full= CPUs, this commit add a configuration where only half of the CPUs are nohz_full= CPUs. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07 index ab62255..715ee5e 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07 @@ -7,7 +7,7 @@ CONFIG_PREEMPT=n CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_IDLE=n CONFIG_NO_HZ_FULL=y -CONFIG_NO_HZ_FULL_ALL=y +CONFIG_NO_HZ_FULL_ALL=n CONFIG_NO_HZ_FULL_SYSIDLE=y CONFIG_RCU_FAST_NO_HZ=n CONFIG_RCU_TRACE=y diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE07.boot new file mode 100644 index 0000000..d446099 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07.boot @@ -0,0 +1 @@ +nohz_full=2-9 -- cgit v0.10.2 From ae867ff03d09c2aec56b0443b8b04e5a3fa1e336 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 21 Jul 2014 13:35:10 -0700 Subject: rcutorture: Specify MAXSMP=y for TREE01 Setting CONFIG_MAXSMP=y causes cpumasks to be moved offstack, which introduces the possibility of NULL cpumask_var_t pointers. This commit therefore enables CONFIG_MAXSMP=y in TREE01 to increase test coverage. However, because CONFIG_MAXSMP=y implies 8192 CPUs, we need to use the maxcpus= boot parameter to limit the number of CPUs to something reasonable, which in turn requires updating the scripts to handle this. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh index d01b865..b325470 100644 --- a/tools/testing/selftests/rcutorture/bin/functions.sh +++ b/tools/testing/selftests/rcutorture/bin/functions.sh @@ -64,6 +64,26 @@ configfrag_boot_params () { fi } +# configfrag_boot_cpus bootparam-string config-fragment-file config-cpus +# +# Decreases number of CPUs based on any maxcpus= boot parameters specified. +configfrag_boot_cpus () { + local bootargs="`configfrag_boot_params "$1" "$2"`" + local maxcpus + if echo "${bootargs}" | grep -q 'maxcpus=[0-9]' + then + maxcpus="`echo "${bootargs}" | sed -e 's/^.*maxcpus=\([0-9]*\).*$/\1/'`" + if test "$3" -gt "$maxcpus" + then + echo $maxcpus + else + echo $3 + fi + else + echo $3 + fi +} + # configfrag_hotplug_cpu config-fragment-file # # Returns 1 if the config fragment specifies hotplug CPU. diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 6d2f5fa..487308f 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -140,6 +140,7 @@ fi # Generate -smp qemu argument. qemu_args="-nographic $qemu_args" cpu_count=`configNR_CPUS.sh $config_template` +cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` vcpus=`identify_qemu_vcpus` if test $cpu_count -gt $vcpus then diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index 36534f9..e527dc9 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -188,7 +188,9 @@ for CF in $configs do if test -f "$CONFIGFRAG/$kversion/$CF" then - echo $CF `configNR_CPUS.sh $CONFIGFRAG/$kversion/$CF` >> $T/cfgcpu + cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$kversion/$CF` + cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$kversion/$CF" "$cpu_count"` + echo $CF $cpu_count >> $T/cfgcpu else echo "The --configs file $CF does not exist, terminating." exit 1 diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01 index 063b707..38e3895 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01 @@ -1,5 +1,4 @@ CONFIG_SMP=y -CONFIG_NR_CPUS=8 CONFIG_PREEMPT_NONE=n CONFIG_PREEMPT_VOLUNTARY=n CONFIG_PREEMPT=y @@ -10,8 +9,7 @@ CONFIG_NO_HZ_FULL=n CONFIG_RCU_FAST_NO_HZ=y CONFIG_RCU_TRACE=y CONFIG_HOTPLUG_CPU=y -CONFIG_RCU_FANOUT=8 -CONFIG_RCU_FANOUT_EXACT=n +CONFIG_MAXSMP=y CONFIG_RCU_NOCB_CPU=y CONFIG_RCU_NOCB_CPU_ZERO=y CONFIG_DEBUG_LOCK_ALLOC=n diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot index 0fc8a34..adc3abc 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot @@ -1 +1 @@ -rcutorture.torture_type=rcu_bh +rcutorture.torture_type=rcu_bh maxcpus=8 -- cgit v0.10.2 From 188c1e896c0c28ac98809d11b6f29523805b34ef Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sat, 26 Jul 2014 21:38:09 -0700 Subject: rcutorture: Specify CONFIG_CPUMASK_OFFSTACK=y for TREE07 This commit specifies offstack cpumasks in TREE07 in order to catch references to unallocated cpumask_var_t variables. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07 index 715ee5e..8f10176 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07 @@ -1,5 +1,6 @@ CONFIG_SMP=y CONFIG_NR_CPUS=16 +CONFIG_CPUMASK_OFFSTACK=y CONFIG_PREEMPT_NONE=y CONFIG_PREEMPT_VOLUNTARY=n CONFIG_PREEMPT=n -- cgit v0.10.2 From eea203fea3484598280a07fe503e025e886297fb Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 14 Jul 2014 09:16:15 -0400 Subject: rcu: Use pr_alert/pr_cont for printing logs User pr_alert/pr_cont for printing the logs from rcutorture module directly instead of writing it to a buffer and then printing it. This allows us from not having to allocate such buffers. Also remove a resulting empty function. I tested this using the parse-torture.sh script as follows: $ dmesg | grep torture > log.txt $ bash parse-torture.sh log.txt test $ There were no warnings which means that parsing went fine. Signed-off-by: Joe Perches Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney diff --git a/include/linux/torture.h b/include/linux/torture.h index 5ca58fc..fec46f8 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -51,7 +51,7 @@ /* Definitions for online/offline exerciser. */ int torture_onoff_init(long ooholdoff, long oointerval); -char *torture_onoff_stats(char *page); +void torture_onoff_stats(void); bool torture_onoff_failures(void); /* Low-rider random number generator. */ diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 7e67711..ff4f0c7 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -242,7 +242,7 @@ struct rcu_torture_ops { void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *rcu)); void (*cb_barrier)(void); void (*fqs)(void); - void (*stats)(char *page); + void (*stats)(void); int irq_capable; int can_boost; const char *name; @@ -525,21 +525,21 @@ static void srcu_torture_barrier(void) srcu_barrier(&srcu_ctl); } -static void srcu_torture_stats(char *page) +static void srcu_torture_stats(void) { int cpu; int idx = srcu_ctl.completed & 0x1; - page += sprintf(page, "%s%s per-CPU(idx=%d):", - torture_type, TORTURE_FLAG, idx); + pr_alert("%s%s per-CPU(idx=%d):", + torture_type, TORTURE_FLAG, idx); for_each_possible_cpu(cpu) { long c0, c1; c0 = (long)per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx]; c1 = (long)per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]; - page += sprintf(page, " %d(%ld,%ld)", cpu, c0, c1); + pr_cont(" %d(%ld,%ld)", cpu, c0, c1); } - sprintf(page, "\n"); + pr_cont("\n"); } static void srcu_torture_synchronize_expedited(void) @@ -1031,10 +1031,15 @@ rcu_torture_reader(void *arg) } /* - * Create an RCU-torture statistics message in the specified buffer. + * Print torture statistics. Caller must ensure that there is only + * one call to this function at a given time!!! This is normally + * accomplished by relying on the module system to only have one copy + * of the module loaded, and then by giving the rcu_torture_stats + * kthread full control (or the init/cleanup functions when rcu_torture_stats + * thread is not running). */ static void -rcu_torture_printk(char *page) +rcu_torture_stats_print(void) { int cpu; int i; @@ -1052,55 +1057,60 @@ rcu_torture_printk(char *page) if (pipesummary[i] != 0) break; } - page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); - page += sprintf(page, - "rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d ", - rcu_torture_current, - rcu_torture_current_version, - list_empty(&rcu_torture_freelist), - atomic_read(&n_rcu_torture_alloc), - atomic_read(&n_rcu_torture_alloc_fail), - atomic_read(&n_rcu_torture_free)); - page += sprintf(page, "rtmbe: %d rtbke: %ld rtbre: %ld ", - atomic_read(&n_rcu_torture_mberror), - n_rcu_torture_boost_ktrerror, - n_rcu_torture_boost_rterror); - page += sprintf(page, "rtbf: %ld rtb: %ld nt: %ld ", - n_rcu_torture_boost_failure, - n_rcu_torture_boosts, - n_rcu_torture_timers); - page = torture_onoff_stats(page); - page += sprintf(page, "barrier: %ld/%ld:%ld", - n_barrier_successes, - n_barrier_attempts, - n_rcu_torture_barrier_error); - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); + + pr_alert("%s%s ", torture_type, TORTURE_FLAG); + pr_cont("rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d ", + rcu_torture_current, + rcu_torture_current_version, + list_empty(&rcu_torture_freelist), + atomic_read(&n_rcu_torture_alloc), + atomic_read(&n_rcu_torture_alloc_fail), + atomic_read(&n_rcu_torture_free)); + pr_cont("rtmbe: %d rtbke: %ld rtbre: %ld ", + atomic_read(&n_rcu_torture_mberror), + n_rcu_torture_boost_ktrerror, + n_rcu_torture_boost_rterror); + pr_cont("rtbf: %ld rtb: %ld nt: %ld ", + n_rcu_torture_boost_failure, + n_rcu_torture_boosts, + n_rcu_torture_timers); + torture_onoff_stats(); + pr_cont("barrier: %ld/%ld:%ld\n", + n_barrier_successes, + n_barrier_attempts, + n_rcu_torture_barrier_error); + + pr_alert("%s%s ", torture_type, TORTURE_FLAG); if (atomic_read(&n_rcu_torture_mberror) != 0 || n_rcu_torture_barrier_error != 0 || n_rcu_torture_boost_ktrerror != 0 || n_rcu_torture_boost_rterror != 0 || n_rcu_torture_boost_failure != 0 || i > 1) { - page += sprintf(page, "!!! "); + pr_cont("%s", "!!! "); atomic_inc(&n_rcu_torture_error); WARN_ON_ONCE(1); } - page += sprintf(page, "Reader Pipe: "); + pr_cont("Reader Pipe: "); for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) - page += sprintf(page, " %ld", pipesummary[i]); - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); - page += sprintf(page, "Reader Batch: "); + pr_cont(" %ld", pipesummary[i]); + pr_cont("\n"); + + pr_alert("%s%s ", torture_type, TORTURE_FLAG); + pr_cont("Reader Batch: "); for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) - page += sprintf(page, " %ld", batchsummary[i]); - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); - page += sprintf(page, "Free-Block Circulation: "); + pr_cont(" %ld", batchsummary[i]); + pr_cont("\n"); + + pr_alert("%s%s ", torture_type, TORTURE_FLAG); + pr_cont("Free-Block Circulation: "); for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) { - page += sprintf(page, " %d", - atomic_read(&rcu_torture_wcount[i])); + pr_cont(" %d", atomic_read(&rcu_torture_wcount[i])); } - page += sprintf(page, "\n"); + pr_cont("\n"); + if (cur_ops->stats) - cur_ops->stats(page); + cur_ops->stats(); if (rtcv_snap == rcu_torture_current_version && rcu_torture_current != NULL) { int __maybe_unused flags; @@ -1109,10 +1119,9 @@ rcu_torture_printk(char *page) rcutorture_get_gp_data(cur_ops->ttype, &flags, &gpnum, &completed); - page += sprintf(page, - "??? Writer stall state %d g%lu c%lu f%#x\n", - rcu_torture_writer_state, - gpnum, completed, flags); + pr_alert("??? Writer stall state %d g%lu c%lu f%#x\n", + rcu_torture_writer_state, + gpnum, completed, flags); show_rcu_gp_kthreads(); rcutorture_trace_dump(); } @@ -1120,30 +1129,6 @@ rcu_torture_printk(char *page) } /* - * Print torture statistics. Caller must ensure that there is only - * one call to this function at a given time!!! This is normally - * accomplished by relying on the module system to only have one copy - * of the module loaded, and then by giving the rcu_torture_stats - * kthread full control (or the init/cleanup functions when rcu_torture_stats - * thread is not running). - */ -static void -rcu_torture_stats_print(void) -{ - int size = nr_cpu_ids * 200 + 8192; - char *buf; - - buf = kmalloc(size, GFP_KERNEL); - if (!buf) { - pr_err("rcu-torture: Out of memory, need: %d", size); - return; - } - rcu_torture_printk(buf); - pr_alert("%s", buf); - kfree(buf); -} - -/* * Periodically prints torture statistics, if periodic statistics printing * was specified via the stat_interval module parameter. */ diff --git a/kernel/torture.c b/kernel/torture.c index d600af21..ede8b25 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -211,18 +211,16 @@ EXPORT_SYMBOL_GPL(torture_onoff_cleanup); /* * Print online/offline testing statistics. */ -char *torture_onoff_stats(char *page) +void torture_onoff_stats(void) { #ifdef CONFIG_HOTPLUG_CPU - page += sprintf(page, - "onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ", - n_online_successes, n_online_attempts, - n_offline_successes, n_offline_attempts, - min_online, max_online, - min_offline, max_offline, - sum_online, sum_offline, HZ); + pr_cont("onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ", + n_online_successes, n_online_attempts, + n_offline_successes, n_offline_attempts, + min_online, max_online, + min_offline, max_offline, + sum_online, sum_offline, HZ); #endif /* #ifdef CONFIG_HOTPLUG_CPU */ - return page; } EXPORT_SYMBOL_GPL(torture_onoff_stats); -- cgit v0.10.2 From 38706bc5a29a73645e512c06ffb759fb56259d83 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 18 Aug 2014 21:12:17 -0700 Subject: rcutorture: Add callback-flood test Although RCU is designed to handle arbitrary floods of callbacks, this capability is not routinely tested. This commit therefore adds a cbflood capability in which kthreads repeatedly registers large numbers of callbacks. One such kthread is created for each four CPUs (rounding up), and the test may be controlled by several cbflood_* kernel boot parameters, which control the number of bursts per flood, the number of callbacks per burst, the time between bursts, and the time between floods. The default values are large enough to exercise RCU's emergency responses to callback flooding. Signed-off-by: Paul E. McKenney Cc: David Miller Reviewed-by: Pranith Kumar diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 5ae8608..0a104be 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2881,6 +2881,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Lazy RCU callbacks are those which RCU can prove do nothing more than free memory. + rcutorture.cbflood_inter_holdoff= [KNL] + Set holdoff time (jiffies) between successive + callback-flood tests. + + rcutorture.cbflood_intra_holdoff= [KNL] + Set holdoff time (jiffies) between successive + bursts of callbacks within a given callback-flood + test. + + rcutorture.cbflood_n_burst= [KNL] + Set the number of bursts making up a given + callback-flood test. Set this to zero to + disable callback-flood testing. + + rcutorture.cbflood_n_per_burst= [KNL] + Set the number of callbacks to be registered + in a given burst of a callback-flood test. + rcutorture.fqs_duration= [KNL] Set duration of force_quiescent_state bursts. diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index ff4f0c7..0bcd53a 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -49,11 +49,19 @@ #include #include #include +#include MODULE_LICENSE("GPL"); MODULE_AUTHOR("Paul E. McKenney and Josh Triplett "); +torture_param(int, cbflood_inter_holdoff, HZ, + "Holdoff between floods (jiffies)"); +torture_param(int, cbflood_intra_holdoff, 1, + "Holdoff between bursts (jiffies)"); +torture_param(int, cbflood_n_burst, 3, "# bursts in flood, zero to disable"); +torture_param(int, cbflood_n_per_burst, 20000, + "# callbacks per burst in flood"); torture_param(int, fqs_duration, 0, "Duration of fqs bursts (us), 0 to disable"); torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)"); @@ -96,10 +104,12 @@ module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_bh, ...)"); static int nrealreaders; +static int ncbflooders; static struct task_struct *writer_task; static struct task_struct **fakewriter_tasks; static struct task_struct **reader_tasks; static struct task_struct *stats_task; +static struct task_struct **cbflood_task; static struct task_struct *fqs_task; static struct task_struct *boost_tasks[NR_CPUS]; static struct task_struct *stall_task; @@ -138,6 +148,7 @@ static long n_rcu_torture_boosts; static long n_rcu_torture_timers; static long n_barrier_attempts; static long n_barrier_successes; +static atomic_long_t n_cbfloods; static struct list_head rcu_torture_removed; static int rcu_torture_writer_state; @@ -707,6 +718,58 @@ checkwait: stutter_wait("rcu_torture_boost"); return 0; } +static void rcu_torture_cbflood_cb(struct rcu_head *rhp) +{ +} + +/* + * RCU torture callback-flood kthread. Repeatedly induces bursts of calls + * to call_rcu() or analogous, increasing the probability of occurrence + * of callback-overflow corner cases. + */ +static int +rcu_torture_cbflood(void *arg) +{ + int err = 1; + int i; + int j; + struct rcu_head *rhp; + + if (cbflood_n_per_burst > 0 && + cbflood_inter_holdoff > 0 && + cbflood_intra_holdoff > 0 && + cur_ops->call && + cur_ops->cb_barrier) { + rhp = vmalloc(sizeof(*rhp) * + cbflood_n_burst * cbflood_n_per_burst); + err = !rhp; + } + if (err) { + VERBOSE_TOROUT_STRING("rcu_torture_cbflood disabled: Bad args or OOM"); + while (!torture_must_stop()) + schedule_timeout_interruptible(HZ); + return 0; + } + VERBOSE_TOROUT_STRING("rcu_torture_cbflood task started"); + do { + schedule_timeout_interruptible(cbflood_inter_holdoff); + atomic_long_inc(&n_cbfloods); + WARN_ON(signal_pending(current)); + for (i = 0; i < cbflood_n_burst; i++) { + for (j = 0; j < cbflood_n_per_burst; j++) { + cur_ops->call(&rhp[i * cbflood_n_per_burst + j], + rcu_torture_cbflood_cb); + } + schedule_timeout_interruptible(cbflood_intra_holdoff); + WARN_ON(signal_pending(current)); + } + cur_ops->cb_barrier(); + stutter_wait("rcu_torture_cbflood"); + } while (!torture_must_stop()); + torture_kthread_stopping("rcu_torture_cbflood"); + return 0; +} + /* * RCU torture force-quiescent-state kthread. Repeatedly induces * bursts of calls to force_quiescent_state(), increasing the probability @@ -1075,10 +1138,11 @@ rcu_torture_stats_print(void) n_rcu_torture_boosts, n_rcu_torture_timers); torture_onoff_stats(); - pr_cont("barrier: %ld/%ld:%ld\n", + pr_cont("barrier: %ld/%ld:%ld ", n_barrier_successes, n_barrier_attempts, n_rcu_torture_barrier_error); + pr_cont("cbflood: %ld\n", atomic_long_read(&n_cbfloods)); pr_alert("%s%s ", torture_type, TORTURE_FLAG); if (atomic_read(&n_rcu_torture_mberror) != 0 || @@ -1432,6 +1496,8 @@ rcu_torture_cleanup(void) torture_stop_kthread(rcu_torture_stats, stats_task); torture_stop_kthread(rcu_torture_fqs, fqs_task); + for (i = 0; i < ncbflooders; i++) + torture_stop_kthread(rcu_torture_cbflood, cbflood_task[i]); if ((test_boost == 1 && cur_ops->can_boost) || test_boost == 2) { unregister_cpu_notifier(&rcutorture_cpu_nb); @@ -1678,6 +1744,24 @@ rcu_torture_init(void) goto unwind; if (object_debug) rcu_test_debug_objects(); + if (cbflood_n_burst > 0) { + /* Create the cbflood threads */ + ncbflooders = (num_online_cpus() + 3) / 4; + cbflood_task = kcalloc(ncbflooders, sizeof(*cbflood_task), + GFP_KERNEL); + if (!cbflood_task) { + VERBOSE_TOROUT_ERRSTRING("out of memory"); + firsterr = -ENOMEM; + goto unwind; + } + for (i = 0; i < ncbflooders; i++) { + firsterr = torture_create_kthread(rcu_torture_cbflood, + NULL, + cbflood_task[i]); + if (firsterr) + goto unwind; + } + } rcutorture_record_test_transition(); torture_init_end(); return 0; -- cgit v0.10.2 From b76592412a320dd58572fa3517c39adb2fdbd7ed Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 25 Aug 2014 20:41:47 -0700 Subject: torture: Print PID in hung-kernel diagnostic message Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 487308f..f6b2b47 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -215,7 +215,7 @@ then fi if test $kruntime -ge $((seconds + grace)) then - echo "!!! Hang at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1 + echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1 kill -KILL $qemu_pid break fi -- cgit v0.10.2 From bc51896da2ceef188f9cd708943d48c1259ebe84 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 26 Aug 2014 11:35:43 -0700 Subject: torture: Check for nul bytes in console output When starting a new torture run while an old one is still running, both qemu processes can be outputting to the same console.out file. This can cause quite a bit of confusion, so this commit checks for this situation, which is normally indicated by nul bytes in the console output. Yes, if your new run uses up an exact number of blocks of the file, this check will be ineffective, but the odds are not bad. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh index a0bde6c..f962ba4 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-console.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh @@ -32,6 +32,10 @@ title="$2" . functions.sh +if grep -Pq '\x00' < $file +then + print_warning Console output contains nul bytes, old qemu still running? +fi egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $T if test -s $T then -- cgit v0.10.2 From 8315f42295d2667a7f942f154b73a86fd7cb2227 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 27 Jun 2014 13:42:20 -0700 Subject: rcu: Add call_rcu_tasks() This commit adds a new RCU-tasks flavor of RCU, which provides call_rcu_tasks(). This RCU flavor's quiescent states are voluntary context switch (not preemption!) and userspace execution (not the idle loop -- use some sort of schedule_on_each_cpu() if you need to handle the idle tasks. Note that unlike other RCU flavors, these quiescent states occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt. This RCU flavor is assumed to have very infrequent latency-tolerant updaters. This assumption permits significant simplifications, including a single global callback list protected by a single global lock, along with a single task-private linked list containing all tasks that have not yet passed through a quiescent state. If experience shows this assumption to be incorrect, the required additional complexity will be added. Suggested-by: Steven Rostedt Signed-off-by: Paul E. McKenney diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 2bb4c4f3..dffd925 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -117,6 +117,14 @@ extern struct group_info init_groups; #else #define INIT_TASK_RCU_PREEMPT(tsk) #endif +#ifdef CONFIG_TASKS_RCU +#define INIT_TASK_RCU_TASKS(tsk) \ + .rcu_tasks_holdout = false, \ + .rcu_tasks_holdout_list = \ + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), +#else +#define INIT_TASK_RCU_TASKS(tsk) +#endif extern struct cred init_cred; @@ -224,6 +232,7 @@ extern struct task_group root_task_group; INIT_FTRACE_GRAPH \ INIT_TRACE_RECURSION \ INIT_TASK_RCU_PREEMPT(tsk) \ + INIT_TASK_RCU_TASKS(tsk) \ INIT_CPUSET_SEQ(tsk) \ INIT_RT_MUTEXES(tsk) \ INIT_VTIME(tsk) \ diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index d231aa1..3432063 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, void synchronize_sched(void); +/** + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period + * @head: structure to be used for queueing the RCU updates. + * @func: actual callback function to be invoked after the grace period + * + * The callback function will be invoked some time after a full grace + * period elapses, in other words after all currently executing RCU + * read-side critical sections have completed. call_rcu_tasks() assumes + * that the read-side critical sections end at a voluntary context + * switch (not a preemption!), entry into idle, or transition to usermode + * execution. As such, there are no read-side primitives analogous to + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended + * to determine that all tasks have passed through a safe state, not so + * much for data-strcuture synchronization. + * + * See the description of call_rcu() for more detailed information on + * memory ordering guarantees. + */ +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head)); + #ifdef CONFIG_PREEMPT_RCU void __rcu_read_lock(void); @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, rcu_irq_exit(); \ } while (0) +/* + * Note a voluntary context switch for RCU-tasks benefit. This is a + * macro rather than an inline function to avoid #include hell. + */ +#ifdef CONFIG_TASKS_RCU +#define rcu_note_voluntary_context_switch(t) \ + do { \ + preempt_disable(); /* Exclude synchronize_sched(); */ \ + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ + ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \ + preempt_enable(); \ + } while (0) +#else /* #ifdef CONFIG_TASKS_RCU */ +#define rcu_note_voluntary_context_switch(t) do { } while (0) +#endif /* #else #ifdef CONFIG_TASKS_RCU */ + #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) bool __rcu_is_watching(void); #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 5c2c885..eaacac4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1270,6 +1270,11 @@ struct task_struct { #ifdef CONFIG_TREE_PREEMPT_RCU struct rcu_node *rcu_blocked_node; #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */ +#ifdef CONFIG_TASKS_RCU + unsigned long rcu_tasks_nvcsw; + bool rcu_tasks_holdout; + struct list_head rcu_tasks_holdout_list; +#endif /* #ifdef CONFIG_TASKS_RCU */ #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) struct sched_info sched_info; @@ -2000,28 +2005,24 @@ extern void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask); #ifdef CONFIG_PREEMPT_RCU - #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */ #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */ +#endif /* #ifdef CONFIG_PREEMPT_RCU */ static inline void rcu_copy_process(struct task_struct *p) { +#ifdef CONFIG_PREEMPT_RCU p->rcu_read_lock_nesting = 0; p->rcu_read_unlock_special = 0; -#ifdef CONFIG_TREE_PREEMPT_RCU p->rcu_blocked_node = NULL; -#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */ INIT_LIST_HEAD(&p->rcu_node_entry); +#endif /* #ifdef CONFIG_PREEMPT_RCU */ +#ifdef CONFIG_TASKS_RCU + p->rcu_tasks_holdout = false; + INIT_LIST_HEAD(&p->rcu_tasks_holdout_list); +#endif /* #ifdef CONFIG_TASKS_RCU */ } -#else - -static inline void rcu_copy_process(struct task_struct *p) -{ -} - -#endif - static inline void tsk_restore_flags(struct task_struct *task, unsigned long orig_flags, unsigned long flags) { diff --git a/init/Kconfig b/init/Kconfig index e84c642..c4539c4 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -507,6 +507,16 @@ config PREEMPT_RCU This option enables preemptible-RCU code that is common between TREE_PREEMPT_RCU and, in the old days, TINY_PREEMPT_RCU. +config TASKS_RCU + bool "Task_based RCU implementation using voluntary context switch" + default n + help + This option enables a task-based RCU implementation that uses + only voluntary context switch (not preemption!), idle, and + user-mode execution as quiescent states. + + If unsure, say N. + config RCU_STALL_COMMON def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE ) help diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index d9efcc1..717f008 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -254,6 +254,8 @@ void rcu_check_callbacks(int cpu, int user) rcu_sched_qs(cpu); else if (!in_softirq()) rcu_bh_qs(cpu); + if (user) + rcu_note_voluntary_context_switch(current); } /* diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 1b70cb6..8ad91d1 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2410,6 +2410,8 @@ void rcu_check_callbacks(int cpu, int user) rcu_preempt_check_callbacks(cpu); if (rcu_pending(cpu)) invoke_rcu_core(); + if (user) + rcu_note_voluntary_context_switch(current); trace_rcu_utilization(TPS("End scheduler-tick")); } diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 4056d79..19b3dac 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -47,6 +47,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS @@ -347,3 +348,173 @@ static int __init check_cpu_stall_init(void) early_initcall(check_cpu_stall_init); #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ + +#ifdef CONFIG_TASKS_RCU + +/* + * Simple variant of RCU whose quiescent states are voluntary context switch, + * user-space execution, and idle. As such, grace periods can take one good + * long time. There are no read-side primitives similar to rcu_read_lock() + * and rcu_read_unlock() because this implementation is intended to get + * the system into a safe state for some of the manipulations involved in + * tracing and the like. Finally, this implementation does not support + * high call_rcu_tasks() rates from multiple CPUs. If this is required, + * per-CPU callback lists will be needed. + */ + +/* Global list of callbacks and associated lock. */ +static struct rcu_head *rcu_tasks_cbs_head; +static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; +static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock); + +/* Post an RCU-tasks callback. */ +void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) +{ + unsigned long flags; + + rhp->next = NULL; + rhp->func = func; + raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags); + *rcu_tasks_cbs_tail = rhp; + rcu_tasks_cbs_tail = &rhp->next; + raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); +} +EXPORT_SYMBOL_GPL(call_rcu_tasks); + +/* See if the current task has stopped holding out, remove from list if so. */ +static void check_holdout_task(struct task_struct *t) +{ + if (!ACCESS_ONCE(t->rcu_tasks_holdout) || + t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) || + !ACCESS_ONCE(t->on_rq)) { + ACCESS_ONCE(t->rcu_tasks_holdout) = false; + list_del_rcu(&t->rcu_tasks_holdout_list); + put_task_struct(t); + } +} + +/* RCU-tasks kthread that detects grace periods and invokes callbacks. */ +static int __noreturn rcu_tasks_kthread(void *arg) +{ + unsigned long flags; + struct task_struct *g, *t; + struct rcu_head *list; + struct rcu_head *next; + LIST_HEAD(rcu_tasks_holdouts); + + /* FIXME: Add housekeeping affinity. */ + + /* + * Each pass through the following loop makes one check for + * newly arrived callbacks, and, if there are some, waits for + * one RCU-tasks grace period and then invokes the callbacks. + * This loop is terminated by the system going down. ;-) + */ + for (;;) { + + /* Pick up any new callbacks. */ + raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags); + list = rcu_tasks_cbs_head; + rcu_tasks_cbs_head = NULL; + rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; + raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); + + /* If there were none, wait a bit and start over. */ + if (!list) { + schedule_timeout_interruptible(HZ); + WARN_ON(signal_pending(current)); + continue; + } + + /* + * Wait for all pre-existing t->on_rq and t->nvcsw + * transitions to complete. Invoking synchronize_sched() + * suffices because all these transitions occur with + * interrupts disabled. Without this synchronize_sched(), + * a read-side critical section that started before the + * grace period might be incorrectly seen as having started + * after the grace period. + * + * This synchronize_sched() also dispenses with the + * need for a memory barrier on the first store to + * ->rcu_tasks_holdout, as it forces the store to happen + * after the beginning of the grace period. + */ + synchronize_sched(); + + /* + * There were callbacks, so we need to wait for an + * RCU-tasks grace period. Start off by scanning + * the task list for tasks that are not already + * voluntarily blocked. Mark these tasks and make + * a list of them in rcu_tasks_holdouts. + */ + rcu_read_lock(); + for_each_process_thread(g, t) { + if (t != current && ACCESS_ONCE(t->on_rq) && + !is_idle_task(t)) { + get_task_struct(t); + t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw); + ACCESS_ONCE(t->rcu_tasks_holdout) = true; + list_add(&t->rcu_tasks_holdout_list, + &rcu_tasks_holdouts); + } + } + rcu_read_unlock(); + + /* + * Each pass through the following loop scans the list + * of holdout tasks, removing any that are no longer + * holdouts. When the list is empty, we are done. + */ + while (!list_empty(&rcu_tasks_holdouts)) { + schedule_timeout_interruptible(HZ); + WARN_ON(signal_pending(current)); + rcu_read_lock(); + list_for_each_entry_rcu(t, &rcu_tasks_holdouts, + rcu_tasks_holdout_list) + check_holdout_task(t); + rcu_read_unlock(); + } + + /* + * Because ->on_rq and ->nvcsw are not guaranteed + * to have a full memory barriers prior to them in the + * schedule() path, memory reordering on other CPUs could + * cause their RCU-tasks read-side critical sections to + * extend past the end of the grace period. However, + * because these ->nvcsw updates are carried out with + * interrupts disabled, we can use synchronize_sched() + * to force the needed ordering on all such CPUs. + * + * This synchronize_sched() also confines all + * ->rcu_tasks_holdout accesses to be within the grace + * period, avoiding the need for memory barriers for + * ->rcu_tasks_holdout accesses. + */ + synchronize_sched(); + + /* Invoke the callbacks. */ + while (list) { + next = list->next; + local_bh_disable(); + list->func(list); + local_bh_enable(); + list = next; + cond_resched(); + } + } +} + +/* Spawn rcu_tasks_kthread() at boot time. */ +static int __init rcu_spawn_tasks_kthread(void) +{ + struct task_struct __maybe_unused *t; + + t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread"); + BUG_ON(IS_ERR(t)); + return 0; +} +early_initcall(rcu_spawn_tasks_kthread); + +#endif /* #ifdef CONFIG_TASKS_RCU */ -- cgit v0.10.2 From bde6c3aa993066acb0d6ce32ecabe03b9d5df92d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Jul 2014 11:26:57 -0700 Subject: rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops RCU-tasks requires the occasional voluntary context switch from CPU-bound in-kernel tasks. In some cases, this requires instrumenting cond_resched(). However, there is some reluctance to countenance unconditionally instrumenting cond_resched() (see http://lwn.net/Articles/603252/), so this commit creates a separate cond_resched_rcu_qs() that may be used in place of cond_resched() in locations prone to long-duration in-kernel looping. This commit currently instruments only RCU-tasks. Future possibilities include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce IPI usage. Signed-off-by: Paul E. McKenney diff --git a/fs/file.c b/fs/file.c index 66923fe..1cafc4c 100644 --- a/fs/file.c +++ b/fs/file.c @@ -367,7 +367,7 @@ static struct fdtable *close_files(struct files_struct * files) struct file * file = xchg(&fdt->fd[i], NULL); if (file) { filp_close(file, files); - cond_resched(); + cond_resched_rcu_qs(); } } i++; diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3432063..4733504 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -330,6 +330,19 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, #define rcu_note_voluntary_context_switch(t) do { } while (0) #endif /* #else #ifdef CONFIG_TASKS_RCU */ +/** + * cond_resched_rcu_qs - Report potential quiescent states to RCU + * + * This macro resembles cond_resched(), except that it is defined to + * report potential quiescent states to RCU-tasks even if the cond_resched() + * machinery were to be shut off, as some advocate for PREEMPT kernels. + */ +#define cond_resched_rcu_qs() \ +do { \ + rcu_note_voluntary_context_switch(current); \ + cond_resched(); \ +} while (0) + #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) bool __rcu_is_watching(void); #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 948a769..1787167 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -667,7 +667,7 @@ static int rcu_torture_boost(void *arg) } call_rcu_time = jiffies; } - cond_resched(); + cond_resched_rcu_qs(); stutter_wait("rcu_torture_boost"); if (torture_must_stop()) goto checkwait; @@ -1019,7 +1019,7 @@ rcu_torture_reader(void *arg) __this_cpu_inc(rcu_torture_batch[completed]); preempt_enable(); cur_ops->readunlock(idx); - cond_resched(); + cond_resched_rcu_qs(); stutter_wait("rcu_torture_reader"); } while (!torture_must_stop()); if (irqreader && cur_ops->irq_capable) { diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8ad91d1..e23dad0 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1647,7 +1647,7 @@ static int rcu_gp_init(struct rcu_state *rsp) rnp->level, rnp->grplo, rnp->grphi, rnp->qsmask); raw_spin_unlock_irq(&rnp->lock); - cond_resched(); + cond_resched_rcu_qs(); } mutex_unlock(&rsp->onoff_mutex); @@ -1736,7 +1736,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) /* smp_mb() provided by prior unlock-lock pair. */ nocb += rcu_future_gp_cleanup(rsp, rnp); raw_spin_unlock_irq(&rnp->lock); - cond_resched(); + cond_resched_rcu_qs(); } rnp = rcu_get_root(rsp); raw_spin_lock_irq(&rnp->lock); @@ -1785,7 +1785,7 @@ static int __noreturn rcu_gp_kthread(void *arg) /* Locking provides needed memory barrier. */ if (rcu_gp_init(rsp)) break; - cond_resched(); + cond_resched_rcu_qs(); flush_signals(current); trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), @@ -1828,10 +1828,10 @@ static int __noreturn rcu_gp_kthread(void *arg) trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), TPS("fqsend")); - cond_resched(); + cond_resched_rcu_qs(); } else { /* Deal with stray signal. */ - cond_resched(); + cond_resched_rcu_qs(); flush_signals(current); trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), @@ -2434,7 +2434,7 @@ static void force_qs_rnp(struct rcu_state *rsp, struct rcu_node *rnp; rcu_for_each_leaf_node(rsp, rnp) { - cond_resched(); + cond_resched_rcu_qs(); mask = 0; raw_spin_lock_irqsave(&rnp->lock, flags); smp_mb__after_unlock_lock(); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index a7997e2..7672586 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1848,7 +1848,7 @@ static int rcu_oom_notify(struct notifier_block *self, get_online_cpus(); for_each_online_cpu(cpu) { smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1); - cond_resched(); + cond_resched_rcu_qs(); } put_online_cpus(); diff --git a/mm/mlock.c b/mm/mlock.c index ce84cb0..ab3150c 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -789,7 +789,7 @@ static int do_mlockall(int flags) /* Ignore errors */ mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags); - cond_resched(); + cond_resched_rcu_qs(); } out: return 0; -- cgit v0.10.2 From 53c6d4edf874d3cbc031a53738c6cba9277faea5 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Jul 2014 12:22:23 -0700 Subject: rcu: Add synchronous grace-period waiting for RCU-tasks It turns out to be easier to add the synchronous grace-period waiting functions to RCU-tasks than to work around their absense in rcutorture, so this commit adds them. The key point is that the existence of call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks(). Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 4733504..640152f 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -216,6 +216,8 @@ void synchronize_sched(void); * memory ordering guarantees. */ void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head)); +void synchronize_rcu_tasks(void); +void rcu_barrier_tasks(void); #ifdef CONFIG_PREEMPT_RCU diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 19b3dac..5fd1ddb 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -381,6 +381,61 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) } EXPORT_SYMBOL_GPL(call_rcu_tasks); +/** + * synchronize_rcu_tasks - wait until an rcu-tasks grace period has elapsed. + * + * Control will return to the caller some time after a full rcu-tasks + * grace period has elapsed, in other words after all currently + * executing rcu-tasks read-side critical sections have elapsed. These + * read-side critical sections are delimited by calls to schedule(), + * cond_resched_rcu_qs(), idle execution, userspace execution, calls + * to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched(). + * + * This is a very specialized primitive, intended only for a few uses in + * tracing and other situations requiring manipulation of function + * preambles and profiling hooks. The synchronize_rcu_tasks() function + * is not (yet) intended for heavy use from multiple CPUs. + * + * Note that this guarantee implies further memory-ordering guarantees. + * On systems with more than one CPU, when synchronize_rcu_tasks() returns, + * each CPU is guaranteed to have executed a full memory barrier since the + * end of its last RCU-tasks read-side critical section whose beginning + * preceded the call to synchronize_rcu_tasks(). In addition, each CPU + * having an RCU-tasks read-side critical section that extends beyond + * the return from synchronize_rcu_tasks() is guaranteed to have executed + * a full memory barrier after the beginning of synchronize_rcu_tasks() + * and before the beginning of that RCU-tasks read-side critical section. + * Note that these guarantees include CPUs that are offline, idle, or + * executing in user mode, as well as CPUs that are executing in the kernel. + * + * Furthermore, if CPU A invoked synchronize_rcu_tasks(), which returned + * to its caller on CPU B, then both CPU A and CPU B are guaranteed + * to have executed a full memory barrier during the execution of + * synchronize_rcu_tasks() -- even if CPU A and CPU B are the same CPU + * (but again only if the system has more than one CPU). + */ +void synchronize_rcu_tasks(void) +{ + /* Complain if the scheduler has not started. */ + rcu_lockdep_assert(!rcu_scheduler_active, + "synchronize_rcu_tasks called too soon"); + + /* Wait for the grace period. */ + wait_rcu_gp(call_rcu_tasks); +} + +/** + * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks. + * + * Although the current implementation is guaranteed to wait, it is not + * obligated to, for example, if there are no pending callbacks. + */ +void rcu_barrier_tasks(void) +{ + /* There is only one callback queue, so this is easy. ;-) */ + synchronize_rcu_tasks(); +} + /* See if the current task has stopped holding out, remove from list if so. */ static void check_holdout_task(struct task_struct *t) { -- cgit v0.10.2 From 3f95aa81d265223fdb13ea2b59883766a05adbdf Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 4 Aug 2014 06:10:23 -0700 Subject: rcu: Make TASKS_RCU handle tasks that are almost done exiting Once a task has passed exit_notify() in the do_exit() code path, it is no longer on the task lists, and is therefore no longer visible to rcu_tasks_kthread(). This means that an almost-exited task might be preempted while within a trampoline, and this task won't be waited on by rcu_tasks_kthread(). This commit fixes this bug by adding an srcu_struct. An exiting task does srcu_read_lock() just before calling exit_notify(), and does the corresponding srcu_read_unlock() after doing the final preempt_disable(). This means that rcu_tasks_kthread() can do synchronize_srcu() to wait for all mostly-exited tasks to reach their final preempt_disable() region, and then use synchronize_sched() to wait for those tasks to finish exiting. Reported-by: Oleg Nesterov Suggested-by: Lai Jiangshan Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 640152f..54b2ebb 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -321,6 +321,8 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, * macro rather than an inline function to avoid #include hell. */ #ifdef CONFIG_TASKS_RCU +#define TASKS_RCU(x) x +extern struct srcu_struct tasks_rcu_exit_srcu; #define rcu_note_voluntary_context_switch(t) \ do { \ preempt_disable(); /* Exclude synchronize_sched(); */ \ @@ -329,6 +331,7 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, preempt_enable(); \ } while (0) #else /* #ifdef CONFIG_TASKS_RCU */ +#define TASKS_RCU(x) do { } while (0) #define rcu_note_voluntary_context_switch(t) do { } while (0) #endif /* #else #ifdef CONFIG_TASKS_RCU */ diff --git a/kernel/exit.c b/kernel/exit.c index 32c58f7..d13f2ee 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -667,6 +667,7 @@ void do_exit(long code) { struct task_struct *tsk = current; int group_dead; + TASKS_RCU(int tasks_rcu_i); profile_task_exit(tsk); @@ -775,6 +776,7 @@ void do_exit(long code) */ flush_ptrace_hw_breakpoint(tsk); + TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu)); exit_notify(tsk, group_dead); proc_exit_connector(tsk); #ifdef CONFIG_NUMA @@ -814,6 +816,7 @@ void do_exit(long code) if (tsk->nr_dirtied) __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied); exit_rcu(); + TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i)); /* * The setting of TASK_RUNNING by try_to_wake_up() may be delayed diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 5fd1ddb..403fc4a 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -367,6 +367,13 @@ static struct rcu_head *rcu_tasks_cbs_head; static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock); +/* Track exiting tasks in order to allow them to be waited for. */ +DEFINE_SRCU(tasks_rcu_exit_srcu); + +/* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */ +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3; +module_param(rcu_task_stall_timeout, int, 0644); + /* Post an RCU-tasks callback. */ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) { @@ -518,6 +525,15 @@ static int __noreturn rcu_tasks_kthread(void *arg) rcu_read_unlock(); /* + * Wait for tasks that are in the process of exiting. + * This does only part of the job, ensuring that all + * tasks that were previously exiting reach the point + * where they have disabled preemption, allowing the + * later synchronize_sched() to finish the job. + */ + synchronize_srcu(&tasks_rcu_exit_srcu); + + /* * Each pass through the following loop scans the list * of holdout tasks, removing any that are no longer * holdouts. When the list is empty, we are done. @@ -546,6 +562,11 @@ static int __noreturn rcu_tasks_kthread(void *arg) * ->rcu_tasks_holdout accesses to be within the grace * period, avoiding the need for memory barriers for * ->rcu_tasks_holdout accesses. + * + * In addition, this synchronize_sched() waits for exiting + * tasks to complete their final preempt_disable() region + * of execution, cleaning up after the synchronize_srcu() + * above. */ synchronize_sched(); -- cgit v0.10.2 From 06c2a9238fad48ec38f1be00455bf942d54377ee Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Wed, 2 Jul 2014 18:17:19 -0700 Subject: rcu: Export RCU-tasks APIs to GPL modules This commit exports the RCU-tasks synchronous APIs, synchronize_rcu_tasks() and rcu_barrier_tasks(), to GPL-licensed kernel modules. Signed-off-by: Steven Rostedt Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 403fc4a..aef8109 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -430,6 +430,7 @@ void synchronize_rcu_tasks(void) /* Wait for the grace period. */ wait_rcu_gp(call_rcu_tasks); } +EXPORT_SYMBOL_GPL(synchronize_rcu_tasks); /** * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks. @@ -442,6 +443,7 @@ void rcu_barrier_tasks(void) /* There is only one callback queue, so this is easy. ;-) */ synchronize_rcu_tasks(); } +EXPORT_SYMBOL_GPL(rcu_barrier_tasks); /* See if the current task has stopped holding out, remove from list if so. */ static void check_holdout_task(struct task_struct *t) -- cgit v0.10.2 From 69c604557ce34015629b325b85ff1a4996038a3b Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Jul 2014 11:59:36 -0700 Subject: rcutorture: Add torture tests for RCU-tasks This commit adds torture tests for RCU-tasks. It also fixes a bug that would segfault for an RCU flavor lacking a callback-barrier function. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 54b2ebb..a3123f5 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -55,6 +55,7 @@ enum rcutorture_type { RCU_FLAVOR, RCU_BH_FLAVOR, RCU_SCHED_FLAVOR, + RCU_TASKS_FLAVOR, SRCU_FLAVOR, INVALID_RCU_FLAVOR }; diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 1787167..75b1abf 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = { .name = "sched" }; +#ifdef CONFIG_TASKS_RCU + +/* + * Definitions for RCU-tasks torture testing. + */ + +static int tasks_torture_read_lock(void) +{ + return 0; +} + +static void tasks_torture_read_unlock(int idx) +{ +} + +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p) +{ + call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb); +} + +static struct rcu_torture_ops tasks_ops = { + .ttype = RCU_TASKS_FLAVOR, + .init = rcu_sync_torture_init, + .readlock = tasks_torture_read_lock, + .read_delay = rcu_read_delay, /* just reuse rcu's version. */ + .readunlock = tasks_torture_read_unlock, + .completed = rcu_no_completed, + .deferred_free = rcu_tasks_torture_deferred_free, + .sync = synchronize_rcu_tasks, + .exp_sync = synchronize_rcu_tasks, + .call = call_rcu_tasks, + .cb_barrier = rcu_barrier_tasks, + .fqs = NULL, + .stats = NULL, + .irq_capable = 1, + .name = "tasks" +}; + +#define RCUTORTURE_TASKS_OPS &tasks_ops, + +#else /* #ifdef CONFIG_TASKS_RCU */ + +#define RCUTORTURE_TASKS_OPS + +#endif /* #else #ifdef CONFIG_TASKS_RCU */ + /* * RCU torture priority-boost testing. Runs one real-time thread per * CPU for moderate bursts, repeatedly registering RCU callbacks and @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg) if (atomic_dec_and_test(&barrier_cbs_count)) wake_up(&barrier_wq); } while (!torture_must_stop()); - cur_ops->cb_barrier(); + if (cur_ops->cb_barrier != NULL) + cur_ops->cb_barrier(); destroy_rcu_head_on_stack(&rcu); torture_kthread_stopping("rcu_torture_barrier_cbs"); return 0; @@ -1534,6 +1581,7 @@ rcu_torture_init(void) int firsterr = 0; static struct rcu_torture_ops *torture_ops[] = { &rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops, + RCUTORTURE_TASKS_OPS }; if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable)) -- cgit v0.10.2 From f1a828f5fa3537456c417a81ad534c14022c268c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Jul 2014 12:56:20 -0700 Subject: rcutorture: Add RCU-tasks test cases This commit adds the TASKS01 and TASKS02 Kconfig fragments, along with the corresponding TASKS01.boot and TASKS02.boot boot-parameter files specifying that rcutorture test RCU-tasks instead of the default flavor. Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01 new file mode 100644 index 0000000..97f0a0b --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01 @@ -0,0 +1,9 @@ +CONFIG_SMP=y +CONFIG_NR_CPUS=2 +CONFIG_HOTPLUG_CPU=y +CONFIG_PREEMPT_NONE=n +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=y +CONFIG_DEBUG_LOCK_ALLOC=y +CONFIG_PROVE_RCU=y +CONFIG_TASKS_RCU=y diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot new file mode 100644 index 0000000..cd2a188 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot @@ -0,0 +1 @@ +rcutorture.torture_type=tasks diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS02 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02 new file mode 100644 index 0000000..696d2ea --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02 @@ -0,0 +1,5 @@ +CONFIG_SMP=n +CONFIG_PREEMPT_NONE=y +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=n +CONFIG_TASKS_RCU=y diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot new file mode 100644 index 0000000..cd2a188 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot @@ -0,0 +1 @@ +rcutorture.torture_type=tasks diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03 new file mode 100644 index 0000000..9c60da5 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03 @@ -0,0 +1,13 @@ +CONFIG_SMP=y +CONFIG_NR_CPUS=2 +CONFIG_HOTPLUG_CPU=n +CONFIG_SUSPEND=n +CONFIG_HIBERNATION=n +CONFIG_PREEMPT_NONE=n +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=y +CONFIG_TASKS_RCU=y +CONFIG_HZ_PERIODIC=n +CONFIG_NO_HZ_IDLE=n +CONFIG_NO_HZ_FULL=y +CONFIG_NO_HZ_FULL_ALL=y diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot new file mode 100644 index 0000000..cd2a188 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot @@ -0,0 +1 @@ +rcutorture.torture_type=tasks -- cgit v0.10.2 From 52db30ab23b6d00cf80b22a510c4ea4be4458031 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Jul 2014 18:16:30 -0700 Subject: rcu: Add stall-warning checks for RCU-tasks This commit adds a ten-minute RCU-tasks stall warning. The actual time is controlled by the boot/sysfs parameter rcu_task_stall_timeout, with values less than or equal to zero disabling the stall warnings. The default value is ten minutes, which means that the tasks that have not yet responded will get their stacks dumped every ten minutes, until they pass through a voluntary context switch. Signed-off-by: Paul E. McKenney diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 5ae8608..e98be95 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2982,6 +2982,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. rcupdate.rcu_cpu_stall_timeout= [KNL] Set timeout for RCU CPU stall warning messages. + rcupdate.rcu_task_stall_timeout= [KNL] + Set timeout in jiffies for RCU task stall warning + messages. Disable with a value less than or equal + to zero. + rdinit= [KNL] Format: Run specified binary instead of /init from the ramdisk, diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index aef8109..bad7dbd 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -371,7 +371,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock); DEFINE_SRCU(tasks_rcu_exit_srcu); /* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */ -static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3; +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10; module_param(rcu_task_stall_timeout, int, 0644); /* Post an RCU-tasks callback. */ @@ -445,8 +445,9 @@ void rcu_barrier_tasks(void) } EXPORT_SYMBOL_GPL(rcu_barrier_tasks); -/* See if the current task has stopped holding out, remove from list if so. */ -static void check_holdout_task(struct task_struct *t) +/* See if tasks are still holding out, complain if so. */ +static void check_holdout_task(struct task_struct *t, + bool needreport, bool *firstreport) { if (!ACCESS_ONCE(t->rcu_tasks_holdout) || t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) || @@ -454,7 +455,15 @@ static void check_holdout_task(struct task_struct *t) ACCESS_ONCE(t->rcu_tasks_holdout) = false; list_del_rcu(&t->rcu_tasks_holdout_list); put_task_struct(t); + return; } + if (!needreport) + return; + if (*firstreport) { + pr_err("INFO: rcu_tasks detected stalls on tasks:\n"); + *firstreport = false; + } + sched_show_task(t); } /* RCU-tasks kthread that detects grace periods and invokes callbacks. */ @@ -462,6 +471,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) { unsigned long flags; struct task_struct *g, *t; + unsigned long lastreport; struct rcu_head *list; struct rcu_head *next; LIST_HEAD(rcu_tasks_holdouts); @@ -540,13 +550,24 @@ static int __noreturn rcu_tasks_kthread(void *arg) * of holdout tasks, removing any that are no longer * holdouts. When the list is empty, we are done. */ + lastreport = jiffies; while (!list_empty(&rcu_tasks_holdouts)) { + bool firstreport; + bool needreport; + int rtst; + schedule_timeout_interruptible(HZ); + rtst = ACCESS_ONCE(rcu_task_stall_timeout); + needreport = rtst > 0 && + time_after(jiffies, lastreport + rtst); + if (needreport) + lastreport = jiffies; + firstreport = true; WARN_ON(signal_pending(current)); rcu_read_lock(); list_for_each_entry_rcu(t, &rcu_tasks_holdouts, rcu_tasks_holdout_list) - check_holdout_task(t); + check_holdout_task(t, needreport, &firstreport); rcu_read_unlock(); } -- cgit v0.10.2 From c7b24d2b9a0f2ce19fdf631d3148c80a8f6010b1 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 28 Jul 2014 14:39:25 -0700 Subject: rcu: Improve RCU-tasks energy efficiency The current RCU-tasks implementation uses strict polling to detect callback arrivals. This works quite well, but is not so good for energy efficiency. This commit therefore replaces the strict polling with a wait queue. Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index bad7dbd..444c8a3 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -365,6 +365,7 @@ early_initcall(check_cpu_stall_init); /* Global list of callbacks and associated lock. */ static struct rcu_head *rcu_tasks_cbs_head; static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq); static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock); /* Track exiting tasks in order to allow them to be waited for. */ @@ -378,13 +379,17 @@ module_param(rcu_task_stall_timeout, int, 0644); void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) { unsigned long flags; + bool needwake; rhp->next = NULL; rhp->func = func; raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags); + needwake = !rcu_tasks_cbs_head; *rcu_tasks_cbs_tail = rhp; rcu_tasks_cbs_tail = &rhp->next; raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); + if (needwake) + wake_up(&rcu_tasks_cbs_wq); } EXPORT_SYMBOL_GPL(call_rcu_tasks); @@ -495,8 +500,12 @@ static int __noreturn rcu_tasks_kthread(void *arg) /* If there were none, wait a bit and start over. */ if (!list) { - schedule_timeout_interruptible(HZ); - WARN_ON(signal_pending(current)); + wait_event_interruptible(rcu_tasks_cbs_wq, + rcu_tasks_cbs_head); + if (!rcu_tasks_cbs_head) { + WARN_ON(signal_pending(current)); + schedule_timeout_interruptible(HZ/10); + } continue; } @@ -602,6 +611,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) list = next; cond_resched(); } + schedule_timeout_uninterruptible(HZ/10); } } -- cgit v0.10.2 From 37fe5f0e2713608573c5df5e529e13a135625629 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 29 Jul 2014 09:49:23 -0700 Subject: documentation: Add verbiage on RCU-tasks stall warning messages This commit documents RCU-tasks stall warning messages and also describes when to use the new cond_resched_rcu_qs() API. Signed-off-by: Paul E. McKenney diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 68fe3ad..ef5a2fd 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -56,8 +56,20 @@ RCU_STALL_RAT_DELAY two jiffies. (This is a cpp macro, not a kernel configuration parameter.) -When a CPU detects that it is stalling, it will print a message similar -to the following: +rcupdate.rcu_task_stall_timeout + + This boot/sysfs parameter controls the RCU-tasks stall warning + interval. A value of zero or less suppresses RCU-tasks stall + warnings. A positive value sets the stall-warning interval + in jiffies. An RCU-tasks stall warning starts wtih the line: + + INFO: rcu_tasks detected stalls on tasks: + + And continues with the output of sched_show_task() for each + task stalling the current RCU-tasks grace period. + +For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling, +it will print a message similar to the following: INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies) @@ -174,8 +186,12 @@ o A CPU looping with preemption disabled. This condition can o A CPU looping with bottom halves disabled. This condition can result in RCU-sched and RCU-bh stalls. -o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel - without invoking schedule(). +o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the + kernel without invoking schedule(). Note that cond_resched() + does not necessarily prevent RCU CPU stall warnings. Therefore, + if the looping in the kernel is really expected and desirable + behavior, you might need to replace some of the cond_resched() + calls with calls to cond_resched_rcu_qs(). o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might happen to preempt a low-priority task in the middle of an RCU @@ -208,11 +224,10 @@ o A hardware failure. This is quite unlikely, but has occurred This resulted in a series of RCU CPU stall warnings, eventually leading the realization that the CPU had failed. -The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. -SRCU does not have its own CPU stall warnings, but its calls to -synchronize_sched() will result in RCU-sched detecting RCU-sched-related -CPU stalls. Please note that RCU only detects CPU stalls when there is -a grace period in progress. No grace period, no CPU stall warnings. +The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall +warning. Note that SRCU does -not- have CPU stall warnings. Please note +that RCU only detects CPU stalls when there is a grace period in progress. +No grace period, no CPU stall warnings. To diagnose the cause of the stall, inspect the stack traces. The offending function will usually be near the top of the stack. -- cgit v0.10.2 From 84a8f446ffd70c2799a96268aaa4d47c22a83ff0 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 4 Aug 2014 07:24:21 -0700 Subject: rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() It is expected that many sites will have CONFIG_TASKS_RCU=y, but will never actually invoke call_rcu_tasks(). For such sites, creating rcu_tasks_kthread() at boot is wasteful. This commit therefore defers creation of this kthread until the time of the first call_rcu_tasks(). This of course means that the first call_rcu_tasks() must be invoked from process context after the scheduler is fully operational. Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 444c8a3..e1d7174 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -375,7 +375,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu); static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10; module_param(rcu_task_stall_timeout, int, 0644); -/* Post an RCU-tasks callback. */ +static void rcu_spawn_tasks_kthread(void); + +/* + * Post an RCU-tasks callback. First call must be from process context + * after the scheduler if fully operational. + */ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) { unsigned long flags; @@ -388,8 +393,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) *rcu_tasks_cbs_tail = rhp; rcu_tasks_cbs_tail = &rhp->next; raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); - if (needwake) + if (needwake) { + rcu_spawn_tasks_kthread(); wake_up(&rcu_tasks_cbs_wq); + } } EXPORT_SYMBOL_GPL(call_rcu_tasks); @@ -615,15 +622,27 @@ static int __noreturn rcu_tasks_kthread(void *arg) } } -/* Spawn rcu_tasks_kthread() at boot time. */ -static int __init rcu_spawn_tasks_kthread(void) +/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */ +static void rcu_spawn_tasks_kthread(void) { - struct task_struct __maybe_unused *t; + static DEFINE_MUTEX(rcu_tasks_kthread_mutex); + static struct task_struct *rcu_tasks_kthread_ptr; + struct task_struct *t; + if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) { + smp_mb(); /* Ensure caller sees full kthread. */ + return; + } + mutex_lock(&rcu_tasks_kthread_mutex); + if (rcu_tasks_kthread_ptr) { + mutex_unlock(&rcu_tasks_kthread_mutex); + return; + } t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread"); BUG_ON(IS_ERR(t)); - return 0; + smp_mb(); /* Ensure others see full kthread. */ + ACCESS_ONCE(rcu_tasks_kthread_ptr) = t; + mutex_unlock(&rcu_tasks_kthread_mutex); } -early_initcall(rcu_spawn_tasks_kthread); #endif /* #ifdef CONFIG_TASKS_RCU */ -- cgit v0.10.2 From 176f8f7a52cc6d09d686f0d900abda6942a52fbb Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 4 Aug 2014 17:43:50 -0700 Subject: rcu: Make TASKS_RCU handle nohz_full= CPUs Currently TASKS_RCU would ignore a CPU running a task in nohz_full= usermode execution. There would be neither a context switch nor a scheduling-clock interrupt to tell TASKS_RCU that the task in question had passed through a quiescent state. The grace period would therefore extend indefinitely. This commit therefore makes RCU's dyntick-idle subsystem record the task_struct structure of the task that is running in dyntick-idle mode on each CPU. The TASKS_RCU grace period can then access this information and record a quiescent state on behalf of any CPU running in dyntick-idle usermode. Signed-off-by: Paul E. McKenney diff --git a/include/linux/init_task.h b/include/linux/init_task.h index dffd925..03b2748 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -121,7 +121,8 @@ extern struct group_info init_groups; #define INIT_TASK_RCU_TASKS(tsk) \ .rcu_tasks_holdout = false, \ .rcu_tasks_holdout_list = \ - LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), \ + .rcu_tasks_idle_cpu = -1, #else #define INIT_TASK_RCU_TASKS(tsk) #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index eaacac4..ec8b347 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1274,6 +1274,7 @@ struct task_struct { unsigned long rcu_tasks_nvcsw; bool rcu_tasks_holdout; struct list_head rcu_tasks_holdout_list; + int rcu_tasks_idle_cpu; #endif /* #ifdef CONFIG_TASKS_RCU */ #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) @@ -2020,6 +2021,7 @@ static inline void rcu_copy_process(struct task_struct *p) #ifdef CONFIG_TASKS_RCU p->rcu_tasks_holdout = false; INIT_LIST_HEAD(&p->rcu_tasks_holdout_list); + p->rcu_tasks_idle_cpu = -1; #endif /* #ifdef CONFIG_TASKS_RCU */ } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index e23dad0..c880f53 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval, atomic_inc(&rdtp->dynticks); smp_mb__after_atomic(); /* Force ordering with next sojourn. */ WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1); + rcu_dynticks_task_enter(); /* * It is illegal to enter an extended quiescent state while @@ -642,6 +643,7 @@ void rcu_irq_exit(void) static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval, int user) { + rcu_dynticks_task_exit(); smp_mb__before_atomic(); /* Force ordering w/previous sojourn. */ atomic_inc(&rdtp->dynticks); /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */ diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 6a86eb7..3a92000 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -605,6 +605,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); static bool rcu_nohz_full_cpu(struct rcu_state *rsp); +static void rcu_dynticks_task_enter(void); +static void rcu_dynticks_task_exit(void); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 7672586..e466b40 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -3036,3 +3036,19 @@ static void rcu_bind_gp_kthread(void) housekeeping_affine(current); #endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ } + +/* Record the current task on dyntick-idle entry. */ +static void rcu_dynticks_task_enter(void) +{ +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) + ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id(); +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */ +} + +/* Record no current task on dyntick-idle exit. */ +static void rcu_dynticks_task_exit(void) +{ +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) + ACCESS_ONCE(current->rcu_tasks_idle_cpu) = -1; +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */ +} diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index e1d7174..2658de4 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -463,7 +463,9 @@ static void check_holdout_task(struct task_struct *t, { if (!ACCESS_ONCE(t->rcu_tasks_holdout) || t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) || - !ACCESS_ONCE(t->on_rq)) { + !ACCESS_ONCE(t->on_rq) || + (IS_ENABLED(CONFIG_NO_HZ_FULL) && + !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) { ACCESS_ONCE(t->rcu_tasks_holdout) = false; list_del_rcu(&t->rcu_tasks_holdout_list); put_task_struct(t); -- cgit v0.10.2 From 8f20a5e83d2c5d0e126a2fc9bca67f7430dac907 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 5 Aug 2014 05:10:24 -0700 Subject: rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption The grace-period-wait loop in rcu_tasks_kthread() is under (unnecessary) RCU protection, and therefore has no preemption points in a PREEMPT=n kernel. This commit therefore removes the RCU protection and inserts cond_resched(). Reported-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 2658de4..f86d1ae 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -467,7 +467,7 @@ static void check_holdout_task(struct task_struct *t, (IS_ENABLED(CONFIG_NO_HZ_FULL) && !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) { ACCESS_ONCE(t->rcu_tasks_holdout) = false; - list_del_rcu(&t->rcu_tasks_holdout_list); + list_del_init(&t->rcu_tasks_holdout_list); put_task_struct(t); return; } @@ -573,6 +573,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) bool firstreport; bool needreport; int rtst; + struct task_struct *t1; schedule_timeout_interruptible(HZ); rtst = ACCESS_ONCE(rcu_task_stall_timeout); @@ -582,11 +583,11 @@ static int __noreturn rcu_tasks_kthread(void *arg) lastreport = jiffies; firstreport = true; WARN_ON(signal_pending(current)); - rcu_read_lock(); - list_for_each_entry_rcu(t, &rcu_tasks_holdouts, - rcu_tasks_holdout_list) + list_for_each_entry_safe(t, t1, &rcu_tasks_holdouts, + rcu_tasks_holdout_list) { check_holdout_task(t, needreport, &firstreport); - rcu_read_unlock(); + cond_resched(); + } } /* -- cgit v0.10.2 From 01a81330344b09028881c953a51d1106a9e63518 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 5 Aug 2014 05:23:35 -0700 Subject: rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() In theory, synchronize_sched() requires a read-side critical section to order against. In practice, preemption can be thought of as being disabled across every machine instruction, at least for those machine instructions that are not in the idle loop and not on offline CPUs. So this commit removes the redundant preempt_disable() from rcu_note_voluntary_context_switch(). Please note that the single instruction in question is the store of zero to ->rcu_tasks_holdout. The "if" is simply a performance optimization that avoids unnecessary stores. To see this, keep in mind that both the "if" condition and the store are in a quiescent state. Therefore, even if the task is preempted for a full grace period (presumably due to its having done a context switch beforehand), the store will be recording a legitimate quiescent state. Reported-by: Lai Jiangshan Signed-off-by: Paul E. McKenney Conflicts: include/linux/rcupdate.h diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index a3123f5..132e1e34 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -326,10 +326,8 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, extern struct srcu_struct tasks_rcu_exit_srcu; #define rcu_note_voluntary_context_switch(t) \ do { \ - preempt_disable(); /* Exclude synchronize_sched(); */ \ if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \ - preempt_enable(); \ } while (0) #else /* #ifdef CONFIG_TASKS_RCU */ #define TASKS_RCU(x) do { } while (0) -- cgit v0.10.2 From 4ff475ed4cf61a7f56bbfbc424147189d0022b38 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sun, 10 Aug 2014 19:47:12 -0700 Subject: rcu: Additional information on RCU-tasks stall-warning messages Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index f86d1ae..9487b48 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -48,6 +48,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS @@ -461,6 +462,8 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks); static void check_holdout_task(struct task_struct *t, bool needreport, bool *firstreport) { + int cpu; + if (!ACCESS_ONCE(t->rcu_tasks_holdout) || t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) || !ACCESS_ONCE(t->on_rq) || @@ -477,6 +480,12 @@ static void check_holdout_task(struct task_struct *t, pr_err("INFO: rcu_tasks detected stalls on tasks:\n"); *firstreport = false; } + cpu = task_cpu(t); + pr_alert("%p: %c%c nvcsw: %lu/%lu holdout: %d idle_cpu: %d/%d\n", + t, ".I"[is_idle_task(t)], + "N."[cpu < 0 || !tick_nohz_full_cpu(cpu)], + t->rcu_tasks_nvcsw, t->nvcsw, t->rcu_tasks_holdout, + t->rcu_tasks_idle_cpu, cpu); sched_show_task(t); } -- cgit v0.10.2 From 1d082fd061884a587c490c4fc8a2056ce1e47624 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 14 Aug 2014 16:01:53 -0700 Subject: rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() The rcu_preempt_note_context_switch() function is on a scheduling fast path, so it would be good to avoid disabling irqs. The reason that irqs are disabled is to synchronize process-level and irq-handler access to the task_struct ->rcu_read_unlock_special bitmask. This commit therefore makes ->rcu_read_unlock_special instead be a union of bools with a short allowing single-access checks in RCU's __rcu_read_unlock(). This results in the process-level and irq-handler accesses being simple loads and stores, so that irqs need no longer be disabled. This commit therefore removes the irq disabling from rcu_preempt_note_context_switch(). Reported-by: Peter Zijlstra Signed-off-by: Paul E. McKenney diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 03b2748..77fc43f 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -111,7 +111,7 @@ extern struct group_info init_groups; #ifdef CONFIG_PREEMPT_RCU #define INIT_TASK_RCU_PREEMPT(tsk) \ .rcu_read_lock_nesting = 0, \ - .rcu_read_unlock_special = 0, \ + .rcu_read_unlock_special.s = 0, \ .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry), \ INIT_TASK_RCU_TREE_PREEMPT() #else diff --git a/include/linux/sched.h b/include/linux/sched.h index ec8b347..42888d7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1212,6 +1212,13 @@ struct sched_dl_entity { struct hrtimer dl_timer; }; +union rcu_special { + struct { + bool blocked; + bool need_qs; + } b; + short s; +}; struct rcu_node; enum perf_event_task_context { @@ -1264,7 +1271,7 @@ struct task_struct { #ifdef CONFIG_PREEMPT_RCU int rcu_read_lock_nesting; - char rcu_read_unlock_special; + union rcu_special rcu_read_unlock_special; struct list_head rcu_node_entry; #endif /* #ifdef CONFIG_PREEMPT_RCU */ #ifdef CONFIG_TREE_PREEMPT_RCU @@ -2005,16 +2012,11 @@ extern void task_clear_jobctl_trapping(struct task_struct *task); extern void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask); -#ifdef CONFIG_PREEMPT_RCU -#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */ -#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */ -#endif /* #ifdef CONFIG_PREEMPT_RCU */ - static inline void rcu_copy_process(struct task_struct *p) { #ifdef CONFIG_PREEMPT_RCU p->rcu_read_lock_nesting = 0; - p->rcu_read_unlock_special = 0; + p->rcu_read_unlock_special.s = 0; p->rcu_blocked_node = NULL; INIT_LIST_HEAD(&p->rcu_node_entry); #endif /* #ifdef CONFIG_PREEMPT_RCU */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index e466b40..0981c0c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -155,9 +155,8 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed); * not in a quiescent state. There might be any number of tasks blocked * while in an RCU read-side critical section. * - * Unlike the other rcu_*_qs() functions, callers to this function - * must disable irqs in order to protect the assignment to - * ->rcu_read_unlock_special. + * As with the other rcu_*_qs() functions, callers to this function + * must disable preemption. */ static void rcu_preempt_qs(int cpu) { @@ -166,7 +165,7 @@ static void rcu_preempt_qs(int cpu) if (rdp->passed_quiesce == 0) trace_rcu_grace_period(TPS("rcu_preempt"), rdp->gpnum, TPS("cpuqs")); rdp->passed_quiesce = 1; - current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS; + current->rcu_read_unlock_special.b.need_qs = false; } /* @@ -190,14 +189,14 @@ static void rcu_preempt_note_context_switch(int cpu) struct rcu_node *rnp; if (t->rcu_read_lock_nesting > 0 && - (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) { + !t->rcu_read_unlock_special.b.blocked) { /* Possibly blocking in an RCU read-side critical section. */ rdp = per_cpu_ptr(rcu_preempt_state.rda, cpu); rnp = rdp->mynode; raw_spin_lock_irqsave(&rnp->lock, flags); smp_mb__after_unlock_lock(); - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED; + t->rcu_read_unlock_special.b.blocked = true; t->rcu_blocked_node = rnp; /* @@ -239,7 +238,7 @@ static void rcu_preempt_note_context_switch(int cpu) : rnp->gpnum + 1); raw_spin_unlock_irqrestore(&rnp->lock, flags); } else if (t->rcu_read_lock_nesting < 0 && - t->rcu_read_unlock_special) { + t->rcu_read_unlock_special.s) { /* * Complete exit from RCU read-side critical section on @@ -257,9 +256,7 @@ static void rcu_preempt_note_context_switch(int cpu) * grace period, then the fact that the task has been enqueued * means that we continue to block the current grace period. */ - local_irq_save(flags); rcu_preempt_qs(cpu); - local_irq_restore(flags); } /* @@ -340,7 +337,7 @@ void rcu_read_unlock_special(struct task_struct *t) bool drop_boost_mutex = false; #endif /* #ifdef CONFIG_RCU_BOOST */ struct rcu_node *rnp; - int special; + union rcu_special special; /* NMI handlers cannot block and cannot safely manipulate state. */ if (in_nmi()) @@ -350,12 +347,13 @@ void rcu_read_unlock_special(struct task_struct *t) /* * If RCU core is waiting for this CPU to exit critical section, - * let it know that we have done so. + * let it know that we have done so. Because irqs are disabled, + * t->rcu_read_unlock_special cannot change. */ special = t->rcu_read_unlock_special; - if (special & RCU_READ_UNLOCK_NEED_QS) { + if (special.b.need_qs) { rcu_preempt_qs(smp_processor_id()); - if (!t->rcu_read_unlock_special) { + if (!t->rcu_read_unlock_special.s) { local_irq_restore(flags); return; } @@ -368,8 +366,8 @@ void rcu_read_unlock_special(struct task_struct *t) } /* Clean up if blocked during RCU read-side critical section. */ - if (special & RCU_READ_UNLOCK_BLOCKED) { - t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED; + if (special.b.blocked) { + t->rcu_read_unlock_special.b.blocked = false; /* * Remove this task from the list it blocked on. The @@ -658,7 +656,7 @@ static void rcu_preempt_check_callbacks(int cpu) } if (t->rcu_read_lock_nesting > 0 && per_cpu(rcu_preempt_data, cpu).qs_pending) - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS; + t->rcu_read_unlock_special.b.need_qs = true; } #ifdef CONFIG_RCU_BOOST @@ -941,7 +939,7 @@ void exit_rcu(void) return; t->rcu_read_lock_nesting = 1; barrier(); - t->rcu_read_unlock_special = RCU_READ_UNLOCK_BLOCKED; + t->rcu_read_unlock_special.b.blocked = true; __rcu_read_unlock(); } diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 9487b48..6fb9115 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -93,7 +93,7 @@ void __rcu_read_unlock(void) barrier(); /* critical section before exit code. */ t->rcu_read_lock_nesting = INT_MIN; barrier(); /* assign before ->rcu_read_unlock_special load */ - if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special))) + if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special.s))) rcu_read_unlock_special(t); barrier(); /* ->rcu_read_unlock_special load before assign */ t->rcu_read_lock_nesting = 0; -- cgit v0.10.2 From 284a8c93af47306beed967a303d84730b32bab39 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 14 Aug 2014 16:38:46 -0700 Subject: rcu: Per-CPU operation cleanups to rcu_*_qs() functions The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use old-style per-CPU variable access and write to ->passed_quiesce even if it is already set. This commit therefore updates to use the new-style per-CPU variable access functions and avoids the spurious writes. This commit also eliminates the "cpu" argument to these functions because they are always invoked on the indicated CPU. Reported-by: Peter Zijlstra Signed-off-by: Paul E. McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 132e1e34..2fab0e3 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -261,8 +261,8 @@ static inline int rcu_preempt_depth(void) /* Internal to kernel */ void rcu_init(void); -void rcu_sched_qs(int cpu); -void rcu_bh_qs(int cpu); +void rcu_sched_qs(void); +void rcu_bh_qs(void); void rcu_check_callbacks(int cpu, int user); struct notifier_block; void rcu_idle_enter(void); diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index d40a6a4..38cc5b1 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -80,7 +80,7 @@ static inline void kfree_call_rcu(struct rcu_head *head, static inline void rcu_note_context_switch(int cpu) { - rcu_sched_qs(cpu); + rcu_sched_qs(); } /* diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c index 717f008..61b8d2c 100644 --- a/kernel/rcu/tiny.c +++ b/kernel/rcu/tiny.c @@ -72,7 +72,7 @@ static void rcu_idle_enter_common(long long newval) current->pid, current->comm, idle->pid, idle->comm); /* must be idle task! */ } - rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */ + rcu_sched_qs(); /* implies rcu_bh_inc() */ barrier(); rcu_dynticks_nesting = newval; } @@ -217,7 +217,7 @@ static int rcu_qsctr_help(struct rcu_ctrlblk *rcp) * are at it, given that any rcu quiescent state is also an rcu_bh * quiescent state. Use "+" instead of "||" to defeat short circuiting. */ -void rcu_sched_qs(int cpu) +void rcu_sched_qs(void) { unsigned long flags; @@ -231,7 +231,7 @@ void rcu_sched_qs(int cpu) /* * Record an rcu_bh quiescent state. */ -void rcu_bh_qs(int cpu) +void rcu_bh_qs(void) { unsigned long flags; @@ -251,9 +251,9 @@ void rcu_check_callbacks(int cpu, int user) { RCU_TRACE(check_cpu_stalls()); if (user || rcu_is_cpu_rrupt_from_idle()) - rcu_sched_qs(cpu); + rcu_sched_qs(); else if (!in_softirq()) - rcu_bh_qs(cpu); + rcu_bh_qs(); if (user) rcu_note_voluntary_context_switch(current); } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index c880f53..4c34062 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -188,22 +188,24 @@ static int rcu_gp_in_progress(struct rcu_state *rsp) * one since the start of the grace period, this just sets a flag. * The caller must have disabled preemption. */ -void rcu_sched_qs(int cpu) +void rcu_sched_qs(void) { - struct rcu_data *rdp = &per_cpu(rcu_sched_data, cpu); - - if (rdp->passed_quiesce == 0) - trace_rcu_grace_period(TPS("rcu_sched"), rdp->gpnum, TPS("cpuqs")); - rdp->passed_quiesce = 1; + if (!__this_cpu_read(rcu_sched_data.passed_quiesce)) { + trace_rcu_grace_period(TPS("rcu_sched"), + __this_cpu_read(rcu_sched_data.gpnum), + TPS("cpuqs")); + __this_cpu_write(rcu_sched_data.passed_quiesce, 1); + } } -void rcu_bh_qs(int cpu) +void rcu_bh_qs(void) { - struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu); - - if (rdp->passed_quiesce == 0) - trace_rcu_grace_period(TPS("rcu_bh"), rdp->gpnum, TPS("cpuqs")); - rdp->passed_quiesce = 1; + if (!__this_cpu_read(rcu_bh_data.passed_quiesce)) { + trace_rcu_grace_period(TPS("rcu_bh"), + __this_cpu_read(rcu_bh_data.gpnum), + TPS("cpuqs")); + __this_cpu_write(rcu_bh_data.passed_quiesce, 1); + } } static DEFINE_PER_CPU(int, rcu_sched_qs_mask); @@ -278,7 +280,7 @@ static void rcu_momentary_dyntick_idle(void) void rcu_note_context_switch(int cpu) { trace_rcu_utilization(TPS("Start context switch")); - rcu_sched_qs(cpu); + rcu_sched_qs(); rcu_preempt_note_context_switch(cpu); if (unlikely(raw_cpu_read(rcu_sched_qs_mask))) rcu_momentary_dyntick_idle(); @@ -2395,8 +2397,8 @@ void rcu_check_callbacks(int cpu, int user) * at least not while the corresponding CPU is online. */ - rcu_sched_qs(cpu); - rcu_bh_qs(cpu); + rcu_sched_qs(); + rcu_bh_qs(); } else if (!in_softirq()) { @@ -2407,7 +2409,7 @@ void rcu_check_callbacks(int cpu, int user) * critical section, so note it. */ - rcu_bh_qs(cpu); + rcu_bh_qs(); } rcu_preempt_check_callbacks(cpu); if (rcu_pending(cpu)) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 0981c0c..25e692a 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -158,14 +158,16 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed); * As with the other rcu_*_qs() functions, callers to this function * must disable preemption. */ -static void rcu_preempt_qs(int cpu) -{ - struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu); - - if (rdp->passed_quiesce == 0) - trace_rcu_grace_period(TPS("rcu_preempt"), rdp->gpnum, TPS("cpuqs")); - rdp->passed_quiesce = 1; - current->rcu_read_unlock_special.b.need_qs = false; +static void rcu_preempt_qs(void) +{ + if (!__this_cpu_read(rcu_preempt_data.passed_quiesce)) { + trace_rcu_grace_period(TPS("rcu_preempt"), + __this_cpu_read(rcu_preempt_data.gpnum), + TPS("cpuqs")); + __this_cpu_write(rcu_preempt_data.passed_quiesce, 1); + barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */ + current->rcu_read_unlock_special.b.need_qs = false; + } } /* @@ -256,7 +258,7 @@ static void rcu_preempt_note_context_switch(int cpu) * grace period, then the fact that the task has been enqueued * means that we continue to block the current grace period. */ - rcu_preempt_qs(cpu); + rcu_preempt_qs(); } /* @@ -352,7 +354,7 @@ void rcu_read_unlock_special(struct task_struct *t) */ special = t->rcu_read_unlock_special; if (special.b.need_qs) { - rcu_preempt_qs(smp_processor_id()); + rcu_preempt_qs(); if (!t->rcu_read_unlock_special.s) { local_irq_restore(flags); return; @@ -651,11 +653,12 @@ static void rcu_preempt_check_callbacks(int cpu) struct task_struct *t = current; if (t->rcu_read_lock_nesting == 0) { - rcu_preempt_qs(cpu); + rcu_preempt_qs(); return; } if (t->rcu_read_lock_nesting > 0 && - per_cpu(rcu_preempt_data, cpu).qs_pending) + per_cpu(rcu_preempt_data, cpu).qs_pending && + !per_cpu(rcu_preempt_data, cpu).passed_quiesce) t->rcu_read_unlock_special.b.need_qs = true; } diff --git a/kernel/softirq.c b/kernel/softirq.c index 5918d22..348ec76 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -278,7 +278,7 @@ restart: pending >>= softirq_bit; } - rcu_bh_qs(smp_processor_id()); + rcu_bh_qs(); local_irq_disable(); pending = local_softirq_pending(); -- cgit v0.10.2 From a53dd6a65668850493cce94395c1b88a015eb338 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 10 Sep 2014 14:48:05 -0700 Subject: rcutorture: Add RCU-tasks tests to default rcutorture list Although the test cases have been added, they must be specified explicitly via the kvm.sh --configs argument in order to run them. This commit therefore adds the RCU-tasks tests to the CFLIST so that they will be run automatically by default. Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFLIST b/tools/testing/selftests/rcutorture/configs/rcu/CFLIST index cd3d29c..a3a1a05 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/CFLIST +++ b/tools/testing/selftests/rcutorture/configs/rcu/CFLIST @@ -11,3 +11,6 @@ SRCU-N SRCU-P TINY01 TINY02 +TASKS01 +TASKS02 +TASKS03 -- cgit v0.10.2 From f4579fc57cf4244057b713b1f73f4dc9f0b11e97 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 25 Jul 2014 11:21:47 -0700 Subject: rcu: Fix attempt to avoid unsolicited offloading of callbacks Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar Tested-by: Paul Gortmaker diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index d231aa1..cc7bed1 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -269,6 +269,14 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, struct task_struct *next) { } #endif /* CONFIG_RCU_USER_QS */ +#ifdef CONFIG_RCU_NOCB_CPU +void rcu_init_nohz(void); +#else /* #ifdef CONFIG_RCU_NOCB_CPU */ +static inline void rcu_init_nohz(void) +{ +} +#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ + /** * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers * @a: Code that RCU needs to pay attention to. diff --git a/init/Kconfig b/init/Kconfig index e84c642..64ee4d9 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -737,7 +737,7 @@ choice config RCU_NOCB_CPU_NONE bool "No build_forced no-CBs CPUs" - depends on RCU_NOCB_CPU && !NO_HZ_FULL_ALL + depends on RCU_NOCB_CPU help This option does not force any of the CPUs to be no-CBs CPUs. Only CPUs designated by the rcu_nocbs= boot parameter will be @@ -751,7 +751,7 @@ config RCU_NOCB_CPU_NONE config RCU_NOCB_CPU_ZERO bool "CPU 0 is a build_forced no-CBs CPU" - depends on RCU_NOCB_CPU && !NO_HZ_FULL_ALL + depends on RCU_NOCB_CPU help This option forces CPU 0 to be a no-CBs CPU, so that its RCU callbacks are invoked by a per-CPU kthread whose name begins diff --git a/init/main.c b/init/main.c index bb1aed9..e3c4cdd 100644 --- a/init/main.c +++ b/init/main.c @@ -578,6 +578,7 @@ asmlinkage __visible void __init start_kernel(void) idr_init_cache(); rcu_init(); tick_nohz_init(); + rcu_init_nohz(); context_tracking_init(); radix_tree_init(); /* init some links before init_ISA_irqs() */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index a7997e2..06d077c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -85,33 +85,6 @@ static void __init rcu_bootup_announce_oddness(void) pr_info("\tBoot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf); if (nr_cpu_ids != NR_CPUS) pr_info("\tRCU restricting CPUs from NR_CPUS=%d to nr_cpu_ids=%d.\n", NR_CPUS, nr_cpu_ids); -#ifdef CONFIG_RCU_NOCB_CPU -#ifndef CONFIG_RCU_NOCB_CPU_NONE - if (!have_rcu_nocb_mask) { - zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL); - have_rcu_nocb_mask = true; - } -#ifdef CONFIG_RCU_NOCB_CPU_ZERO - pr_info("\tOffload RCU callbacks from CPU 0\n"); - cpumask_set_cpu(0, rcu_nocb_mask); -#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */ -#ifdef CONFIG_RCU_NOCB_CPU_ALL - pr_info("\tOffload RCU callbacks from all CPUs\n"); - cpumask_copy(rcu_nocb_mask, cpu_possible_mask); -#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */ -#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */ - if (have_rcu_nocb_mask) { - if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { - pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n"); - cpumask_and(rcu_nocb_mask, cpu_possible_mask, - rcu_nocb_mask); - } - cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask); - pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf); - if (rcu_nocb_poll) - pr_info("\tPoll for callbacks from no-CBs CPUs.\n"); - } -#endif /* #ifdef CONFIG_RCU_NOCB_CPU */ } #ifdef CONFIG_TREE_PREEMPT_RCU @@ -2451,6 +2424,67 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp) trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty")); } +void __init rcu_init_nohz(void) +{ + int cpu; + bool need_rcu_nocb_mask = true; + struct rcu_state *rsp; + +#ifdef CONFIG_RCU_NOCB_CPU_NONE + need_rcu_nocb_mask = false; +#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */ + +#if defined(CONFIG_NO_HZ_FULL) + if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask)) + need_rcu_nocb_mask = true; +#endif /* #if defined(CONFIG_NO_HZ_FULL) */ + + if (!have_rcu_nocb_mask && need_rcu_nocb_mask) { + zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL); + have_rcu_nocb_mask = true; + } + if (!have_rcu_nocb_mask) + return; + +#ifdef CONFIG_RCU_NOCB_CPU_ZERO + pr_info("\tOffload RCU callbacks from CPU 0\n"); + cpumask_set_cpu(0, rcu_nocb_mask); +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */ +#ifdef CONFIG_RCU_NOCB_CPU_ALL + pr_info("\tOffload RCU callbacks from all CPUs\n"); + cpumask_copy(rcu_nocb_mask, cpu_possible_mask); +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */ +#if defined(CONFIG_NO_HZ_FULL) + if (tick_nohz_full_running) + cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask); +#endif /* #if defined(CONFIG_NO_HZ_FULL) */ + + if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { + pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n"); + cpumask_and(rcu_nocb_mask, cpu_possible_mask, + rcu_nocb_mask); + } + cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask); + pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf); + if (rcu_nocb_poll) + pr_info("\tPoll for callbacks from no-CBs CPUs.\n"); + + for_each_rcu_flavor(rsp) { + for_each_cpu(cpu, rcu_nocb_mask) { + struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); + + /* + * If there are early callbacks, they will need + * to be moved to the nocb lists. + */ + WARN_ON_ONCE(rdp->nxttail[RCU_NEXT_TAIL] != + &rdp->nxtlist && + rdp->nxttail[RCU_NEXT_TAIL] != NULL); + init_nocb_callback_list(rdp); + } + } +} + /* Initialize per-rcu_data variables for no-CBs CPUs. */ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) { @@ -2479,10 +2513,6 @@ static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) if (rcu_nocb_mask == NULL) return; -#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) - if (tick_nohz_full_running) - cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask); -#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */ if (ls == -1) { ls = int_sqrt(nr_cpu_ids); rcu_nocb_leader_stride = ls; -- cgit v0.10.2 From 949cccdbe6d286544ce3fe170298183eb7ada81c Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Fri, 25 Jul 2014 16:02:07 -0700 Subject: rcu: Check the return value of zalloc_cpumask_var() This commit checks the return value of the zalloc_cpumask_var() used for allocating cpumask for rcu_nocb_mask. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 06d077c..105b0ce 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2440,7 +2440,10 @@ void __init rcu_init_nohz(void) #endif /* #if defined(CONFIG_NO_HZ_FULL) */ if (!have_rcu_nocb_mask && need_rcu_nocb_mask) { - zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL); + if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) { + pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n"); + return; + } have_rcu_nocb_mask = true; } if (!have_rcu_nocb_mask) -- cgit v0.10.2 From c271d3a957384a162f7a6aae53455d8e8afd1f3e Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:14 -0400 Subject: rcu: Use true/false for return in __call_rcu_nocb() Return true/false instead of 0/1 in __call_rcu_nocb() as this returns a bool type. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 105b0ce..36c678b 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2123,7 +2123,7 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, { if (!rcu_is_nocb_cpu(rdp->cpu)) - return 0; + return false; __call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy, flags); if (__is_kfree_rcu_offset((unsigned long)rhp->func)) trace_rcu_kfree_callback(rdp->rsp->name, rhp, @@ -2134,7 +2134,7 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, trace_rcu_callback(rdp->rsp->name, rhp, -atomic_long_read(&rdp->nocb_q_count_lazy), -atomic_long_read(&rdp->nocb_q_count)); - return 1; + return true; } /* -- cgit v0.10.2 From 0a9e1e111b3a9e1c21d2dd27ca361cd9601d99af Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:15 -0400 Subject: rcu: Use true/false for return in rcu_nocb_adopt_orphan_cbs() Return true/false in rcu_nocb_adopt_orphan_cbs() instead of 0/1 as this function has return type of bool. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 36c678b..6625841 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2150,7 +2150,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, /* If this is not a no-CBs CPU, tell the caller to do it the old way. */ if (!rcu_is_nocb_cpu(smp_processor_id())) - return 0; + return false; rsp->qlen = 0; rsp->qlen_lazy = 0; @@ -2169,7 +2169,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, rsp->orphan_nxtlist = NULL; rsp->orphan_nxttail = &rsp->orphan_nxtlist; } - return 1; + return true; } /* -- cgit v0.10.2 From 4afc7e269befc7b6e09a994e48c67e36f4a378e1 Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:16 -0400 Subject: rcu: Use false for return in __call_rcu_nocb() Return false instead of 0 in __call_rcu_nocb() as this has bool as return type. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 6625841..4271104 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2574,7 +2574,7 @@ static void rcu_init_one_nocb(struct rcu_node *rnp) static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, bool lazy, unsigned long flags) { - return 0; + return false; } static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, -- cgit v0.10.2 From f4aa84ba24872e3a8e59b58bc8533cae95597f2e Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Tue, 8 Jul 2014 18:26:17 -0400 Subject: rcu: Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() as this has bool as return type. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 4271104..4c1af96 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2581,7 +2581,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, struct rcu_data *rdp, unsigned long flags) { - return 0; + return false; } static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) -- cgit v0.10.2 From 9386c0b75dda05f535a10ea1abf1817fe292c81c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sun, 13 Jul 2014 12:00:53 -0700 Subject: rcu: Rationalize kthread spawning Currently, RCU spawns kthreads from several different early_initcall() functions. Although this has served RCU well for quite some time, as more kthreads are added a more deterministic approach is required. This commit therefore causes all of RCU's early-boot kthreads to be spawned from a single early_initcall() function. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 1b70cb6..9be47f4 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3489,7 +3489,7 @@ static int rcu_pm_notify(struct notifier_block *self, } /* - * Spawn the kthread that handles this RCU flavor's grace periods. + * Spawn the kthreads that handle each RCU flavor's grace periods. */ static int __init rcu_spawn_gp_kthread(void) { @@ -3498,6 +3498,7 @@ static int __init rcu_spawn_gp_kthread(void) struct rcu_state *rsp; struct task_struct *t; + rcu_scheduler_fully_active = 1; for_each_rcu_flavor(rsp) { t = kthread_run(rcu_gp_kthread, rsp, "%s", rsp->name); BUG_ON(IS_ERR(t)); @@ -3507,6 +3508,7 @@ static int __init rcu_spawn_gp_kthread(void) raw_spin_unlock_irqrestore(&rnp->lock, flags); rcu_spawn_nocb_kthreads(rsp); } + rcu_spawn_boost_kthreads(); return 0; } early_initcall(rcu_spawn_gp_kthread); diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 6a86eb7..a966092 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -572,6 +572,7 @@ static void rcu_preempt_do_callbacks(void); static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp, struct rcu_node *rnp); #endif /* #ifdef CONFIG_RCU_BOOST */ +static void __init rcu_spawn_boost_kthreads(void); static void rcu_prepare_kthreads(int cpu); static void rcu_cleanup_after_idle(int cpu); static void rcu_prepare_for_idle(int cpu); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 4c1af96..410c744 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1435,14 +1435,13 @@ static struct smp_hotplug_thread rcu_cpu_thread_spec = { }; /* - * Spawn all kthreads -- called as soon as the scheduler is running. + * Spawn boost kthreads -- called as soon as the scheduler is running. */ -static int __init rcu_spawn_kthreads(void) +static void __init rcu_spawn_boost_kthreads(void) { struct rcu_node *rnp; int cpu; - rcu_scheduler_fully_active = 1; for_each_possible_cpu(cpu) per_cpu(rcu_cpu_has_work, cpu) = 0; BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec)); @@ -1452,9 +1451,7 @@ static int __init rcu_spawn_kthreads(void) rcu_for_each_leaf_node(rcu_state_p, rnp) (void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp); } - return 0; } -early_initcall(rcu_spawn_kthreads); static void rcu_prepare_kthreads(int cpu) { @@ -1492,12 +1489,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) { } -static int __init rcu_scheduler_really_started(void) +static void __init rcu_spawn_boost_kthreads(void) { - rcu_scheduler_fully_active = 1; - return 0; } -early_initcall(rcu_scheduler_really_started); static void rcu_prepare_kthreads(int cpu) { -- cgit v0.10.2 From 35ce7f29a44a888c45c0a9f202f69e10613c5306 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 11 Jul 2014 11:30:24 -0700 Subject: rcu: Create rcuo kthreads only for onlined CPUs RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads, which can result in more rcuo kthreads than one would expect, for example, derRichard reported 64 CPUs worth of rcuo kthreads on an 8-CPU image. This commit therefore creates rcuo kthreads only for those CPUs that actually come online. This was reported by derRichard on the OFTC IRC network. Reported-by: Richard Weinberger Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9be47f4..b49c843 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3442,6 +3442,7 @@ static int rcu_cpu_notify(struct notifier_block *self, case CPU_UP_PREPARE_FROZEN: rcu_prepare_cpu(cpu); rcu_prepare_kthreads(cpu); + rcu_spawn_all_nocb_kthreads(cpu); break; case CPU_ONLINE: case CPU_DOWN_FAILED: @@ -3506,8 +3507,8 @@ static int __init rcu_spawn_gp_kthread(void) raw_spin_lock_irqsave(&rnp->lock, flags); rsp->gp_kthread = t; raw_spin_unlock_irqrestore(&rnp->lock, flags); - rcu_spawn_nocb_kthreads(rsp); } + rcu_spawn_nocb_kthreads(); rcu_spawn_boost_kthreads(); return 0; } diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index a966092..a9a226d 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -593,7 +593,11 @@ static bool rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); static void do_nocb_deferred_wakeup(struct rcu_data *rdp); static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); -static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); +static void rcu_spawn_all_nocb_kthreads(int cpu); +static void __init rcu_spawn_nocb_kthreads(void); +#ifdef CONFIG_RCU_NOCB_CPU +static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp); +#endif /* #ifdef CONFIG_RCU_NOCB_CPU */ static void __maybe_unused rcu_kick_nohz_cpu(int cpu); static bool init_nocb_callback_list(struct rcu_data *rdp); static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 410c744..31c7afb 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2479,6 +2479,7 @@ void __init rcu_init_nohz(void) rdp->nxttail[RCU_NEXT_TAIL] != NULL); init_nocb_callback_list(rdp); } + rcu_organize_nocb_kthreads(rsp); } } @@ -2490,15 +2491,85 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) rdp->nocb_follower_tail = &rdp->nocb_follower_head; } +/* + * If the specified CPU is a no-CBs CPU that does not already have its + * rcuo kthread for the specified RCU flavor, spawn it. If the CPUs are + * brought online out of order, this can require re-organizing the + * leader-follower relationships. + */ +static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) +{ + struct rcu_data *rdp; + struct rcu_data *rdp_last; + struct rcu_data *rdp_old_leader; + struct rcu_data *rdp_spawn = per_cpu_ptr(rsp->rda, cpu); + struct task_struct *t; + + /* + * If this isn't a no-CBs CPU or if it already has an rcuo kthread, + * then nothing to do. + */ + if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_kthread) + return; + + /* If we didn't spawn the leader first, reorganize! */ + rdp_old_leader = rdp_spawn->nocb_leader; + if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_kthread) { + rdp_last = NULL; + rdp = rdp_old_leader; + do { + rdp->nocb_leader = rdp_spawn; + if (rdp_last && rdp != rdp_spawn) + rdp_last->nocb_next_follower = rdp; + rdp_last = rdp; + rdp = rdp->nocb_next_follower; + rdp_last->nocb_next_follower = NULL; + } while (rdp); + rdp_spawn->nocb_next_follower = rdp_old_leader; + } + + /* Spawn the kthread for this CPU and RCU flavor. */ + t = kthread_run(rcu_nocb_kthread, rdp_spawn, + "rcuo%c/%d", rsp->abbr, cpu); + BUG_ON(IS_ERR(t)); + ACCESS_ONCE(rdp_spawn->nocb_kthread) = t; +} + +/* + * If the specified CPU is a no-CBs CPU that does not already have its + * rcuo kthreads, spawn them. + */ +static void rcu_spawn_all_nocb_kthreads(int cpu) +{ + struct rcu_state *rsp; + + if (rcu_scheduler_fully_active) + for_each_rcu_flavor(rsp) + rcu_spawn_one_nocb_kthread(rsp, cpu); +} + +/* + * Once the scheduler is running, spawn rcuo kthreads for all online + * no-CBs CPUs. This assumes that the early_initcall()s happen before + * non-boot CPUs come online -- if this changes, we will need to add + * some mutual exclusion. + */ +static void __init rcu_spawn_nocb_kthreads(void) +{ + int cpu; + + for_each_online_cpu(cpu) + rcu_spawn_all_nocb_kthreads(cpu); +} + /* How many follower CPU IDs per leader? Default of -1 for sqrt(nr_cpu_ids). */ static int rcu_nocb_leader_stride = -1; module_param(rcu_nocb_leader_stride, int, 0444); /* - * Create a kthread for each RCU flavor for each no-CBs CPU. - * Also initialize leader-follower relationships. + * Initialize leader-follower relationships for all no-CBs CPU. */ -static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) +static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) { int cpu; int ls = rcu_nocb_leader_stride; @@ -2506,7 +2577,6 @@ static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_data *rdp_leader = NULL; /* Suppress misguided gcc warn. */ struct rcu_data *rdp_prev = NULL; - struct task_struct *t; if (rcu_nocb_mask == NULL) return; @@ -2532,12 +2602,6 @@ static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) rdp_prev->nocb_next_follower = rdp; } rdp_prev = rdp; - - /* Spawn the kthread for this CPU. */ - t = kthread_run(rcu_nocb_kthread, rdp, - "rcuo%c/%d", rsp->abbr, cpu); - BUG_ON(IS_ERR(t)); - ACCESS_ONCE(rdp->nocb_kthread) = t; } } @@ -2591,7 +2655,11 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp) { } -static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) +static void rcu_spawn_all_nocb_kthreads(int cpu) +{ +} + +static void __init rcu_spawn_nocb_kthreads(void) { } -- cgit v0.10.2 From 22c2f669611590b428647ac9a73bc63ef3989d4b Mon Sep 17 00:00:00 2001 From: Pranith Kumar Date: Thu, 17 Jul 2014 20:11:01 -0400 Subject: rcu: Check for have_rcu_nocb_mask instead of rcu_nocb_mask If we configure a kernel with CONFIG_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_NONE=y and CONFIG_CPUMASK_OFFSTACK=n and do not pass in a rcu_nocb= boot parameter, the cpumask rcu_nocb_mask can be garbage instead of NULL. Hence this commit replaces checks for rcu_nocb_mask == NULL with a check for have_rcu_nocb_mask. Signed-off-by: Pranith Kumar Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 31c7afb..39e68bc 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2578,7 +2578,7 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) struct rcu_data *rdp_leader = NULL; /* Suppress misguided gcc warn. */ struct rcu_data *rdp_prev = NULL; - if (rcu_nocb_mask == NULL) + if (!have_rcu_nocb_mask) return; if (ls == -1) { ls = int_sqrt(nr_cpu_ids); @@ -2608,9 +2608,9 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) /* Prevent __call_rcu() from enqueuing callbacks on no-CBs CPUs */ static bool init_nocb_callback_list(struct rcu_data *rdp) { - if (rcu_nocb_mask == NULL || - !cpumask_test_cpu(rdp->cpu, rcu_nocb_mask)) + if (!rcu_is_nocb_cpu(rdp->cpu)) return false; + rdp->nxttail[RCU_NEXT_TAIL] = NULL; return true; } -- cgit v0.10.2 From 417e8d26557c4264a484d78a7491316751afa46f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 21 Jul 2014 11:26:54 -0700 Subject: rcu: Eliminate redundant rcu_sysidle_state variable Now that we have rcu_state_p, which references rcu_preempt_state for TREE_PREEMPT_RCU and rcu_sched_state for TREE_RCU, we don't need a separate rcu_sysidle_state variable. This commit therefore eliminates rcu_preempt_state in favor of rcu_state_p. Signed-off-by: Paul E. McKenney Reviewed-by: Pranith Kumar Acked-by: Frederic Weisbecker Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 39e68bc..3ddad4f 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2690,16 +2690,6 @@ static void __maybe_unused rcu_kick_nohz_cpu(int cpu) #ifdef CONFIG_NO_HZ_FULL_SYSIDLE -/* - * Define RCU flavor that holds sysidle state. This needs to be the - * most active flavor of RCU. - */ -#ifdef CONFIG_PREEMPT_RCU -static struct rcu_state *rcu_sysidle_state = &rcu_preempt_state; -#else /* #ifdef CONFIG_PREEMPT_RCU */ -static struct rcu_state *rcu_sysidle_state = &rcu_sched_state; -#endif /* #else #ifdef CONFIG_PREEMPT_RCU */ - static int full_sysidle_state; /* Current system-idle state. */ #define RCU_SYSIDLE_NOT 0 /* Some CPU is not idle. */ #define RCU_SYSIDLE_SHORT 1 /* All CPUs idle for brief period. */ @@ -2841,7 +2831,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, * not the flavor of RCU that tracks sysidle state, or if this * is an offline or the timekeeping CPU, nothing to do. */ - if (!*isidle || rdp->rsp != rcu_sysidle_state || + if (!*isidle || rdp->rsp != rcu_state_p || cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) return; if (rcu_gp_in_progress(rdp->rsp)) @@ -2867,7 +2857,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, */ static bool is_sysidle_rcu_state(struct rcu_state *rsp) { - return rsp == rcu_sysidle_state; + return rsp == rcu_state_p; } /* @@ -2945,7 +2935,7 @@ static void rcu_sysidle_cancel(void) static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, unsigned long maxj, bool gpkt) { - if (rsp != rcu_sysidle_state) + if (rsp != rcu_state_p) return; /* Wrong flavor, ignore. */ if (gpkt && nr_cpu_ids <= CONFIG_NO_HZ_FULL_SYSIDLE_SMALL) return; /* Running state machine from timekeeping CPU. */ @@ -3014,13 +3004,12 @@ bool rcu_sys_is_idle(void) /* Scan all the CPUs looking for nonidle CPUs. */ for_each_possible_cpu(cpu) { - rdp = per_cpu_ptr(rcu_sysidle_state->rda, cpu); + rdp = per_cpu_ptr(rcu_state_p->rda, cpu); rcu_sysidle_check_cpu(rdp, &isidle, &maxj); if (!isidle) break; } - rcu_sysidle_report(rcu_sysidle_state, - isidle, maxj, false); + rcu_sysidle_report(rcu_state_p, isidle, maxj, false); oldrss = rss; rss = ACCESS_ONCE(full_sysidle_state); } @@ -3047,7 +3036,7 @@ bool rcu_sys_is_idle(void) * provided by the memory allocator. */ if (nr_cpu_ids > CONFIG_NO_HZ_FULL_SYSIDLE_SMALL && - !rcu_gp_in_progress(rcu_sysidle_state) && + !rcu_gp_in_progress(rcu_state_p) && !rsh.inuse && xchg(&rsh.inuse, 1) == 0) call_rcu(&rsh.rh, rcu_sysidle_cb); return false; -- cgit v0.10.2 From 663e131090dd10bac9dc0b4f5b624dd3211b20f6 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 21 Jul 2014 11:34:33 -0700 Subject: rcu: Don't track sysidle state if no nohz_full= CPUs If there are no nohz_full= CPUs, then there is currently no reason to track sysidle state. This commit therefore short-circuits this state tracking if !tick_nohz_full_enabled(). Note that these checks will need to be revisited if nohz_full= state can ever be changed at runtime. Signed-off-by: Paul E. McKenney Acked-by: Frederic Weisbecker Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3ddad4f..d5aec54 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2707,6 +2707,10 @@ static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq) { unsigned long j; + /* If there are no nohz_full= CPUs, no need to track this. */ + if (!tick_nohz_full_enabled()) + return; + /* Adjust nesting, check for fully idle. */ if (irq) { rdtp->dynticks_idle_nesting--; @@ -2772,6 +2776,10 @@ void rcu_sysidle_force_exit(void) */ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq) { + /* If there are no nohz_full= CPUs, no need to track this. */ + if (!tick_nohz_full_enabled()) + return; + /* Adjust nesting, check for already non-idle. */ if (irq) { rdtp->dynticks_idle_nesting++; @@ -2826,6 +2834,10 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, unsigned long j; struct rcu_dynticks *rdtp = rdp->dynticks; + /* If there are no nohz_full= CPUs, don't check system-wide idleness. */ + if (!tick_nohz_full_enabled()) + return; + /* * If some other CPU has already reported non-idle, if this is * not the flavor of RCU that tracks sysidle state, or if this @@ -2952,6 +2964,10 @@ static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj) { + /* If there are no nohz_full= CPUs, no need to track this. */ + if (!tick_nohz_full_enabled()) + return; + rcu_sysidle_report(rsp, isidle, maxj, true); } @@ -2978,7 +2994,8 @@ static void rcu_sysidle_cb(struct rcu_head *rhp) /* * Check to see if the system is fully idle, other than the timekeeping CPU. - * The caller must have disabled interrupts. + * The caller must have disabled interrupts. This is not intended to be + * called unless tick_nohz_full_enabled(). */ bool rcu_sys_is_idle(void) { -- cgit v0.10.2 From 39953dfd40077c7480b1d5deb4d617e086b1c865 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 12 Aug 2014 10:47:48 -0700 Subject: rcu: Avoid misordering in __call_rcu_nocb_enqueue() The NOCB leader wakeup ordering depends on the store to the header happening before the check for the leader already being awake. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, the incorrect comment in wake_nocb_leader() notwithstanding. This commit therefore adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index d5aec54..4ad63d8 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2042,7 +2042,7 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force) if (!ACCESS_ONCE(rdp_leader->nocb_kthread)) return; if (ACCESS_ONCE(rdp_leader->nocb_leader_sleep) || force) { - /* Prior xchg orders against prior callback enqueue. */ + /* Prior smp_mb__after_atomic() orders against prior enqueue. */ ACCESS_ONCE(rdp_leader->nocb_leader_sleep) = false; wake_up(&rdp_leader->nocb_wq); } @@ -2071,6 +2071,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, ACCESS_ONCE(*old_rhpp) = rhp; atomic_long_add(rhcount, &rdp->nocb_q_count); atomic_long_add(rhcount_lazy, &rdp->nocb_q_count_lazy); + smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */ /* If we are not being polled and there is a kthread, awaken it ... */ t = ACCESS_ONCE(rdp->nocb_kthread); -- cgit v0.10.2 From 1772947bd0126661866069157e95197e9c0020e9 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 12 Aug 2014 11:27:31 -0700 Subject: rcu: Handle NOCB callbacks from irq-disabled idle code If an RCU callback is queued on a no-CBs CPU from idle code with irqs disabled, and if that CPU stays idle forever after, the callback will never be invoked. This commit therefore adds a check for this situation in ____call_rcu_nocb(), invoking the RCU core solely for the purpose of the ensuing return-to-idle transition. (If the CPU doesn't return to idle, the next scheduling-clock interrupt will fix things up.) Reported-by: Amit Shah Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 4ad63d8..8b73518 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2129,6 +2129,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, trace_rcu_callback(rdp->rsp->name, rhp, -atomic_long_read(&rdp->nocb_q_count_lazy), -atomic_long_read(&rdp->nocb_q_count)); + + /* + * If called from an extended quiescent state with interrupts + * disabled, invoke the RCU core in order to allow the idle-entry + * deferred-wakeup check to function. + */ + if (irqs_disabled_flags(flags) && + !rcu_is_watching() && + cpu_online(smp_processor_id())) + invoke_rcu_core(); + return true; } -- cgit v0.10.2 From c847f14217d5aec5336272a54a32ffcf6e06ddcb Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 12 Aug 2014 13:54:21 -0700 Subject: rcu: Avoid misordering in nocb_leader_wait() The NOCB follower wakeup ordering depends on the store to the tail pointer happening before the wakeup. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, and the locking in wake_up() only guarantees that the store will happen before the unlock, which might be too late. Even though this is only a theoretical issue, this commit adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 8b73518..c554acc 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2306,6 +2306,7 @@ wait_again: atomic_long_add(rdp->nocb_gp_count, &rdp->nocb_follower_count); atomic_long_add(rdp->nocb_gp_count_lazy, &rdp->nocb_follower_count_lazy); + smp_mb__after_atomic(); /* Store *tail before wakeup. */ if (rdp != my_rdp && tail == &rdp->nocb_follower_head) { /* * List was empty, wake up the follower. -- cgit v0.10.2 From 23a8e5c2d2a481fcf382490369c27b405a650212 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:16 -0700 Subject: locktorture: Rename locktorture_runnable parameter ... to just 'torture_runnable'. It follows other variable naming and is shorter. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 0955b88..8c770b2 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -87,9 +87,9 @@ static struct lock_writer_stress_stats *lwsa; #else #define LOCKTORTURE_RUNNABLE_INIT 0 #endif -int locktorture_runnable = LOCKTORTURE_RUNNABLE_INIT; -module_param(locktorture_runnable, int, 0444); -MODULE_PARM_DESC(locktorture_runnable, "Start locktorture at module init"); +int torture_runnable = LOCKTORTURE_RUNNABLE_INIT; +module_param(torture_runnable, int, 0444); +MODULE_PARM_DESC(torture_runnable, "Start locktorture at module init"); /* Forward reference. */ static void lock_torture_cleanup(void); @@ -355,7 +355,7 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, }; - if (!torture_init_begin(torture_type, verbose, &locktorture_runnable)) + if (!torture_init_begin(torture_type, verbose, &torture_runnable)) return -EBUSY; /* Process args and tell the world that the torturer is on the job. */ -- cgit v0.10.2 From cdf26bb10bcb50161d452b16eb3cf2901645d625 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:17 -0700 Subject: locktorture: Add documentation Just like Documentation/RCU/torture.txt, begin a document for the locktorture module. This module is still pretty green, so I have just added some specific sections to the doc (general desc, params, usage, etc.). Further development should update the file. Signed-off-by: Davidlohr Bueso [ paulmck: Apply Randy Dunlap review comments. ] Signed-off-by: Paul E. McKenney diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt new file mode 100644 index 0000000..3eb9b81 --- /dev/null +++ b/Documentation/locking/locktorture.txt @@ -0,0 +1,130 @@ +Kernel Lock Torture Test Operation + +CONFIG_LOCK_TORTURE_TEST + +The CONFIG LOCK_TORTURE_TEST config option provides a kernel module +that runs torture tests on core kernel locking primitives. The kernel +module, 'locktorture', may be built after the fact on the running +kernel to be tested, if desired. The tests periodically output status +messages via printk(), which can be examined via the dmesg (perhaps +grepping for "torture"). The test is started when the module is loaded, +and stops when the module is unloaded. This program is based on how RCU +is tortured, via rcutorture. + +This torture test consists of creating a number of kernel threads which +acquire the lock and hold it for specific amount of time, thus simulating +different critical region behaviors. The amount of contention on the lock +can be simulated by either enlarging this critical region hold time and/or +creating more kthreads. + + +MODULE PARAMETERS + +This module has the following parameters: + + + ** Locktorture-specific ** + +nwriters_stress Number of kernel threads that will stress exclusive lock + ownership (writers). The default value is twice the number + of online CPUs. + +torture_type Type of lock to torture. By default, only spinlocks will + be tortured. This module can torture the following locks, + with string values as follows: + + o "lock_busted": Simulates a buggy lock implementation. + + o "spin_lock": spin_lock() and spin_unlock() pairs. + + o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() + pairs. + +torture_runnable Start locktorture at boot time in the case where the + module is built into the kernel, otherwise wait for + torture_runnable to be set via sysfs before starting. + By default it will begin once the module is loaded. + + + ** Torture-framework (RCU + locking) ** + +shutdown_secs The number of seconds to run the test before terminating + the test and powering off the system. The default is + zero, which disables test termination and system shutdown. + This capability is useful for automated testing. + +onoff_interval The number of seconds between each attempt to execute a + randomly selected CPU-hotplug operation. Defaults + to zero, which disables CPU hotplugging. In + CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently + refuse to do any CPU-hotplug operations regardless of + what value is specified for onoff_interval. + +onoff_holdoff The number of seconds to wait until starting CPU-hotplug + operations. This would normally only be used when + locktorture was built into the kernel and started + automatically at boot time, in which case it is useful + in order to avoid confusing boot-time code with CPUs + coming and going. This parameter is only useful if + CONFIG_HOTPLUG_CPU is enabled. + +stat_interval Number of seconds between statistics-related printk()s. + By default, locktorture will report stats every 60 seconds. + Setting the interval to zero causes the statistics to + be printed -only- when the module is unloaded, and this + is the default. + +stutter The length of time to run the test before pausing for this + same period of time. Defaults to "stutter=5", so as + to run and pause for (roughly) five-second intervals. + Specifying "stutter=0" causes the test to run continuously + without pausing, which is the old default behavior. + +shuffle_interval The number of seconds to keep the test threads affinitied + to a particular subset of the CPUs, defaults to 3 seconds. + Used in conjunction with test_no_idle_hz. + +verbose Enable verbose debugging printing, via printk(). Enabled + by default. This extra information is mostly related to + high-level errors and reports from the main 'torture' + framework. + + +STATISTICS + +Statistics are printed in the following format: + +spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 + (A) (B) (C) (D) + +(A): Lock type that is being tortured -- torture_type parameter. + +(B): Number of times the lock was acquired. + +(C): Min and max number of times threads failed to acquire the lock. + +(D): true/false values if there were errors acquiring the lock. This should + -only- be positive if there is a bug in the locking primitive's + implementation. Otherwise a lock should never fail (i.e., spin_lock()). + Of course, the same applies for (C), above. A dummy example of this is + the "lock_busted" type. + +USAGE + +The following script may be used to torture locks: + + #!/bin/sh + + modprobe locktorture + sleep 3600 + rmmod locktorture + dmesg | grep torture: + +The output can be manually inspected for the error flag of "!!!". +One could of course create a more elaborate script that automatically +checked for such errors. The "rmmod" command forces a "SUCCESS", +"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first +two are self-explanatory, while the last indicates that while there +were no locking failures, CPU-hotplug problems were detected. + +Also see: Documentation/RCU/torture.txt -- cgit v0.10.2 From 42ddc75ddd478edac6ad9dc8c63abb4441541af2 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:18 -0700 Subject: locktorture: Support mutexes Add a "mutex_lock" torture test. The main difference with the already existing spinlock tests is that the latency of the critical region is much larger. We randomly delay for (arbitrarily) either 500 ms or, otherwise, 25 ms. While this can considerably reduce the amount of writes compared to non blocking locks, if run long enough it can have the same torturous effect. Furthermore it is more representative of mutex hold times and can stress better things like thrashing. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index 3eb9b81..f2a905b 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -40,6 +40,8 @@ torture_type Type of lock to torture. By default, only spinlocks will o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() pairs. + o "mutex_lock": mutex_lock() and mutex_unlock() pairs. + torture_runnable Start locktorture at boot time in the case where the module is built into the kernel, otherwise wait for torture_runnable to be set via sysfs before starting. diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 8c770b2..414ba45 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -66,7 +67,7 @@ torture_param(bool, verbose, true, static char *torture_type = "spin_lock"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, - "Type of lock to torture (spin_lock, spin_lock_irq, ...)"); + "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)"); static atomic_t n_lock_torture_errors; @@ -206,6 +207,42 @@ static struct lock_torture_ops spin_lock_irq_ops = { .name = "spin_lock_irq" }; +static DEFINE_MUTEX(torture_mutex); + +static int torture_mutex_lock(void) __acquires(torture_mutex) +{ + mutex_lock(&torture_mutex); + return 0; +} + +static void torture_mutex_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 5); + else + mdelay(longdelay_ms / 5); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + preempt_schedule(); /* Allow test to be preempted. */ +#endif +} + +static void torture_mutex_unlock(void) __releases(torture_mutex) +{ + mutex_unlock(&torture_mutex); +} + +static struct lock_torture_ops mutex_lock_ops = { + .writelock = torture_mutex_lock, + .write_delay = torture_mutex_delay, + .writeunlock = torture_mutex_unlock, + .name = "mutex_lock" +}; + /* * Lock torture writer kthread. Repeatedly acquires and releases * the lock, checking for duplicate acquisitions. @@ -352,7 +389,7 @@ static int __init lock_torture_init(void) int i; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { - &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, + &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &mutex_lock_ops, }; if (!torture_init_begin(torture_type, verbose, &torture_runnable)) -- cgit v0.10.2 From f095bfc0ea04829d6962edaf06a5c56e0c251f5b Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:19 -0700 Subject: locktorture: Teach about lock debugging Regular locks are very different than locks with debugging. For instance for mutexes, debugging forces to only take the slowpaths. As such, the locktorture module should take this into account when printing related information -- specifically when printing user passed parameters, it seems the right place for such info. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 414ba45..a6049fa 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -64,6 +64,7 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable"); torture_param(bool, verbose, true, "Enable verbose debugging printk()s"); +static bool debug_lock = false; static char *torture_type = "spin_lock"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, @@ -349,8 +350,9 @@ lock_torture_print_module_parms(struct lock_torture_ops *cur_ops, const char *tag) { pr_alert("%s" TORTURE_FLAG - "--- %s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", - torture_type, tag, nrealwriters_stress, stat_interval, verbose, + "--- %s%s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", + torture_type, tag, debug_lock ? " [debug]": "", + nrealwriters_stress, stat_interval, verbose, shuffle_interval, stutter, shutdown_secs, onoff_interval, onoff_holdoff); } @@ -418,6 +420,15 @@ static int __init lock_torture_init(void) nrealwriters_stress = nwriters_stress; else nrealwriters_stress = 2 * num_online_cpus(); + +#ifdef CONFIG_DEBUG_MUTEXES + if (strncmp(torture_type, "mutex", 5) == 0) + debug_lock = true; +#endif +#ifdef CONFIG_DEBUG_SPINLOCK + if (strncmp(torture_type, "spin", 4) == 0) + debug_lock = true; +#endif lock_torture_print_module_parms(cur_ops, "Start of test"); /* Initialize the statistics so that each run gets its own numbers. */ -- cgit v0.10.2 From 1e6757a92189278c484799ea98fc69bdc528940e Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:20 -0700 Subject: locktorture: Make statistics generic The statistics structure can serve well for both reader and writer locks, thus simply rename some fields that mention 'write' and leave the declaration of lwsa. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index a6049fa..de703a7 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -78,11 +78,11 @@ static struct task_struct **writer_tasks; static int nrealwriters_stress; static bool lock_is_write_held; -struct lock_writer_stress_stats { - long n_write_lock_fail; - long n_write_lock_acquired; +struct lock_stress_stats { + long n_lock_fail; + long n_lock_acquired; }; -static struct lock_writer_stress_stats *lwsa; +static struct lock_stress_stats *lwsa; /* writer statistics */ #if defined(MODULE) #define LOCKTORTURE_RUNNABLE_INIT 1 @@ -250,7 +250,7 @@ static struct lock_torture_ops mutex_lock_ops = { */ static int lock_torture_writer(void *arg) { - struct lock_writer_stress_stats *lwsp = arg; + struct lock_stress_stats *lwsp = arg; static DEFINE_TORTURE_RANDOM(rand); VERBOSE_TOROUT_STRING("lock_torture_writer task started"); @@ -261,9 +261,9 @@ static int lock_torture_writer(void *arg) schedule_timeout_uninterruptible(1); cur_ops->writelock(); if (WARN_ON_ONCE(lock_is_write_held)) - lwsp->n_write_lock_fail++; + lwsp->n_lock_fail++; lock_is_write_held = 1; - lwsp->n_write_lock_acquired++; + lwsp->n_lock_acquired++; cur_ops->write_delay(&rand); lock_is_write_held = 0; cur_ops->writeunlock(); @@ -281,17 +281,17 @@ static void lock_torture_printk(char *page) bool fail = 0; int i; long max = 0; - long min = lwsa[0].n_write_lock_acquired; + long min = lwsa[0].n_lock_acquired; long long sum = 0; for (i = 0; i < nrealwriters_stress; i++) { - if (lwsa[i].n_write_lock_fail) + if (lwsa[i].n_lock_fail) fail = true; - sum += lwsa[i].n_write_lock_acquired; - if (max < lwsa[i].n_write_lock_fail) - max = lwsa[i].n_write_lock_fail; - if (min > lwsa[i].n_write_lock_fail) - min = lwsa[i].n_write_lock_fail; + sum += lwsa[i].n_lock_acquired; + if (max < lwsa[i].n_lock_fail) + max = lwsa[i].n_lock_fail; + if (min > lwsa[i].n_lock_fail) + min = lwsa[i].n_lock_fail; } page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); page += sprintf(page, @@ -441,8 +441,8 @@ static int __init lock_torture_init(void) goto unwind; } for (i = 0; i < nrealwriters_stress; i++) { - lwsa[i].n_write_lock_fail = 0; - lwsa[i].n_write_lock_acquired = 0; + lwsa[i].n_lock_fail = 0; + lwsa[i].n_lock_acquired = 0; } /* Start up the kthreads. */ -- cgit v0.10.2 From d36a7a0d5e8b5bff1671723d733eb61621b0cee4 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 20:40:21 -0700 Subject: torture: Address race in module cleanup When performing module cleanups by calling torture_cleanup() the 'torture_type' string in nullified However, callers are not necessarily done, and might still need to reference the variable. This impacts both rcutorture and locktorture, causing printing things like: [ 94.226618] (null)-torture: Stopping lock_torture_writer task [ 94.226624] (null)-torture: Stopping lock_torture_stats task Thus delay this operation until the very end of the cleanup process. The consequence (which shouldn't matter for this kid of program) is, of course, that we delay the window between rmmod and modprobing, for instance in module_torture_begin(). Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/include/linux/torture.h b/include/linux/torture.h index fec46f8..7759fc3 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -77,7 +77,8 @@ int torture_stutter_init(int s); /* Initialization and cleanup. */ bool torture_init_begin(char *ttype, bool v, int *runnable); void torture_init_end(void); -bool torture_cleanup(void); +bool torture_cleanup_begin(void); +void torture_cleanup_end(void); bool torture_must_stop(void); bool torture_must_stop_irq(void); void torture_kthread_stopping(char *title); diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index de703a7..988267c 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -361,7 +361,7 @@ static void lock_torture_cleanup(void) { int i; - if (torture_cleanup()) + if (torture_cleanup_begin()) return; if (writer_tasks) { @@ -384,6 +384,7 @@ static void lock_torture_cleanup(void) else lock_torture_print_module_parms(cur_ops, "End of test: SUCCESS"); + torture_cleanup_end(); } static int __init lock_torture_init(void) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 6d15095..04c4b5a 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -1514,7 +1514,7 @@ rcu_torture_cleanup(void) int i; rcutorture_record_test_transition(); - if (torture_cleanup()) { + if (torture_cleanup_begin()) { if (cur_ops->cb_barrier != NULL) cur_ops->cb_barrier(); return; @@ -1566,6 +1566,7 @@ rcu_torture_cleanup(void) "End of test: RCU_HOTPLUG"); else rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS"); + torture_cleanup_end(); } #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD diff --git a/kernel/torture.c b/kernel/torture.c index ede8b25..dd70993 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -633,8 +633,13 @@ EXPORT_SYMBOL_GPL(torture_init_end); * * This must be called before the caller starts shutting down its own * kthreads. + * + * Both torture_cleanup_begin() and torture_cleanup_end() must be paired, + * in order to correctly perform the cleanup. They are separated because + * threads can still need to reference the torture_type type, thus nullify + * only after completing all other relevant calls. */ -bool torture_cleanup(void) +bool torture_cleanup_begin(void) { mutex_lock(&fullstop_mutex); if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) { @@ -649,12 +654,17 @@ bool torture_cleanup(void) torture_shuffle_cleanup(); torture_stutter_cleanup(); torture_onoff_cleanup(); + return false; +} +EXPORT_SYMBOL_GPL(torture_cleanup_begin); + +void torture_cleanup_end(void) +{ mutex_lock(&fullstop_mutex); torture_type = NULL; mutex_unlock(&fullstop_mutex); - return false; } -EXPORT_SYMBOL_GPL(torture_cleanup); +EXPORT_SYMBOL_GPL(torture_cleanup_end); /* * Is it time for the current torture test to stop? -- cgit v0.10.2 From 4f6332c1dce9c64ef6bf93842067250dd850e482 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 21:40:41 -0700 Subject: locktorture: Add infrastructure for torturing read locks Most of it is based on what we already have for writers. This allows readers to be very independent (and thus configurable), enabling future module parameters to control things such as rw distribution. Furthermore, readers have their own delaying function, allowing us to test different rw critical region latencies, and stress locking internals. Similarly, statistics, for now will only serve for the number of lock acquisitions -- as opposed to writers, readers have no failure detection. In addition, introduce a new nreaders_stress module parameter. The default number of readers will be the same number of writers threads. Writer threads are interleaved with readers. Documentation is updated, respectively. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index f2a905b..7a72621 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -29,6 +29,11 @@ nwriters_stress Number of kernel threads that will stress exclusive lock ownership (writers). The default value is twice the number of online CPUs. +nreaders_stress Number of kernel threads that will stress shared lock + ownership (readers). The default is the same amount of writer + locks. If the user did not specify nwriters_stress, then + both readers and writers be the amount of online CPUs. + torture_type Type of lock to torture. By default, only spinlocks will be tortured. This module can torture the following locks, with string values as follows: @@ -97,15 +102,18 @@ STATISTICS Statistics are printed in the following format: spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 - (A) (B) (C) (D) + (A) (B) (C) (D) (E) (A): Lock type that is being tortured -- torture_type parameter. -(B): Number of times the lock was acquired. +(B): Number of writer lock acquisitions. If dealing with a read/write primitive + a second "Reads" statistics line is printed. + +(C): Number of times the lock was acquired. -(C): Min and max number of times threads failed to acquire the lock. +(D): Min and max number of times threads failed to acquire the lock. -(D): true/false values if there were errors acquiring the lock. This should +(E): true/false values if there were errors acquiring the lock. This should -only- be positive if there is a bug in the locking primitive's implementation. Otherwise a lock should never fail (i.e., spin_lock()). Of course, the same applies for (C), above. A dummy example of this is diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 988267c..c1073d7 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -52,6 +52,8 @@ MODULE_AUTHOR("Paul E. McKenney "); torture_param(int, nwriters_stress, -1, "Number of write-locking stress-test threads"); +torture_param(int, nreaders_stress, -1, + "Number of read-locking stress-test threads"); torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable"); @@ -74,15 +76,19 @@ static atomic_t n_lock_torture_errors; static struct task_struct *stats_task; static struct task_struct **writer_tasks; +static struct task_struct **reader_tasks; static int nrealwriters_stress; static bool lock_is_write_held; +static int nrealreaders_stress; +static bool lock_is_read_held; struct lock_stress_stats { long n_lock_fail; long n_lock_acquired; }; static struct lock_stress_stats *lwsa; /* writer statistics */ +static struct lock_stress_stats *lrsa; /* reader statistics */ #if defined(MODULE) #define LOCKTORTURE_RUNNABLE_INIT 1 @@ -104,6 +110,9 @@ struct lock_torture_ops { int (*writelock)(void); void (*write_delay)(struct torture_random_state *trsp); void (*writeunlock)(void); + int (*readlock)(void); + void (*read_delay)(struct torture_random_state *trsp); + void (*readunlock)(void); unsigned long flags; const char *name; }; @@ -142,6 +151,9 @@ static struct lock_torture_ops lock_busted_ops = { .writelock = torture_lock_busted_write_lock, .write_delay = torture_lock_busted_write_delay, .writeunlock = torture_lock_busted_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, .name = "lock_busted" }; @@ -182,6 +194,9 @@ static struct lock_torture_ops spin_lock_ops = { .writelock = torture_spin_lock_write_lock, .write_delay = torture_spin_lock_write_delay, .writeunlock = torture_spin_lock_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, .name = "spin_lock" }; @@ -205,6 +220,9 @@ static struct lock_torture_ops spin_lock_irq_ops = { .writelock = torture_spin_lock_write_lock_irq, .write_delay = torture_spin_lock_write_delay, .writeunlock = torture_lock_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, .name = "spin_lock_irq" }; @@ -241,6 +259,9 @@ static struct lock_torture_ops mutex_lock_ops = { .writelock = torture_mutex_lock, .write_delay = torture_mutex_delay, .writeunlock = torture_mutex_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, .name = "mutex_lock" }; @@ -274,28 +295,57 @@ static int lock_torture_writer(void *arg) } /* + * Lock torture reader kthread. Repeatedly acquires and releases + * the reader lock. + */ +static int lock_torture_reader(void *arg) +{ + struct lock_stress_stats *lrsp = arg; + static DEFINE_TORTURE_RANDOM(rand); + + VERBOSE_TOROUT_STRING("lock_torture_reader task started"); + set_user_nice(current, MAX_NICE); + + do { + if ((torture_random(&rand) & 0xfffff) == 0) + schedule_timeout_uninterruptible(1); + cur_ops->readlock(); + lock_is_read_held = 1; + lrsp->n_lock_acquired++; + cur_ops->read_delay(&rand); + lock_is_read_held = 0; + cur_ops->readunlock(); + stutter_wait("lock_torture_reader"); + } while (!torture_must_stop()); + torture_kthread_stopping("lock_torture_reader"); + return 0; +} + +/* * Create an lock-torture-statistics message in the specified buffer. */ -static void lock_torture_printk(char *page) +static void __torture_print_stats(char *page, + struct lock_stress_stats *statp, bool write) { bool fail = 0; - int i; + int i, n_stress; long max = 0; - long min = lwsa[0].n_lock_acquired; + long min = statp[0].n_lock_acquired; long long sum = 0; - for (i = 0; i < nrealwriters_stress; i++) { - if (lwsa[i].n_lock_fail) + n_stress = write ? nrealwriters_stress : nrealreaders_stress; + for (i = 0; i < n_stress; i++) { + if (statp[i].n_lock_fail) fail = true; - sum += lwsa[i].n_lock_acquired; - if (max < lwsa[i].n_lock_fail) - max = lwsa[i].n_lock_fail; - if (min > lwsa[i].n_lock_fail) - min = lwsa[i].n_lock_fail; + sum += statp[i].n_lock_acquired; + if (max < statp[i].n_lock_fail) + max = statp[i].n_lock_fail; + if (min > statp[i].n_lock_fail) + min = statp[i].n_lock_fail; } - page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); page += sprintf(page, - "Writes: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n", + "%s: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n", + write ? "Writes" : "Reads ", sum, max, min, max / 2 > min ? "???" : "", fail, fail ? "!!!" : ""); if (fail) @@ -315,15 +365,32 @@ static void lock_torture_stats_print(void) int size = nrealwriters_stress * 200 + 8192; char *buf; + if (cur_ops->readlock) + size += nrealreaders_stress * 200 + 8192; + buf = kmalloc(size, GFP_KERNEL); if (!buf) { pr_err("lock_torture_stats_print: Out of memory, need: %d", size); return; } - lock_torture_printk(buf); + + __torture_print_stats(buf, lwsa, true); pr_alert("%s", buf); kfree(buf); + + if (cur_ops->readlock) { + buf = kmalloc(size, GFP_KERNEL); + if (!buf) { + pr_err("lock_torture_stats_print: Out of memory, need: %d", + size); + return; + } + + __torture_print_stats(buf, lrsa, false); + pr_alert("%s", buf); + kfree(buf); + } } /* @@ -350,10 +417,10 @@ lock_torture_print_module_parms(struct lock_torture_ops *cur_ops, const char *tag) { pr_alert("%s" TORTURE_FLAG - "--- %s%s: nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", + "--- %s%s: nwriters_stress=%d nreaders_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", torture_type, tag, debug_lock ? " [debug]": "", - nrealwriters_stress, stat_interval, verbose, - shuffle_interval, stutter, shutdown_secs, + nrealwriters_stress, nrealreaders_stress, stat_interval, + verbose, shuffle_interval, stutter, shutdown_secs, onoff_interval, onoff_holdoff); } @@ -372,6 +439,14 @@ static void lock_torture_cleanup(void) writer_tasks = NULL; } + if (reader_tasks) { + for (i = 0; i < nrealreaders_stress; i++) + torture_stop_kthread(lock_torture_reader, + reader_tasks[i]); + kfree(reader_tasks); + reader_tasks = NULL; + } + torture_stop_kthread(lock_torture_stats, stats_task); lock_torture_stats_print(); /* -After- the stats thread is stopped! */ @@ -389,7 +464,7 @@ static void lock_torture_cleanup(void) static int __init lock_torture_init(void) { - int i; + int i, j; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &mutex_lock_ops, @@ -430,7 +505,6 @@ static int __init lock_torture_init(void) if (strncmp(torture_type, "spin", 4) == 0) debug_lock = true; #endif - lock_torture_print_module_parms(cur_ops, "Start of test"); /* Initialize the statistics so that each run gets its own numbers. */ @@ -446,8 +520,37 @@ static int __init lock_torture_init(void) lwsa[i].n_lock_acquired = 0; } - /* Start up the kthreads. */ + if (cur_ops->readlock) { + if (nreaders_stress >= 0) + nrealreaders_stress = nreaders_stress; + else { + /* + * By default distribute evenly the number of + * readers and writers. We still run the same number + * of threads as the writer-only locks default. + */ + if (nwriters_stress < 0) /* user doesn't care */ + nrealwriters_stress = num_online_cpus(); + nrealreaders_stress = nrealwriters_stress; + } + + lock_is_read_held = 0; + lrsa = kmalloc(sizeof(*lrsa) * nrealreaders_stress, GFP_KERNEL); + if (lrsa == NULL) { + VERBOSE_TOROUT_STRING("lrsa: Out of memory"); + firsterr = -ENOMEM; + kfree(lwsa); + goto unwind; + } + for (i = 0; i < nrealreaders_stress; i++) { + lrsa[i].n_lock_fail = 0; + lrsa[i].n_lock_acquired = 0; + } + } + lock_torture_print_module_parms(cur_ops, "Start of test"); + + /* Prepare torture context. */ if (onoff_interval > 0) { firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval * HZ); @@ -478,11 +581,44 @@ static int __init lock_torture_init(void) firsterr = -ENOMEM; goto unwind; } - for (i = 0; i < nrealwriters_stress; i++) { + + if (cur_ops->readlock) { + reader_tasks = kzalloc(nrealreaders_stress * sizeof(reader_tasks[0]), + GFP_KERNEL); + if (reader_tasks == NULL) { + VERBOSE_TOROUT_ERRSTRING("reader_tasks: Out of memory"); + firsterr = -ENOMEM; + goto unwind; + } + } + + /* + * Create the kthreads and start torturing (oh, those poor little locks). + * + * TODO: Note that we interleave writers with readers, giving writers a + * slight advantage, by creating its kthread first. This can be modified + * for very specific needs, or even let the user choose the policy, if + * ever wanted. + */ + for (i = 0, j = 0; i < nrealwriters_stress || + j < nrealreaders_stress; i++, j++) { + if (i >= nrealwriters_stress) + goto create_reader; + + /* Create writer. */ firsterr = torture_create_kthread(lock_torture_writer, &lwsa[i], writer_tasks[i]); if (firsterr) goto unwind; + + create_reader: + if (cur_ops->readlock == NULL || (j >= nrealreaders_stress)) + continue; + /* Create reader. */ + firsterr = torture_create_kthread(lock_torture_reader, &lrsa[j], + reader_tasks[j]); + if (firsterr) + goto unwind; } if (stat_interval > 0) { firsterr = torture_create_kthread(lock_torture_stats, NULL, -- cgit v0.10.2 From 4a3b427f0b27c7e15edfa607524ff012a155337a Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 21:41:30 -0700 Subject: locktorture: Support rwsems We can easily do so with our new reader lock support. Just an arbitrary design default: readers have higher (5x) critical region latencies than writers: 50 ms and 10 ms, respectively. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index 7a72621..be71501 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -47,6 +47,8 @@ torture_type Type of lock to torture. By default, only spinlocks will o "mutex_lock": mutex_lock() and mutex_unlock() pairs. + o "rwsem_lock": read/write down() and up() semaphore pairs. + torture_runnable Start locktorture at boot time in the case where the module is built into the kernel, otherwise wait for torture_runnable to be set via sysfs before starting. diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index c1073d7..8480118 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -265,6 +265,71 @@ static struct lock_torture_ops mutex_lock_ops = { .name = "mutex_lock" }; +static DECLARE_RWSEM(torture_rwsem); +static int torture_rwsem_down_write(void) __acquires(torture_rwsem) +{ + down_write(&torture_rwsem); + return 0; +} + +static void torture_rwsem_write_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 10); + else + mdelay(longdelay_ms / 10); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + preempt_schedule(); /* Allow test to be preempted. */ +#endif +} + +static void torture_rwsem_up_write(void) __releases(torture_rwsem) +{ + up_write(&torture_rwsem); +} + +static int torture_rwsem_down_read(void) __acquires(torture_rwsem) +{ + down_read(&torture_rwsem); + return 0; +} + +static void torture_rwsem_read_delay(struct torture_random_state *trsp) +{ + const unsigned long longdelay_ms = 100; + + /* We want a long delay occasionally to force massive contention. */ + if (!(torture_random(trsp) % + (nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms * 2); + else + mdelay(longdelay_ms / 2); +#ifdef CONFIG_PREEMPT + if (!(torture_random(trsp) % (nrealreaders_stress * 20000))) + preempt_schedule(); /* Allow test to be preempted. */ +#endif +} + +static void torture_rwsem_up_read(void) __releases(torture_rwsem) +{ + up_read(&torture_rwsem); +} + +static struct lock_torture_ops rwsem_lock_ops = { + .writelock = torture_rwsem_down_write, + .write_delay = torture_rwsem_write_delay, + .writeunlock = torture_rwsem_up_write, + .readlock = torture_rwsem_down_read, + .read_delay = torture_rwsem_read_delay, + .readunlock = torture_rwsem_up_read, + .name = "rwsem_lock" +}; + /* * Lock torture writer kthread. Repeatedly acquires and releases * the lock, checking for duplicate acquisitions. @@ -467,7 +532,8 @@ static int __init lock_torture_init(void) int i, j; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { - &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &mutex_lock_ops, + &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, + &mutex_lock_ops, &rwsem_lock_ops, }; if (!torture_init_begin(torture_type, verbose, &torture_runnable)) -- cgit v0.10.2 From 630952c22b04ada7e88ad93b87ad893cd818cc6b Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 11 Sep 2014 21:42:25 -0700 Subject: locktorture: Introduce torture context The amount of global variables is getting pretty ugly. Group variables related to the execution (ie: not parameters) in a new context structure. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 8480118..540d5df 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -66,29 +66,22 @@ torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable"); torture_param(bool, verbose, true, "Enable verbose debugging printk()s"); -static bool debug_lock = false; static char *torture_type = "spin_lock"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)"); -static atomic_t n_lock_torture_errors; - static struct task_struct *stats_task; static struct task_struct **writer_tasks; static struct task_struct **reader_tasks; -static int nrealwriters_stress; static bool lock_is_write_held; -static int nrealreaders_stress; static bool lock_is_read_held; struct lock_stress_stats { long n_lock_fail; long n_lock_acquired; }; -static struct lock_stress_stats *lwsa; /* writer statistics */ -static struct lock_stress_stats *lrsa; /* reader statistics */ #if defined(MODULE) #define LOCKTORTURE_RUNNABLE_INIT 1 @@ -117,8 +110,18 @@ struct lock_torture_ops { const char *name; }; -static struct lock_torture_ops *cur_ops; - +struct lock_torture_cxt { + int nrealwriters_stress; + int nrealreaders_stress; + bool debug_lock; + atomic_t n_lock_torture_errors; + struct lock_torture_ops *cur_ops; + struct lock_stress_stats *lwsa; /* writer statistics */ + struct lock_stress_stats *lrsa; /* reader statistics */ +}; +static struct lock_torture_cxt cxt = { 0, 0, false, + ATOMIC_INIT(0), + NULL, NULL}; /* * Definitions for lock torture testing. */ @@ -134,10 +137,10 @@ static void torture_lock_busted_write_delay(struct torture_random_state *trsp) /* We want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (nrealwriters_stress * 2000 * longdelay_us))) + (cxt.nrealwriters_stress * 2000 * longdelay_us))) mdelay(longdelay_us); #ifdef CONFIG_PREEMPT - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) preempt_schedule(); /* Allow test to be preempted. */ #endif } @@ -174,13 +177,13 @@ static void torture_spin_lock_write_delay(struct torture_random_state *trsp) * we want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (nrealwriters_stress * 2000 * longdelay_us))) + (cxt.nrealwriters_stress * 2000 * longdelay_us))) mdelay(longdelay_us); if (!(torture_random(trsp) % - (nrealwriters_stress * 2 * shortdelay_us))) + (cxt.nrealwriters_stress * 2 * shortdelay_us))) udelay(shortdelay_us); #ifdef CONFIG_PREEMPT - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) preempt_schedule(); /* Allow test to be preempted. */ #endif } @@ -206,14 +209,14 @@ __acquires(torture_spinlock_irq) unsigned long flags; spin_lock_irqsave(&torture_spinlock, flags); - cur_ops->flags = flags; + cxt.cur_ops->flags = flags; return 0; } static void torture_lock_spin_write_unlock_irq(void) __releases(torture_spinlock) { - spin_unlock_irqrestore(&torture_spinlock, cur_ops->flags); + spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags); } static struct lock_torture_ops spin_lock_irq_ops = { @@ -240,12 +243,12 @@ static void torture_mutex_delay(struct torture_random_state *trsp) /* We want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (nrealwriters_stress * 2000 * longdelay_ms))) + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) mdelay(longdelay_ms * 5); else mdelay(longdelay_ms / 5); #ifdef CONFIG_PREEMPT - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) preempt_schedule(); /* Allow test to be preempted. */ #endif } @@ -278,12 +281,12 @@ static void torture_rwsem_write_delay(struct torture_random_state *trsp) /* We want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (nrealwriters_stress * 2000 * longdelay_ms))) + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) mdelay(longdelay_ms * 10); else mdelay(longdelay_ms / 10); #ifdef CONFIG_PREEMPT - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) preempt_schedule(); /* Allow test to be preempted. */ #endif } @@ -305,12 +308,12 @@ static void torture_rwsem_read_delay(struct torture_random_state *trsp) /* We want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (nrealwriters_stress * 2000 * longdelay_ms))) + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) mdelay(longdelay_ms * 2); else mdelay(longdelay_ms / 2); #ifdef CONFIG_PREEMPT - if (!(torture_random(trsp) % (nrealreaders_stress * 20000))) + if (!(torture_random(trsp) % (cxt.nrealreaders_stress * 20000))) preempt_schedule(); /* Allow test to be preempted. */ #endif } @@ -345,14 +348,14 @@ static int lock_torture_writer(void *arg) do { if ((torture_random(&rand) & 0xfffff) == 0) schedule_timeout_uninterruptible(1); - cur_ops->writelock(); + cxt.cur_ops->writelock(); if (WARN_ON_ONCE(lock_is_write_held)) lwsp->n_lock_fail++; lock_is_write_held = 1; lwsp->n_lock_acquired++; - cur_ops->write_delay(&rand); + cxt.cur_ops->write_delay(&rand); lock_is_write_held = 0; - cur_ops->writeunlock(); + cxt.cur_ops->writeunlock(); stutter_wait("lock_torture_writer"); } while (!torture_must_stop()); torture_kthread_stopping("lock_torture_writer"); @@ -374,12 +377,12 @@ static int lock_torture_reader(void *arg) do { if ((torture_random(&rand) & 0xfffff) == 0) schedule_timeout_uninterruptible(1); - cur_ops->readlock(); + cxt.cur_ops->readlock(); lock_is_read_held = 1; lrsp->n_lock_acquired++; - cur_ops->read_delay(&rand); + cxt.cur_ops->read_delay(&rand); lock_is_read_held = 0; - cur_ops->readunlock(); + cxt.cur_ops->readunlock(); stutter_wait("lock_torture_reader"); } while (!torture_must_stop()); torture_kthread_stopping("lock_torture_reader"); @@ -398,7 +401,7 @@ static void __torture_print_stats(char *page, long min = statp[0].n_lock_acquired; long long sum = 0; - n_stress = write ? nrealwriters_stress : nrealreaders_stress; + n_stress = write ? cxt.nrealwriters_stress : cxt.nrealreaders_stress; for (i = 0; i < n_stress; i++) { if (statp[i].n_lock_fail) fail = true; @@ -414,7 +417,7 @@ static void __torture_print_stats(char *page, sum, max, min, max / 2 > min ? "???" : "", fail, fail ? "!!!" : ""); if (fail) - atomic_inc(&n_lock_torture_errors); + atomic_inc(&cxt.n_lock_torture_errors); } /* @@ -427,11 +430,11 @@ static void __torture_print_stats(char *page, */ static void lock_torture_stats_print(void) { - int size = nrealwriters_stress * 200 + 8192; + int size = cxt.nrealwriters_stress * 200 + 8192; char *buf; - if (cur_ops->readlock) - size += nrealreaders_stress * 200 + 8192; + if (cxt.cur_ops->readlock) + size += cxt.nrealreaders_stress * 200 + 8192; buf = kmalloc(size, GFP_KERNEL); if (!buf) { @@ -440,11 +443,11 @@ static void lock_torture_stats_print(void) return; } - __torture_print_stats(buf, lwsa, true); + __torture_print_stats(buf, cxt.lwsa, true); pr_alert("%s", buf); kfree(buf); - if (cur_ops->readlock) { + if (cxt.cur_ops->readlock) { buf = kmalloc(size, GFP_KERNEL); if (!buf) { pr_err("lock_torture_stats_print: Out of memory, need: %d", @@ -452,7 +455,7 @@ static void lock_torture_stats_print(void) return; } - __torture_print_stats(buf, lrsa, false); + __torture_print_stats(buf, cxt.lrsa, false); pr_alert("%s", buf); kfree(buf); } @@ -483,8 +486,8 @@ lock_torture_print_module_parms(struct lock_torture_ops *cur_ops, { pr_alert("%s" TORTURE_FLAG "--- %s%s: nwriters_stress=%d nreaders_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", - torture_type, tag, debug_lock ? " [debug]": "", - nrealwriters_stress, nrealreaders_stress, stat_interval, + torture_type, tag, cxt.debug_lock ? " [debug]": "", + cxt.nrealwriters_stress, cxt.nrealreaders_stress, stat_interval, verbose, shuffle_interval, stutter, shutdown_secs, onoff_interval, onoff_holdoff); } @@ -497,7 +500,7 @@ static void lock_torture_cleanup(void) return; if (writer_tasks) { - for (i = 0; i < nrealwriters_stress; i++) + for (i = 0; i < cxt.nrealwriters_stress; i++) torture_stop_kthread(lock_torture_writer, writer_tasks[i]); kfree(writer_tasks); @@ -505,7 +508,7 @@ static void lock_torture_cleanup(void) } if (reader_tasks) { - for (i = 0; i < nrealreaders_stress; i++) + for (i = 0; i < cxt.nrealreaders_stress; i++) torture_stop_kthread(lock_torture_reader, reader_tasks[i]); kfree(reader_tasks); @@ -515,14 +518,14 @@ static void lock_torture_cleanup(void) torture_stop_kthread(lock_torture_stats, stats_task); lock_torture_stats_print(); /* -After- the stats thread is stopped! */ - if (atomic_read(&n_lock_torture_errors)) - lock_torture_print_module_parms(cur_ops, + if (atomic_read(&cxt.n_lock_torture_errors)) + lock_torture_print_module_parms(cxt.cur_ops, "End of test: FAILURE"); else if (torture_onoff_failures()) - lock_torture_print_module_parms(cur_ops, + lock_torture_print_module_parms(cxt.cur_ops, "End of test: LOCK_HOTPLUG"); else - lock_torture_print_module_parms(cur_ops, + lock_torture_print_module_parms(cxt.cur_ops, "End of test: SUCCESS"); torture_cleanup_end(); } @@ -541,8 +544,8 @@ static int __init lock_torture_init(void) /* Process args and tell the world that the torturer is on the job. */ for (i = 0; i < ARRAY_SIZE(torture_ops); i++) { - cur_ops = torture_ops[i]; - if (strcmp(torture_type, cur_ops->name) == 0) + cxt.cur_ops = torture_ops[i]; + if (strcmp(torture_type, cxt.cur_ops->name) == 0) break; } if (i == ARRAY_SIZE(torture_ops)) { @@ -555,40 +558,40 @@ static int __init lock_torture_init(void) torture_init_end(); return -EINVAL; } - if (cur_ops->init) - cur_ops->init(); /* no "goto unwind" prior to this point!!! */ + if (cxt.cur_ops->init) + cxt.cur_ops->init(); /* no "goto unwind" prior to this point!!! */ if (nwriters_stress >= 0) - nrealwriters_stress = nwriters_stress; + cxt.nrealwriters_stress = nwriters_stress; else - nrealwriters_stress = 2 * num_online_cpus(); + cxt.nrealwriters_stress = 2 * num_online_cpus(); #ifdef CONFIG_DEBUG_MUTEXES if (strncmp(torture_type, "mutex", 5) == 0) - debug_lock = true; + cxt.debug_lock = true; #endif #ifdef CONFIG_DEBUG_SPINLOCK if (strncmp(torture_type, "spin", 4) == 0) - debug_lock = true; + cxt.debug_lock = true; #endif /* Initialize the statistics so that each run gets its own numbers. */ lock_is_write_held = 0; - lwsa = kmalloc(sizeof(*lwsa) * nrealwriters_stress, GFP_KERNEL); - if (lwsa == NULL) { - VERBOSE_TOROUT_STRING("lwsa: Out of memory"); + cxt.lwsa = kmalloc(sizeof(*cxt.lwsa) * cxt.nrealwriters_stress, GFP_KERNEL); + if (cxt.lwsa == NULL) { + VERBOSE_TOROUT_STRING("cxt.lwsa: Out of memory"); firsterr = -ENOMEM; goto unwind; } - for (i = 0; i < nrealwriters_stress; i++) { - lwsa[i].n_lock_fail = 0; - lwsa[i].n_lock_acquired = 0; + for (i = 0; i < cxt.nrealwriters_stress; i++) { + cxt.lwsa[i].n_lock_fail = 0; + cxt.lwsa[i].n_lock_acquired = 0; } - if (cur_ops->readlock) { + if (cxt.cur_ops->readlock) { if (nreaders_stress >= 0) - nrealreaders_stress = nreaders_stress; + cxt.nrealreaders_stress = nreaders_stress; else { /* * By default distribute evenly the number of @@ -596,25 +599,25 @@ static int __init lock_torture_init(void) * of threads as the writer-only locks default. */ if (nwriters_stress < 0) /* user doesn't care */ - nrealwriters_stress = num_online_cpus(); - nrealreaders_stress = nrealwriters_stress; + cxt.nrealwriters_stress = num_online_cpus(); + cxt.nrealreaders_stress = cxt.nrealwriters_stress; } lock_is_read_held = 0; - lrsa = kmalloc(sizeof(*lrsa) * nrealreaders_stress, GFP_KERNEL); - if (lrsa == NULL) { - VERBOSE_TOROUT_STRING("lrsa: Out of memory"); + cxt.lrsa = kmalloc(sizeof(*cxt.lrsa) * cxt.nrealreaders_stress, GFP_KERNEL); + if (cxt.lrsa == NULL) { + VERBOSE_TOROUT_STRING("cxt.lrsa: Out of memory"); firsterr = -ENOMEM; - kfree(lwsa); + kfree(cxt.lwsa); goto unwind; } - for (i = 0; i < nrealreaders_stress; i++) { - lrsa[i].n_lock_fail = 0; - lrsa[i].n_lock_acquired = 0; + for (i = 0; i < cxt.nrealreaders_stress; i++) { + cxt.lrsa[i].n_lock_fail = 0; + cxt.lrsa[i].n_lock_acquired = 0; } } - lock_torture_print_module_parms(cur_ops, "Start of test"); + lock_torture_print_module_parms(cxt.cur_ops, "Start of test"); /* Prepare torture context. */ if (onoff_interval > 0) { @@ -640,7 +643,7 @@ static int __init lock_torture_init(void) goto unwind; } - writer_tasks = kzalloc(nrealwriters_stress * sizeof(writer_tasks[0]), + writer_tasks = kzalloc(cxt.nrealwriters_stress * sizeof(writer_tasks[0]), GFP_KERNEL); if (writer_tasks == NULL) { VERBOSE_TOROUT_ERRSTRING("writer_tasks: Out of memory"); @@ -648,8 +651,8 @@ static int __init lock_torture_init(void) goto unwind; } - if (cur_ops->readlock) { - reader_tasks = kzalloc(nrealreaders_stress * sizeof(reader_tasks[0]), + if (cxt.cur_ops->readlock) { + reader_tasks = kzalloc(cxt.nrealreaders_stress * sizeof(reader_tasks[0]), GFP_KERNEL); if (reader_tasks == NULL) { VERBOSE_TOROUT_ERRSTRING("reader_tasks: Out of memory"); @@ -666,22 +669,22 @@ static int __init lock_torture_init(void) * for very specific needs, or even let the user choose the policy, if * ever wanted. */ - for (i = 0, j = 0; i < nrealwriters_stress || - j < nrealreaders_stress; i++, j++) { - if (i >= nrealwriters_stress) + for (i = 0, j = 0; i < cxt.nrealwriters_stress || + j < cxt.nrealreaders_stress; i++, j++) { + if (i >= cxt.nrealwriters_stress) goto create_reader; /* Create writer. */ - firsterr = torture_create_kthread(lock_torture_writer, &lwsa[i], + firsterr = torture_create_kthread(lock_torture_writer, &cxt.lwsa[i], writer_tasks[i]); if (firsterr) goto unwind; create_reader: - if (cur_ops->readlock == NULL || (j >= nrealreaders_stress)) + if (cxt.cur_ops->readlock == NULL || (j >= cxt.nrealreaders_stress)) continue; /* Create reader. */ - firsterr = torture_create_kthread(lock_torture_reader, &lrsa[j], + firsterr = torture_create_kthread(lock_torture_reader, &cxt.lrsa[j], reader_tasks[j]); if (firsterr) goto unwind; -- cgit v0.10.2 From 0acf0153169768a5d672fdcb163279bd05f94ef2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 12 Sep 2014 09:19:29 -0700 Subject: locktorture: Make torture scripting account for new _runnable name Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh index 9746ea1..252aae6 100644 --- a/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh +++ b/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh @@ -38,6 +38,6 @@ per_version_boot_params () { echo $1 `locktorture_param_onoff "$1" "$2"` \ locktorture.stat_interval=15 \ locktorture.shutdown_secs=$3 \ - locktorture.locktorture_runnable=1 \ + locktorture.torture_runnable=1 \ locktorture.verbose=1 } -- cgit v0.10.2 From 862917a52b5f108200c1aa2a4f5a35c9156c84b9 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 12 Sep 2014 09:36:53 -0700 Subject: locktorture: Add test scenario for mutex_lock Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/configs/lock/CFLIST b/tools/testing/selftests/rcutorture/configs/lock/CFLIST index a061b22..901bafd 100644 --- a/tools/testing/selftests/rcutorture/configs/lock/CFLIST +++ b/tools/testing/selftests/rcutorture/configs/lock/CFLIST @@ -1 +1,2 @@ LOCK01 +LOCK02 diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK02 b/tools/testing/selftests/rcutorture/configs/lock/LOCK02 new file mode 100644 index 0000000..1d1da14 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK02 @@ -0,0 +1,6 @@ +CONFIG_SMP=y +CONFIG_NR_CPUS=4 +CONFIG_HOTPLUG_CPU=y +CONFIG_PREEMPT_NONE=n +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=y diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK02.boot b/tools/testing/selftests/rcutorture/configs/lock/LOCK02.boot new file mode 100644 index 0000000..5aa44b4 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK02.boot @@ -0,0 +1 @@ +locktorture.torture_type=mutex_lock -- cgit v0.10.2 From aaa693e3d8030e4cc531c71facb650ae0880f2fb Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 12 Sep 2014 09:41:27 -0700 Subject: locktorture: Add test scenario for rwsem_lock Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/configs/lock/CFLIST b/tools/testing/selftests/rcutorture/configs/lock/CFLIST index 901bafd..6108137 100644 --- a/tools/testing/selftests/rcutorture/configs/lock/CFLIST +++ b/tools/testing/selftests/rcutorture/configs/lock/CFLIST @@ -1,2 +1,3 @@ LOCK01 LOCK02 +LOCK03 diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK03 b/tools/testing/selftests/rcutorture/configs/lock/LOCK03 new file mode 100644 index 0000000..1d1da14 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK03 @@ -0,0 +1,6 @@ +CONFIG_SMP=y +CONFIG_NR_CPUS=4 +CONFIG_HOTPLUG_CPU=y +CONFIG_PREEMPT_NONE=n +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=y diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK03.boot b/tools/testing/selftests/rcutorture/configs/lock/LOCK03.boot new file mode 100644 index 0000000..a67bbe0 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK03.boot @@ -0,0 +1 @@ +locktorture.torture_type=rwsem_lock -- cgit v0.10.2 From 59da22a02032cf1a069ec431f93d403b321ff6b4 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 12 Sep 2014 10:36:15 -0700 Subject: rcutorture: Rename rcutorture_runnable parameter This commit changes rcutorture_runnable to torture_runnable, which is consistent with the names of the other parameters and is a bit shorter as well. Signed-off-by: Paul E. McKenney diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e1147bc..7aba744 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2938,7 +2938,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Set time (s) between CPU-hotplug operations, or zero to disable CPU-hotplug testing. - rcutorture.rcutorture_runnable= [BOOT] + rcutorture.torture_runnable= [BOOT] Start rcutorture running at boot time. rcutorture.shuffle_interval= [KNL] diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 5cafd60..a4a819f 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -47,9 +47,6 @@ #include extern int rcu_expedited; /* for sysctl */ -#ifdef CONFIG_RCU_TORTURE_TEST -extern int rcutorture_runnable; /* for sysctl */ -#endif /* #ifdef CONFIG_RCU_TORTURE_TEST */ enum rcutorture_type { RCU_FLAVOR, diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 04c4b5a..240fa90 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -168,9 +168,9 @@ static int rcu_torture_writer_state; #else #define RCUTORTURE_RUNNABLE_INIT 0 #endif -int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT; -module_param(rcutorture_runnable, int, 0444); -MODULE_PARM_DESC(rcutorture_runnable, "Start rcutorture at boot"); +static int torture_runnable = RCUTORTURE_RUNNABLE_INIT; +module_param(torture_runnable, int, 0444); +MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) #define rcu_can_boost() 1 @@ -1636,7 +1636,7 @@ rcu_torture_init(void) RCUTORTURE_TASKS_OPS }; - if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable)) + if (!torture_init_begin(torture_type, verbose, &torture_runnable)) return -EBUSY; /* Process args and tell the world that the torturer is on the job. */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 75875a7..ab45666 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1055,15 +1055,6 @@ static struct ctl_table kern_table[] = { .child = key_sysctls, }, #endif -#ifdef CONFIG_RCU_TORTURE_TEST - { - .procname = "rcutorture_runnable", - .data = &rcutorture_runnable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, -#endif #ifdef CONFIG_PERF_EVENTS /* * User-space scripts rely on the existence of this file diff --git a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh index 8977d8d..ffb85ed 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh +++ b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh @@ -51,7 +51,7 @@ per_version_boot_params () { `rcutorture_param_n_barrier_cbs "$1"` \ rcutorture.stat_interval=15 \ rcutorture.shutdown_secs=$3 \ - rcutorture.rcutorture_runnable=1 \ + rcutorture.torture_runnable=1 \ rcutorture.test_no_idle_hz=1 \ rcutorture.verbose=1 } -- cgit v0.10.2 From ec4518aad8329364af373f4bf7f4eff25a01a339 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 12 Sep 2014 10:50:01 -0700 Subject: locktorture: Document boot/module parameters Signed-off-by: Paul E. McKenney diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7aba744..e95b0f4 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1704,6 +1704,49 @@ bytes respectively. Such letter suffixes can also be entirely omitted. lockd.nlm_udpport=M [NFS] Assign UDP port. Format: + locktorture.nreaders_stress= [KNL] + Set the number of locking read-acquisition kthreads. + Defaults to being automatically set based on the + number of online CPUs. + + locktorture.nwriters_stress= [KNL] + Set the number of locking write-acquisition kthreads. + + locktorture.onoff_holdoff= [KNL] + Set time (s) after boot for CPU-hotplug testing. + + locktorture.onoff_interval= [KNL] + Set time (s) between CPU-hotplug operations, or + zero to disable CPU-hotplug testing. + + locktorture.shuffle_interval= [KNL] + Set task-shuffle interval (jiffies). Shuffling + tasks allows some CPUs to go into dyntick-idle + mode during the locktorture test. + + locktorture.shutdown_secs= [KNL] + Set time (s) after boot system shutdown. This + is useful for hands-off automated testing. + + locktorture.stat_interval= [KNL] + Time (s) between statistics printk()s. + + locktorture.stutter= [KNL] + Time (s) to stutter testing, for example, + specifying five seconds causes the test to run for + five seconds, wait for five seconds, and so on. + This tests the locking primitive's ability to + transition abruptly to and from idle. + + locktorture.torture_runnable= [BOOT] + Start locktorture running at boot time. + + locktorture.torture_type= [KNL] + Specify the locking implementation to test. + + locktorture.verbose= [KNL] + Enable additional printk() statements. + logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver Format: -- cgit v0.10.2 From dd56af42bd829c6e770ed69812bd65a04eaeb1e4 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 25 Aug 2014 20:25:06 -0700 Subject: rcu: Eliminate deadlock between CPU hotplug and expedited grace periods Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney Cc: Josh Triplett Cc: "Rafael J. Wysocki" Tested-by: Lan Tianyu diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 95978ad..b2d9a43 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -213,6 +213,7 @@ extern struct bus_type cpu_subsys; extern void cpu_hotplug_begin(void); extern void cpu_hotplug_done(void); extern void get_online_cpus(void); +extern bool try_get_online_cpus(void); extern void put_online_cpus(void); extern void cpu_hotplug_disable(void); extern void cpu_hotplug_enable(void); @@ -230,6 +231,7 @@ int cpu_down(unsigned int cpu); static inline void cpu_hotplug_begin(void) {} static inline void cpu_hotplug_done(void) {} #define get_online_cpus() do { } while (0) +#define try_get_online_cpus() true #define put_online_cpus() do { } while (0) #define cpu_hotplug_disable() do { } while (0) #define cpu_hotplug_enable() do { } while (0) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 008388f9..4f86465 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -505,6 +505,7 @@ static inline void print_irqtrace_events(struct task_struct *curr) #define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_) #define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_) +#define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_) #define lock_map_release(l) lock_release(l, 1, _THIS_IP_) #ifdef CONFIG_PROVE_LOCKING diff --git a/kernel/cpu.c b/kernel/cpu.c index 81e2a38..356450f 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -79,6 +79,8 @@ static struct { /* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */ #define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map) +#define cpuhp_lock_acquire_tryread() \ + lock_map_acquire_tryread(&cpu_hotplug.dep_map) #define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map) #define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map) @@ -91,10 +93,22 @@ void get_online_cpus(void) mutex_lock(&cpu_hotplug.lock); cpu_hotplug.refcount++; mutex_unlock(&cpu_hotplug.lock); - } EXPORT_SYMBOL_GPL(get_online_cpus); +bool try_get_online_cpus(void) +{ + if (cpu_hotplug.active_writer == current) + return true; + if (!mutex_trylock(&cpu_hotplug.lock)) + return false; + cpuhp_lock_acquire_tryread(); + cpu_hotplug.refcount++; + mutex_unlock(&cpu_hotplug.lock); + return true; +} +EXPORT_SYMBOL_GPL(try_get_online_cpus); + void put_online_cpus(void) { if (cpu_hotplug.active_writer == current) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index d7a3b13..133e472 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2940,11 +2940,6 @@ static int synchronize_sched_expedited_cpu_stop(void *data) * restructure your code to batch your updates, and then use a single * synchronize_sched() instead. * - * Note that it is illegal to call this function while holding any lock - * that is acquired by a CPU-hotplug notifier. And yes, it is also illegal - * to call this function from a CPU-hotplug notifier. Failing to observe - * these restriction will result in deadlock. - * * This implementation can be thought of as an application of ticket * locking to RCU, with sync_sched_expedited_started and * sync_sched_expedited_done taking on the roles of the halves @@ -2994,7 +2989,12 @@ void synchronize_sched_expedited(void) */ snap = atomic_long_inc_return(&rsp->expedited_start); firstsnap = snap; - get_online_cpus(); + if (!try_get_online_cpus()) { + /* CPU hotplug operation in flight, fall back to normal GP. */ + wait_rcu_gp(call_rcu_sched); + atomic_long_inc(&rsp->expedited_normal); + return; + } WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id())); /* @@ -3041,7 +3041,12 @@ void synchronize_sched_expedited(void) * and they started after our first try, so their grace * period works for us. */ - get_online_cpus(); + if (!try_get_online_cpus()) { + /* CPU hotplug operation in flight, use normal GP. */ + wait_rcu_gp(call_rcu_sched); + atomic_long_inc(&rsp->expedited_normal); + return; + } snap = atomic_long_read(&rsp->expedited_start); smp_mb(); /* ensure read is before try_stop_cpus(). */ } diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index e2c5910..387dd45 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -793,11 +793,6 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp) * In fact, if you are using synchronize_rcu_expedited() in a loop, * please restructure your code to batch your updates, and then Use a * single synchronize_rcu() instead. - * - * Note that it is illegal to call this function while holding any lock - * that is acquired by a CPU-hotplug notifier. And yes, it is also illegal - * to call this function from a CPU-hotplug notifier. Failing to observe - * these restriction will result in deadlock. */ void synchronize_rcu_expedited(void) { @@ -819,7 +814,11 @@ void synchronize_rcu_expedited(void) * being boosted. This simplifies the process of moving tasks * from leaf to root rcu_node structures. */ - get_online_cpus(); + if (!try_get_online_cpus()) { + /* CPU-hotplug operation in flight, fall back to normal GP. */ + wait_rcu_gp(call_rcu); + return; + } /* * Acquire lock, falling back to synchronize_rcu() if too many -- cgit v0.10.2 From e34191fad8e5d9fe4e76f6d03b5e29e3eae7535a Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 29 Sep 2014 06:14:23 -0700 Subject: locktorture: Support rwlocks Add a "rw_lock" torture test to stress kernel rwlocks and their irq variant. Reader critical regions are 5x longer than writers. As such a similar ratio of lock acquisitions is seen in the statistics. In the case of massive contention, both hold the lock for 1/10 of a second. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index be71501..619f2bb 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -45,6 +45,11 @@ torture_type Type of lock to torture. By default, only spinlocks will o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() pairs. + o "rw_lock": read/write lock() and unlock() rwlock pairs. + + o "rw_lock_irq": read/write lock_irq() and unlock_irq() + rwlock pairs. + o "mutex_lock": mutex_lock() and mutex_unlock() pairs. o "rwsem_lock": read/write down() and up() semaphore pairs. diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 540d5df..0762b25 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -229,6 +230,110 @@ static struct lock_torture_ops spin_lock_irq_ops = { .name = "spin_lock_irq" }; +static DEFINE_RWLOCK(torture_rwlock); + +static int torture_rwlock_write_lock(void) __acquires(torture_rwlock) +{ + write_lock(&torture_rwlock); + return 0; +} + +static void torture_rwlock_write_delay(struct torture_random_state *trsp) +{ + const unsigned long shortdelay_us = 2; + const unsigned long longdelay_ms = 100; + + /* We want a short delay mostly to emulate likely code, and + * we want a long delay occasionally to force massive contention. + */ + if (!(torture_random(trsp) % + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms); + else + udelay(shortdelay_us); +} + +static void torture_rwlock_write_unlock(void) __releases(torture_rwlock) +{ + write_unlock(&torture_rwlock); +} + +static int torture_rwlock_read_lock(void) __acquires(torture_rwlock) +{ + read_lock(&torture_rwlock); + return 0; +} + +static void torture_rwlock_read_delay(struct torture_random_state *trsp) +{ + const unsigned long shortdelay_us = 10; + const unsigned long longdelay_ms = 100; + + /* We want a short delay mostly to emulate likely code, and + * we want a long delay occasionally to force massive contention. + */ + if (!(torture_random(trsp) % + (cxt.nrealreaders_stress * 2000 * longdelay_ms))) + mdelay(longdelay_ms); + else + udelay(shortdelay_us); +} + +static void torture_rwlock_read_unlock(void) __releases(torture_rwlock) +{ + read_unlock(&torture_rwlock); +} + +static struct lock_torture_ops rw_lock_ops = { + .writelock = torture_rwlock_write_lock, + .write_delay = torture_rwlock_write_delay, + .writeunlock = torture_rwlock_write_unlock, + .readlock = torture_rwlock_read_lock, + .read_delay = torture_rwlock_read_delay, + .readunlock = torture_rwlock_read_unlock, + .name = "rw_lock" +}; + +static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock) +{ + unsigned long flags; + + write_lock_irqsave(&torture_rwlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_rwlock_write_unlock_irq(void) +__releases(torture_rwlock) +{ + write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); +} + +static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock) +{ + unsigned long flags; + + read_lock_irqsave(&torture_rwlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_rwlock_read_unlock_irq(void) +__releases(torture_rwlock) +{ + write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops rw_lock_irq_ops = { + .writelock = torture_rwlock_write_lock_irq, + .write_delay = torture_rwlock_write_delay, + .writeunlock = torture_rwlock_write_unlock_irq, + .readlock = torture_rwlock_read_lock_irq, + .read_delay = torture_rwlock_read_delay, + .readunlock = torture_rwlock_read_unlock_irq, + .name = "rw_lock_irq" +}; + static DEFINE_MUTEX(torture_mutex); static int torture_mutex_lock(void) __acquires(torture_mutex) @@ -535,8 +640,11 @@ static int __init lock_torture_init(void) int i, j; int firsterr = 0; static struct lock_torture_ops *torture_ops[] = { - &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, - &mutex_lock_ops, &rwsem_lock_ops, + &lock_busted_ops, + &spin_lock_ops, &spin_lock_irq_ops, + &rw_lock_ops, &rw_lock_irq_ops, + &mutex_lock_ops, + &rwsem_lock_ops, }; if (!torture_init_begin(torture_type, verbose, &torture_runnable)) @@ -571,7 +679,8 @@ static int __init lock_torture_init(void) cxt.debug_lock = true; #endif #ifdef CONFIG_DEBUG_SPINLOCK - if (strncmp(torture_type, "spin", 4) == 0) + if ((strncmp(torture_type, "spin", 4) == 0) || + (strncmp(torture_type, "rw_lock", 7) == 0)) cxt.debug_lock = true; #endif diff --git a/tools/testing/selftests/rcutorture/configs/lock/CFLIST b/tools/testing/selftests/rcutorture/configs/lock/CFLIST index 6108137..6910b73 100644 --- a/tools/testing/selftests/rcutorture/configs/lock/CFLIST +++ b/tools/testing/selftests/rcutorture/configs/lock/CFLIST @@ -1,3 +1,4 @@ LOCK01 LOCK02 LOCK03 +LOCK04 \ No newline at end of file diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK04 b/tools/testing/selftests/rcutorture/configs/lock/LOCK04 new file mode 100644 index 0000000..1d1da14 --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK04 @@ -0,0 +1,6 @@ +CONFIG_SMP=y +CONFIG_NR_CPUS=4 +CONFIG_HOTPLUG_CPU=y +CONFIG_PREEMPT_NONE=n +CONFIG_PREEMPT_VOLUNTARY=n +CONFIG_PREEMPT=y diff --git a/tools/testing/selftests/rcutorture/configs/lock/LOCK04.boot b/tools/testing/selftests/rcutorture/configs/lock/LOCK04.boot new file mode 100644 index 0000000..48c04fe --- /dev/null +++ b/tools/testing/selftests/rcutorture/configs/lock/LOCK04.boot @@ -0,0 +1 @@ +locktorture.torture_type=rw_lock -- cgit v0.10.2 From 219f800f99db6f4e43a582cb9e0d98931f13c012 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 29 Sep 2014 06:14:24 -0700 Subject: locktorture: Fix __acquire annotation for spinlock irq Its quite easy to get mixed up with the names -- 'torture_spinlock_irq' is not actually a valid spinlock name. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 0762b25..9e9cd11 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -205,7 +205,7 @@ static struct lock_torture_ops spin_lock_ops = { }; static int torture_spin_lock_write_lock_irq(void) -__acquires(torture_spinlock_irq) +__acquires(torture_spinlock) { unsigned long flags; -- cgit v0.10.2 From a1229491006a3d55cc0d7e6d496be39915ccefdd Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 29 Sep 2014 06:14:25 -0700 Subject: locktorture: Cannot hold read and write lock ... trigger an error if so. Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 9e9cd11..b05dc46 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -453,14 +453,19 @@ static int lock_torture_writer(void *arg) do { if ((torture_random(&rand) & 0xfffff) == 0) schedule_timeout_uninterruptible(1); + cxt.cur_ops->writelock(); if (WARN_ON_ONCE(lock_is_write_held)) lwsp->n_lock_fail++; lock_is_write_held = 1; + if (WARN_ON_ONCE(lock_is_read_held)) + lwsp->n_lock_fail++; /* rare, but... */ + lwsp->n_lock_acquired++; cxt.cur_ops->write_delay(&rand); lock_is_write_held = 0; cxt.cur_ops->writeunlock(); + stutter_wait("lock_torture_writer"); } while (!torture_must_stop()); torture_kthread_stopping("lock_torture_writer"); @@ -482,12 +487,17 @@ static int lock_torture_reader(void *arg) do { if ((torture_random(&rand) & 0xfffff) == 0) schedule_timeout_uninterruptible(1); + cxt.cur_ops->readlock(); lock_is_read_held = 1; + if (WARN_ON_ONCE(lock_is_write_held)) + lrsp->n_lock_fail++; /* rare, but... */ + lrsp->n_lock_acquired++; cxt.cur_ops->read_delay(&rand); lock_is_read_held = 0; cxt.cur_ops->readunlock(); + stutter_wait("lock_torture_reader"); } while (!torture_must_stop()); torture_kthread_stopping("lock_torture_reader"); -- cgit v0.10.2 From c98fed9fc6a7449affd941d8a8e9fcb0c72977d6 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 29 Sep 2014 06:14:26 -0700 Subject: locktorture: Cleanup header usage Remove some unnecessary ones and explicitly include rwsem.h Signed-off-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index b05dc46..ec8cce2 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -20,32 +20,20 @@ * Author: Paul E. McKenney * Based on kernel/rcu/torture.c. */ -#include #include -#include #include #include -#include #include #include #include +#include #include #include #include #include -#include -#include #include -#include -#include -#include -#include -#include #include -#include #include -#include -#include #include MODULE_LICENSE("GPL"); -- cgit v0.10.2 From 789cbbeca4eb7141cbd748ee93772471101b507b Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Sun, 5 Oct 2014 13:24:21 -0400 Subject: workqueue: Add quiescent state between work items Similar to the stop_machine deadlock scenario on !PREEMPT kernels addressed in b22ce2785d97 "workqueue: cond_resched() after processing each work item", kworker threads requeueing back-to-back with zero jiffy delay can stall RCU. The cond_resched call introduced in that fix will yield only iff there are other higher priority tasks to run, so force a quiescent RCU state between work items. Signed-off-by: Joe Lawrence Link: https://lkml.kernel.org/r/20140926105227.01325697@jlaw-desktop.mno.stratus.com Link: https://lkml.kernel.org/r/20140929115445.40221d8e@jlaw-desktop.mno.stratus.com Fixes: b22ce2785d97 ("workqueue: cond_resched() after processing each work item") Cc: Acked-by: Tejun Heo Signed-off-by: Paul E. McKenney diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 5dbe22a..345bec9 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2043,8 +2043,10 @@ __acquires(&pool->lock) * kernels, where a requeueing work item waiting for something to * happen could deadlock with stop_machine as such work item could * indefinitely requeue itself while all other CPUs are trapped in - * stop_machine. + * stop_machine. At the same time, report a quiescent RCU state so + * the same condition doesn't freeze RCU. */ + rcu_note_voluntary_context_switch(current); cond_resched(); spin_lock_irq(&pool->lock); -- cgit v0.10.2 From 3e28e377204badfc3c4119ff2abda473127ee0ff Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Sun, 5 Oct 2014 13:24:22 -0400 Subject: workqueue: Use cond_resched_rcu_qs macro Tidy up and use cond_resched_rcu_qs when calling cond_resched and reporting potential quiescent state to RCU. Splitting this change in this way allows easy backporting to -stable for kernel versions not having cond_resched_rcu_qs(). Signed-off-by: Joe Lawrence Acked-by: Tejun Heo Signed-off-by: Paul E. McKenney diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 345bec9..09b685d 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2046,8 +2046,7 @@ __acquires(&pool->lock) * stop_machine. At the same time, report a quiescent RCU state so * the same condition doesn't freeze RCU. */ - rcu_note_voluntary_context_switch(current); - cond_resched(); + cond_resched_rcu_qs(); spin_lock_irq(&pool->lock); -- cgit v0.10.2