cpu/rt: Rework cpu down for PREEMPT_RT

Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down. Several synchronization methods have been tried, but when stressed they failed. This is a new approach. A sync_tsk thread is still created and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait on it to inform that current tasks that are pinned on the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time. Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete as they are waiting for the locks held by the tasks that are blocked. Thus we still let the task pin the CPU until the notifiers are done. After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per CPU mutex in the hotplug_pcp descriptor. To help things along, a new function in the scheduler code is created called migrate_me(). This function will try to migrate the current task off the CPU this is going down if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases that this wont work, but it helps in most cases. After the notifiers are called and if a task can't migrate off but enters the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway. Also, I found that THREAD_BOUND need to also be accounted for in the pinned CPU, and the migrate_disable no longer treats them special. This helps fix issues with ksoftirqd and workqueue that unbind on CPU down. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
author: Steven Rostedt <srostedt@redhat.com> 2012-07-16 08:07:43 (GMT)
committer: Scott Wood <scottwood@freescale.com> 2014-05-14 18:38:48 (GMT)
commit: 822ed10ccdf84967279bb975d17131d7e07c2c7f (patch)
tree: 09699ea339e3182752ba1516204c218f0b78e5a1 /kernel/sched
parent: 52929348fe5a0dc0d29d0ae105748892e273f724 (diff)
download: linux-fsl-qoriq-822ed10ccdf84967279bb975d17131d7e07c2c7f.tar.xz
1 files changed, 80 insertions, 2 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe06ca2..387a4c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2371,7 +2371,7 @@ void migrate_disable(void)
 {
 	struct task_struct *p = current;
 
-	if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
+	if (in_atomic()) {
 #ifdef CONFIG_SCHED_DEBUG
 		p->migrate_disable_atomic++;
 #endif
@@ -2401,7 +2401,7 @@ void migrate_enable(void)
 	unsigned long flags;
 	struct rq *rq;
 
-	if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
+	if (in_atomic()) {
 #ifdef CONFIG_SCHED_DEBUG
 		p->migrate_disable_atomic--;
 #endif
@@ -4440,6 +4440,84 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 	cpumask_copy(&p->cpus_allowed, new_mask);
 }
 
+static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
+static DEFINE_MUTEX(sched_down_mutex);
+static cpumask_t sched_down_cpumask;
+
+void tell_sched_cpu_down_begin(int cpu)
+{
+	mutex_lock(&sched_down_mutex);
+	cpumask_set_cpu(cpu, &sched_down_cpumask);
+	mutex_unlock(&sched_down_mutex);
+}
+
+void tell_sched_cpu_down_done(int cpu)
+{
+	mutex_lock(&sched_down_mutex);
+	cpumask_clear_cpu(cpu, &sched_down_cpumask);
+	mutex_unlock(&sched_down_mutex);
+}
+
+/**
+ * migrate_me - try to move the current task off this cpu
+ *
+ * Used by the pin_current_cpu() code to try to get tasks
+ * to move off the current CPU as it is going down.
+ * It will only move the task if the task isn't pinned to
+ * the CPU (with migrate_disable, affinity or NO_SETAFFINITY)
+ * and the task has to be in a RUNNING state. Otherwise the
+ * movement of the task will wake it up (change its state
+ * to running) when the task did not expect it.
+ *
+ * Returns 1 if it succeeded in moving the current task
+ *         0 otherwise.
+ */
+int migrate_me(void)
+{
+	struct task_struct *p = current;
+	struct migration_arg arg;
+	struct cpumask *cpumask;
+	struct cpumask *mask;
+	unsigned long flags;
+	unsigned int dest_cpu;
+	struct rq *rq;
+
+	/*
+	 * We can not migrate tasks bounded to a CPU or tasks not
+	 * running. The movement of the task will wake it up.
+	 */
+	if (p->flags & PF_NO_SETAFFINITY || p->state)
+		return 0;
+
+	mutex_lock(&sched_down_mutex);
+	rq = task_rq_lock(p, &flags);
+
+	cpumask = &__get_cpu_var(sched_cpumasks);
+	mask = &p->cpus_allowed;
+
+	cpumask_andnot(cpumask, mask, &sched_down_cpumask);
+
+	if (!cpumask_weight(cpumask)) {
+		/* It's only on this CPU? */
+		task_rq_unlock(rq, p, &flags);
+		mutex_unlock(&sched_down_mutex);
+		return 0;
+	}
+
+	dest_cpu = cpumask_any_and(cpu_active_mask, cpumask);
+
+	arg.task = p;
+	arg.dest_cpu = dest_cpu;
+
+	task_rq_unlock(rq, p, &flags);
+
+	stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+	tlb_migrate_finish(p->mm);
+	mutex_unlock(&sched_down_mutex);
+
+	return 1;
+}
+
 /*
  * This is how migration works:
  *
author	Steven Rostedt <srostedt@redhat.com>	2012-07-16 08:07:43 (GMT)
committer	Scott Wood <scottwood@freescale.com>	2014-05-14 18:38:48 (GMT)
commit	822ed10ccdf84967279bb975d17131d7e07c2c7f (patch)
tree	09699ea339e3182752ba1516204c218f0b78e5a1 /kernel/sched
parent	52929348fe5a0dc0d29d0ae105748892e273f724 (diff)
download	linux-fsl-qoriq-822ed10ccdf84967279bb975d17131d7e07c2c7f.tar.xz