summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-02-19sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h>Ingo Molnar
IA64 relied on it through sched.h inclusion: arch/ia64/kernel/init_task.c:38:11: error: 'MAX_PRIO' undeclared here (not in a function) arch/ia64/kernel/init_task.c:38:11: error: 'RR_TIMESLICE' undeclared here (not in a function) Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-xaan1twswggedMR0airtpjui@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-19cputime: Remove irqsave from seqlock readersThomas Gleixner
The reader side code has no requirement to disable interrupts while sampling data. The sequence counter is enough to ensure consistency. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-11sched, powerpc: Fix sched.h split-up build failureIngo Molnar
Fix PowerPC/Cell build fallout from: 8bd75c77b7c6 sched/rt: Move rt specific bits into new header file Reported-by: Michael Ellerman <michael@ellerman.id.au> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-08cputime: Restore CPU_ACCOUNTING config defaults for PPC64Stephen Rothwell
Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime accounting") inadvertantly changed the default CPU_ACCOUNTING config for PPC64. Repair that. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07sched/rt: Move rt specific bits into new header fileClark Williams
Move rt scheduler definitions out of include/linux/sched.h into new file include/linux/sched/rt.h Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07sched/rt: Add a tuning knob to allow changing SCHED_RR timesliceClark Williams
Add a /proc/sys/kernel scheduler knob named sched_rr_timeslice_ms that allows global changing of the SCHED_RR timeslice value. User visable value is in milliseconds but is stored as jiffies. Setting to 0 (zero) resets to the default (currently 100ms). Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094704.13751796@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07sched: Move sched.h sysctl bits into separate headerClark Williams
Move the sysctl-related bits from include/linux/sched.h into a new file: include/linux/sched/sysctl.h. Then update source files requiring access to those bits by including the new header file. Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05Merge tag 'full-dynticks-cputime-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker: "This implements the cputime accounting on full dynticks CPUs. Typical cputime stats infrastructure relies on the timer tick and its periodic polling on the CPU to account the amount of time spent by the CPUs and the tasks per high level domains such as userspace, kernelspace, guest, ... Now we are preparing to implement full dynticks capability on Linux for Real Time and HPC users who want full CPU isolation. This feature requires a cputime accounting that doesn't depend on the timer tick. To implement it, this new cputime infrastructure plugs into kernel/user/guest boundaries to take snapshots of cputime and flush these to the stats when needed. This performs pretty much like CONFIG_VIRT_CPU_ACCOUNTING except that context location and cputime snaphots are synchronized between write and read side such that the latter can safely retrieve the pending tickless cputime of a task and add it to its latest cputime snapshot to return the correct result to the user." Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05sched: Fix signedness bug in yield_to()Dan Carpenter
In 7b270f6099 "sched: Bail out of yield_to when source and target runqueue has one task" we changed this to store -ESRCH so it needs to be signed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kbuild@01.org Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/20130205113751.GA20521@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04sched: Fix select_idle_sibling() bouncing cow syndromeMike Galbraith
If the previous CPU is cache affine and idle, select it. The current implementation simply traverses the sd_llc domain, taking the first idle CPU encountered, which walks buddy pairs hand in hand over the package, inflicting excruciating pain. 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs Signed-off-by: Mike Galbraith <bitbucket@online.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1359371965.5783.127.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-03sched/rt: Further simplify pick_rt_task()Kirill Tkhai
Function next_prio() has been removed and pull_rt_task() is the only user of pick_next_highest_task_rt() at the moment. pull_rt_task is not interested in p->nr_cpus_allowed, its only interest is the fact that cpu is allowed to execute p. If nr_cpus_allowed == 1, cpu != task_cpu(p) and cpu is allowed then it means that task p is in the middle of the migration techniques; the task waits until it is moved by migration thread. So, lets pull it earlier. Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> CC: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/70871359644177@web16d.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-31sched/rt: Do not account zero delta_exec in update_curr_rt()Kirill Tkhai
There are several places of consecutive calls of dequeue_task_rt() and put_prev_task_rt() in the scheduler. For example, function rt_mutex_setprio() does it. The both calls lead to update_curr_rt(), the second of it receives zeroed delta_exec. The only effective action in this case is call of sched_rt_avg_update(), which can change rq->age_stamp and rq->rt_avg. But it is possible in case of ""floating"" rq->clock. This fact is not reasonable to be accounted. Another actions do nothing. Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> CC: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/931541359550236@web1g.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-27cputime: Safely read cputime of full dynticks CPUsFrederic Weisbecker
While remotely reading the cputime of a task running in a full dynticks CPU, the values stored in utime/stime fields of struct task_struct may be stale. Its values may be those of the last kernel <-> user transition time snapshot and we need to add the tickless time spent since this snapshot. To fix this, flush the cputime of the dynticks CPUs on kernel <-> user transition and record the time / context where we did this. Then on top of this snapshot and the current time, perform the fixup on the reader side from task_times() accessors. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> [fixed kvm module related build errors] Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
2013-01-27kvm: Prepare to add generic guest entry/exit callbacksFrederic Weisbecker
Do some ground preparatory work before adding guest_enter() and guest_exit() context tracking callbacks. Those will be later used to read the guest cputime safely when we run in full dynticks mode. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27cputime: Use accessors to read task cputime statsFrederic Weisbecker
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27cputime: Allow dynamic switch between tick/virtual based cputime accountingFrederic Weisbecker
Allow to dynamically switch between tick and virtual based cputime accounting. This way we can provide a kind of "on-demand" virtual based cputime accounting. In this mode, the kernel relies on the context tracking subsystem to dynamically probe on kernel boundaries. This is in preparation for being able to stop the timer tick in more places than just the idle state. Doing so will depend on CONFIG_VIRT_CPU_ACCOUNTING_GEN which makes it possible to account the cputime without the tick by hooking on kernel/user boundaries. Depending whether the tick is stopped or not, we can switch between tick and vtime based accounting anytime in order to minimize the overhead associated to user hooks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27cputime: Generic on-demand virtual cputime accountingFrederic Weisbecker
If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27cputime: Move default nsecs_to_cputime() to jiffies based cputime fileFrederic Weisbecker
If the architecture doesn't provide an implementation of nsecs_to_cputime(), the cputime accounting core uses a default one that converts the nanoseconds to jiffies. However this only makes sense if we use the jiffies based cputime. For now it doesn't matter much because this API is only called on code that uses jiffies based cputime accounting. But the code may evolve and this API may be used more broadly in the future. Keeping this default implementation around is very error prone as it may introduce a bug and hide it on architectures that don't override this API. Fix this by moving this definition to the jiffies based cputime headers as it is the only place where it belongs to. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27cputime: Librarize per nsecs resolution cputime definitionsFrederic Weisbecker
The full dynticks cputime accounting that we'll soon introduce will rely on sched_clock(). And its clock can have a per nanosecond granularity. To prepare for this, we need to have a cputime_t implementation that has this precision. ia64 virtual cputime accounting already uses that granularity so all we need is to librarize its implementation in the asm generic headers. Also librarize the default per jiffy granularity cputime_t as well so that we can easily pick either implementation depending on the cputime accounting config we choose. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-01-27cputime: Avoid multiplication overflow on utime scalingFrederic Weisbecker
We scale stime, utime values based on rtime (sum_exec_runtime converted to jiffies). During scaling we multiple rtime * utime, which seems to be fine, since both values are converted to u64, but it's not. Let assume HZ is 1000 - 1ms tick. Process consist of 64 threads, run for 1 day, threads utilize 100% cpu on user space. Machine has 64 cpus. Process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies, which is 0x149970000. Multiplication rtime * utime result is 0x1a855771100000000, which can not be covered in 64 bits. Result of overflow is stall of utime values visible in user space (prev_utime in kernel), even if application still consume lot of CPU time. A solution to solve this is to perform the multiplication on stime instead of utime. It's easy to grow the utime value fast with a CPU bound thread in userspace for example. Now we assume that doing so with stime is much harder. In most cases a task shouldn't ever spend much time in kernel space as it tends to sleep waiting for jobs completion when they take long to achieve. IO is the typical example of that. Hence scaling the cputime by performing the multiplication on stime instead of utime should considerably reduce the chances of an overflow on most workloads. This is largely inspired by a patch from Stanislaw Gruszka: http://lkml.kernel.org/r/20130107113144.GA7544@redhat.com Inspired-by: Stanislaw Gruszka <sgruszka@redhat.com> Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1359217182-25184-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-26context_tracking: Export context state for generic vtimeFrederic Weisbecker
Export the context state: whether we run in user / kernel from the context tracking subsystem point of view. This is going to be used by the generic virtual cputime accounting subsystem that is needed to implement the full dynticks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-25sched/rt: Avoid updating RT entry timeout twice within one tick periodYing Xue
The issue below was found in 2.6.34-rt rather than mainline rt kernel, but the issue still exists upstream as well. So please let me describe how it was noticed on 2.6.34-rt: On this version, each softirq has its own thread, it means there is at least one RT FIFO task per cpu. The priority of these tasks is set to 49 by default. If user launches an RT FIFO task with priority lower than 49 of softirq RT tasks, it's possible there are two RT FIFO tasks enqueued one cpu runqueue at one moment. By current strategy of balancing RT tasks, when it comes to RT tasks, we really need to put them off to a CPU that they can run on as soon as possible. Even if it means a bit of cache line flushing, we want RT tasks to be run with the least latency. When the user RT FIFO task which just launched before is running, the sched timer tick of the current cpu happens. In this tick period, the timeout value of the user RT task will be updated once. Subsequently, we try to wake up one softirq RT task on its local cpu. As the priority of current user RT task is lower than the softirq RT task, the current task will be preempted by the higher priority softirq RT task. Before preemption, we check to see if current can readily move to a different cpu. If so, we will reschedule to allow the RT push logic to try to move current somewhere else. Whenever the woken softirq RT task runs, it first tries to migrate the user FIFO RT task over to a cpu that is running a task of lesser priority. If migration is done, it will send a reschedule request to the found cpu by IPI interrupt. Once the target cpu responds the IPI interrupt, it will pick the migrated user RT task to preempt its current task. When the user RT task is running on the new cpu, the sched timer tick of the cpu fires. So it will tick the user RT task again. This also means the RT task timeout value will be updated again. As the migration may be done in one tick period, it means the user RT task timeout value will be updated twice within one tick. If we set a limit on the amount of cpu time for the user RT task by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted upon reaching the soft limit. But exactly when the SIGXCPU signal should be sent depends on the RT task timeout value. In fact the timeout mechanism of sending the SIGXCPU signal assumes the RT task timeout is increased once every tick. However, currently the timeout value may be added twice per tick. So it results in the SIGXCPU signal being sent earlier than expected. To solve this issue, we prevent the timeout value from increasing twice within one tick time by remembering the jiffies value of last updating the timeout. As long as the RT task's jiffies is different with the global jiffies value, we allow its timeout to be updated. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Fan Du <fan.du@windriver.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: <peterz@infradead.org> Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24sched/fair: Set se->vruntime directly in place_entity()Viresh Kumar
We are first storing the new vruntime in a variable and then storing it in se->vruntime. Simply update se->vruntime directly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-dev@lists.linaro.org Cc: patches@linaro.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/ae59db1945518d6f6250920d46eb1f1a9cc0024e.1352361704.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24sched/rt: Add reschedule check to switched_from_rt()Kirill Tkhai
Reschedule rq->curr if the first RT task has just been pulled to the rq. Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tkhai Kirill <tkhai@yandex.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/118761353614535@web28f.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24sched: Fix the broken sched_rr_get_interval()Zhu Yanhai
The caller of sched_sliced() should pass se.cfs_rq and se as the arguments, however in sched_rr_get_interval() we gave it rq.cfs_rq and se, which made the following computation obviously wrong. The change was introduced by commit: 77034937dc45 sched: fix crash in sys_sched_rr_get_interval() ... 5 years ago, while it had been the correct 'cfs_rq_of' before the commit. The change seems to be irrelevant to the commit msg, which was to return a 0 timeslice for tasks that are on an idle runqueue. So I believe that was just a plain typo. Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1357621012-15039-1-git-send-email-gaoyang.zyh@taobao.com [ Since this is an ABI and an old bug, we'll test this via a slow upstream route, to hopefully discover any app breakage. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24Merge tag 'usb-3.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull more USB fixes from Greg Kroah-Hartman: "Here are some more USB fixes for the 3.8-rc4 tree. Some gadget driver fixes, and finally resolved the ehci-mxc driver build issues (it's just some code moving around and being deleted)." * tag 'usb-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: EHCI: fix build error in ehci-mxc USB: EHCI: add a name for the platform-private field USB: EHCI: fix incorrect configuration test USB: EHCI: Move definition of EHCI_STATS to ehci.h USB: UHCI: fix IRQ race during initialization usb: gadget: FunctionFS: Fix missing braces in parse_opts usb: dwc3: gadget: fix ep->maxburst for ep0 ARM: i.MX clock: Change the connection-id for fsl-usb2-udc usb: gadget: fsl_mxc_udc: replace MX35_IO_ADDRESS to ioremap usb: gadget: fsl-mxc-udc: replace cpu_is_xxx() with platform_device_id usb: musb: cppi_dma: drop '__init' annotation
2013-01-24Merge tag 'char-misc-3.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull drivers/misc fix from Greg Kroah-Hartman: "Here is a single revert for the ti-st misc driver, fixing problem that was introduced in 3.7-rc1 that has been bothering people." * tag 'char-misc-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Revert "drivers/misc/ti-st: remove gpio handling"
2013-01-24Merge tag 'tty-3.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull a TTY maintainer patch from Greg Kroah-Hartman: "Just a MAINTAINERS update, now that Alan has left for a bit, I'll continue to watch over the serial drivers." * tag 'tty-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: MAINTAINERS: Someone needs to watch over the serial drivers
2013-01-24Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - gspca: add needed delay for I2C traffic for sonixb/sonixj cameras - gspca: add one missing Kinect USB ID - usbvideo: some regression fixes - omap3isp: fix some build issues - videobuf2: fix video output handling - exynos s5p/m5mols: a few regression fixes. * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] uvcvideo: Set error_idx properly for S_EXT_CTRLS failures [media] uvcvideo: Cleanup leftovers of partial revert [media] uvcvideo: Return -EACCES when trying to set a read-only control [media] omap3isp: Don't include <plat/cpu.h> [media] s5p-mfc: Fix interrupt error handling routine [media] s5p-fimc: Fix return value of __fimc_md_create_flite_source_links() [media] m5mols: Fix typo in get_fmt callback [media] v4l: vb2: Set data_offset to 0 for single-plane output buffers [media] [FOR,v3.8] omap3isp: Don't include deleted OMAP plat/ header files [media] gspca_sonixj: Add a small delay after i2c_w1 [media] gspca_sonixb: Properly wait between i2c writes [media] gspca_kinect: add Kinect for Windows USB id
2013-01-23MAINTAINERS: Someone needs to watch over the serial driversGreg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k fixes from Geert Uytterhoeven: "The asm-generic changeset has been ack'ed by Arnd." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Wire up finit_module asm-generic/dma-mapping-broken.h: Provide dma_alloc_attrs()/dma_free_attrs() m68k: Provide dma_alloc_attrs()/dma_free_attrs()
2013-01-23Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull arm64 fixes from Catalin Marinas: - ELF coredump fix (more registers dumped than what user space expects) - SUBARCH name generation (s/aarch64/arm64/) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: arm64: makefile: fix uname munging when setting ARCH on native machine arm64: elf: fix core dumping to match what glibc expects
2013-01-23USB: EHCI: fix build error in ehci-mxcAlan Stern
This patch (as1643b) fixes a build error in ehci-hcd when compiling for ARM with allmodconfig: drivers/usb/host/ehci-hcd.c:1285:0: warning: "PLATFORM_DRIVER" redefined [enabled by default] drivers/usb/host/ehci-hcd.c:1255:0: note: this is the location of the previous definition drivers/usb/host/ehci-mxc.c:280:31: warning: 'ehci_mxc_driver' defined but not used [-Wunused-variable] drivers/usb/host/ehci-hcd.c:1285:0: warning: "PLATFORM_DRIVER" redefined [enabled by default] drivers/usb/host/ehci-hcd.c:1255:0: note: this is the location of the previous definition The fix is to convert ehci-mxc over to the new "ehci-hcd is a library" scheme so that it can coexist peacefully with the ehci-platform driver. As part of the conversion the ehci_mxc_priv data structure, which was allocated dynamically, is now placed where it belongs: in the private area at the end of struct ehci_hcd. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-23Merge tag 'sound-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Only a few small HD-audio fixes: - Addition of new Conexant codec IDs - Two one-liners to add fixups for Realtek codecs - A last-minute regression fix for auto-mute with power-saving mode (regressed since 3.8-rc1)" * tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix inconsistent pin states after resume ALSA: hda - Add Conexant CX20755/20756/20757 codec IDs ALSA: hda - Add fixup for Acer AO725 laptop ALSA: hda - Fix mute led for another HP machine
2013-01-23MAINTAINERS: remove meAlan Cox
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-23ALSA: hda - Fix inconsistent pin states after resumeTakashi Iwai
The commit [26a6cb6c: ALSA: hda - Implement a poll loop for jacks as a module parameter] introduced the polling jack detection code, but it also moved the call of snd_hda_jack_set_dirty_all() in the resume path after resume/init ops call. This caused a regression when the jack state has been changed during power-down (e.g. in the power save mode). Since the driver doesn't probe the new jack state but keeps using the cached value due to no dirty flag, the pin state remains also as if the jack is still plugged. The fix is simply moving snd_hda_jack_set_dirty_all() to the original position. Reported-by: Manolo Díaz <diaz.manolo@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-01-23Revert "drivers/misc/ti-st: remove gpio handling"Luciano Coelho
This reverts commit eccf2979b2c034b516e01b8a104c3739f7ef07d1. The reason is that it broke TI WiLink shared transport on Panda. Also, callback functions should not be added to board files anymore, so revert to implementing the power functions in the driver itself. Additionally, changed a variable name ('status' to 'err') so that this revert compiles properly. Cc: stable <stable@vger.kernel.org> [3.7] Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-23Merge tag '3.8-pci-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The most important is a fix for a pciehp deadlock that occurs when unplugging a Thunderbolt adapter. We also applied the same fix to shpchp, removed CONFIG_EXPERIMENTAL dependencies, fixed a pcie_aspm=force problem, and fixed a refcount leak. Details: - Hotplug PCI: pciehp: Use per-slot workqueues to avoid deadlock PCI: shpchp: Make shpchp_wq non-ordered PCI: shpchp: Handle push button event asynchronously PCI: shpchp: Use per-slot workqueues to avoid deadlock - Power management PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported - Misc PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put() PCI: remove depends on CONFIG_EXPERIMENTAL" * tag '3.8-pci-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: remove depends on CONFIG_EXPERIMENTAL PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported PCI: shpchp: Use per-slot workqueues to avoid deadlock PCI: shpchp: Handle push button event asynchronously PCI: shpchp: Make shpchp_wq non-ordered PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put() PCI: pciehp: Use per-slot workqueues to avoid deadlock
2013-01-23async: fix __lowest_in_progress()Tejun Heo
Commit 083b804c4d3e ("async: use workqueue for worker pool") made it possible that async jobs are moved from pending to running out-of-order. While pending async jobs will be queued and dispatched for execution in the same order, nothing guarantees they'll enter "1) move self to the running queue" of async_run_entry_fn() in the same order. Before the conversion, async implemented its own worker pool. An async worker, upon being woken up, fetches the first item from the pending list, which kept the executing lists sorted. The conversion to workqueue was done by adding work_struct to each async_entry and async just schedules the work item. The queueing and dispatching of such work items are still in order but now each worker thread is associated with a specific async_entry and moves that specific async_entry to the executing list. So, depending on which worker reaches that point earlier, which is non-deterministic, we may end up moving an async_entry with larger cookie before one with smaller one. This broke __lowest_in_progress(). running->domain may not be properly sorted and is not guaranteed to contain lower cookies than pending list when not empty. Fix it by ensuring sort-inserting to the running list and always looking at both pending and running when trying to determine the lowest cookie. Over time, the async synchronization implementation became quite messy. We better restructure it such that each async_entry is linked to two lists - one global and one per domain - and not move it when execution starts. There's no reason to distinguish pending and running. They behave the same for synchronization purposes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22Merge tag 'perf-urgent-for-mingo' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf/urgent fixes from Arnaldo Carvalho de Melo: . revert 20b279 - require exclude_guest to use PEBS - kernel side, now older binaries will continue working for things like cycles:pp without needing to pass extra modifiers, from David Ahern. . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from Sebastian Andrzej Siewior [ Pulling directly, Ingo would normally pull but has been unresponsive ] * tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Fix building from 'make perf-*-src-pkg' tarballs perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "Improve the stability of the linux kernel on the parisc architecture" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: sigaltstack doesn't round ss.ss_sp as required parisc: improve ptrace support for gdb single-step parisc: don't claim cpu irqs more than once parisc: avoid undefined shift in cnv_float.h
2013-01-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This contain a bugfix for CUSE and miscellaneous small fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove unused variable in fuse_try_move_page() fuse: make fuse_file_fallocate() static fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig cuse: fix uninitialized variable warnings cuse: do not register multiple devices with identical names cuse: use mutex as registration lock instead of spinlocks
2013-01-22Merge tag 'fixes-for-v3.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Here are some GPIO fixes I stacked up in my GPIO tree: - Remove a bad #include from the Samsung driver - Some Kconfig hazzle for the Samsungs - Skip gpiolib registration on EXYNOS5440 - Don't free the MVEBU label" * tag 'fixes-for-v3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: mvebu: Don't free chip label memory gpio: samsung: skip gpio lib registration for EXYNOS5440 gpio: samsung: silent build warning for EXYNOS5 SoCs gpio: samsung: fix pinctrl condition for exynos and exynos5440 gpio: samsung: remove inclusion <mach/regs-clock.h>
2013-01-22Merge tag 'f2fs-for-3.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: o Support swap file and link generic_file_remap_pages o Enhance the bio streaming flow and free section control o Major bug fix on recovery routine o Minor bug/warning fixes and code cleanups * tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (22 commits) f2fs: use _safe() version of list_for_each f2fs: add comments of start_bidx_of_node f2fs: avoid issuing small bios due to several dirty node pages f2fs: support swapfile f2fs: add remap_pages as generic_file_remap_pages f2fs: add __init to functions in init_f2fs_fs f2fs: fix the debugfs entry creation path f2fs: add global mutex_lock to protect f2fs_stat_list f2fs: remove the blk_plug usage in f2fs_write_data_pages f2fs: avoid redundant time update for parent directory in f2fs_delete_entry f2fs: remove redundant call to set_blocksize in f2fs_fill_super f2fs: move f2fs_balance_fs to punch_hole f2fs: add f2fs_balance_fs in several interfaces f2fs: revisit the f2fs_gc flow f2fs: check return value during recovery f2fs: avoid null dereference in f2fs_acl_from_disk f2fs: initialize newly allocated dnode structure f2fs: update f2fs partition info about SIT/NAT layout f2fs: update f2fs document to reflect SIT/NAT layout correctly f2fs: remove unneeded INIT_LIST_HEAD at few places ...
2013-01-22Merge tag 'vfio-for-v3.8-rc5' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull vfio fix from Alex Williamson. "vfio-pci: Fix buffer overfill" * tag 'vfio-for-v3.8-rc5' of git://github.com/awilliam/linux-vfio: vfio-pci: Fix buffer overfill
2013-01-22Merge tag 'trace-3.8-rc4-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Kprobes now uses the function tracer if it can. That is, if a probe is placed on a function mcount/nop location, and the arch supports it, instead of adding a breakpoint, kprobes will register a function callback as that is much more efficient. The function tracer requires to update modules before they run, and uses the module notifier to do so. But if something else in the module notifiers registers a kprobe at one of these locations, before ftrace can get to it, then the system could fail. The function tracer must be initialized early, otherwise module notifiers that probe will only work by chance." * tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Be first to run code modification on modules
2013-01-22Merge tag 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev Pull libata fixes from Jeff Garzik: 1) ahci: Fix typo that caused erronenous error handling. Thought: I wonder if sparse could have caught this, somehow. 2) ahci: support a slightly odd Enmotus variant 3) core: fix a drive detection problem by correcting the logic by which the DevSlp timing variables are obtained and used. * tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] replace sata_settings with devslp_timing [libata] ahci: Add support for Enmotus Bobcat device. [libata] ahci: Fix lack of command retry after a success error handler.
2013-01-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem bugfixes from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/device_cgroup: lock assert fails in dev_exception_clean() evm: checking if removexattr is not a NULL
2013-01-22wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED taskOleg Nesterov
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task. Change it to use TASK_NORMAL and add the WARN_ON(). TASK_ALL has no other users, probably can be killed. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILLOleg Nesterov
putreg() assumes that the tracee is not running and pt_regs_access() can safely play with its stack. However a killed tracee can return from ptrace_stop() to the low-level asm code and do RESTORE_REST, this means that debugger can actually read/modify the kernel stack until the tracee does SAVE_REST again. set_task_blockstep() can race with SIGKILL too and in some sense this race is even worse, the very fact the tracee can be woken up breaks the logic. As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace() call, this ensures that nobody can ever wakeup the tracee while the debugger looks at it. Not only this fixes the mentioned problems, we can do some cleanups/simplifications in arch_ptrace() paths. Probably ptrace_unfreeze_traced() needs more callers, for example it makes sense to make the tracee killable for oom-killer before access_process_vm(). While at it, add the comment into may_ptrace_stop() to explain why ptrace_stop() still can't rely on SIGKILL and signal_pending_state(). Reported-by: Salman Qazi <sqazi@google.com> Reported-by: Suleiman Souhlal <suleiman@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>