summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2015-02-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a *lot* of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...
2015-02-10Merge tag 'pm+acpi-3.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "We have a few new features this time, including a new SFI-based cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new devfreq class for providing its governors with raw utilization data and a new ACPI driver for AMD SoCs. Still, the majority of changes here are reworks of existing code to make it more straightforward or to prepare it for implementing new features on top of it. The primary example is the rework of ACPI resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with support for IOAPIC hotplug implemented on top of it, but there is quite a number of changes of this kind in the cpufreq core, ACPICA, ACPI EC driver, ACPI processor driver and the generic power domains core code too. The most active developer is Viresh Kumar with his cpufreq changes. Specifics: - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan)" * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits) tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code intel_pstate: provide option to only use intel_pstate with HWP ACPI / EC: Add GPE reference counting debugging messages ACPI / EC: Add query flushing support ACPI / EC: Refine command storm prevention support ACPI / EC: Add command flushing support. ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag ACPI: add AMD ACPI2Platform device support for x86 system ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() ACPI / EC: Update revision due to raw handler mode. ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ...
2015-02-10mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?Andrey Ryabinin
Commit ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and hugetlb_infinity") replaced 'unsigned long hugetlb_zero' with 'int zero' leading to out-of-bounds access in proc_doulongvec_minmax(). Use '.extra1 = NULL' instead of '.extra1 = &zero'. Passing NULL is equivalent to passing minimal value, which is 0 for unsigned types. Fixes: ed4d4902ebdd ("mm, hugetlb: remove hugetlb_zero and hugetlb_infinity") Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10rmap: drop support of non-linear mappingsKirill A. Shutemov
We don't create non-linear mappings anymore. Let's drop code which handles them in rmap. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10Merge branches 'pm-sleep' and 'pm-runtime'Rafael J. Wysocki
* pm-sleep: PM / hibernate: exclude freed pages from allocated pages printout PM / sleep: export suspend_resume trace event PM / sleep: Mention async suspend in PM_TRACE documentation PM / hibernate: Remove unused function * pm-runtime: ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code
2015-02-10Merge branches 'pm-qos', 'pm-opp' and 'pm-devfreq'Rafael J. Wysocki
* pm-qos: PM / QoS: Use lockdep asserts to find missing hold of power.lock PM / QoS: Add debugfs support to view the list of constraints * pm-opp: PM / OPP: Assert RCU lock in exported functions PM / OPP: Update kernel documentation PM / OPP: Ensure consistent naming of static functions PM / OPP: export dev_pm_opp_get_notifier * pm-devfreq: PM / devfreq: event: Add documentation for exynos-ppmu devfreq-event driver devfreq: Fix build break of devfreq-event class PM / devfreq: event: Add devfreq_event class PM / devfreq: tegra: add devfreq driver for Tegra Activity Monitor
2015-02-10Merge branch 'acpi-resources'Rafael J. Wysocki
* acpi-resources: (23 commits) Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug x86/PCI: Refine the way to release PCI IRQ resources x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation x86/PCI: Fix the range check for IO resources PCI: Use common resource list management code instead of private implementation resources: Move struct resource_list_entry from ACPI into resource core ACPI: Introduce helper function acpi_dev_filter_resource_type() ACPI: Add field offset to struct resource_list_entry ACPI: Translate resource into master side address for bridge window resources ACPI: Return translation offset when parsing ACPI address space resources ACPI: Enforce stricter checks for address space descriptors ACPI: Set flag IORESOURCE_UNSET for unassigned resources ACPI: Normalize return value of resource parser functions ACPI: Fix a bug in parsing ACPI Memory24 resource ACPI: Add prefetch decoding to the address space parser ACPI: Move the window flag logic to the combined parser ACPI: Unify the parsing of address_space and ext_address_space ACPI: Let the parser return false for disabled resources ...
2015-02-10Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - rework hrtimer expiry calculation in hrtimer_interrupt(): the previous code had a subtle bug where expiry caching would miss an expiry, resulting in occasional bogus (late) expiry of hrtimers. - continuing Y2038 fixes - ktime division optimization - misc smaller fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Make __hrtimer_get_next_event() static rtc: Convert rtc_set_ntp_time() to use timespec64 rtc: Remove redundant rtc_valid_tm() from rtc_hctosys() rtc: Modify rtc_hctosys() to address y2038 issues rtc: Update rtc-dev to use y2038-safe time interfaces rtc: Update interface.c to use y2038-safe time interfaces time: Expose get_monotonic_boottime64 for in-kernel use time: Expose getboottime64 for in-kernel uses ktime: Optimize ktime_divns for constant divisors hrtimer: Prevent stale expiry time in hrtimer_interrupt() ktime.h: Introduce ktime_ms_delta
2015-02-10Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - various sched/deadline fixes and enhancements - rescheduling latency fixes/cleanups - rework the rq->clock code to be more consistent and more robust. - minor micro-optimizations - ->avg.decay_count fixes - add a stack overflow check to might_sleep() - idle-poll handler fix, possibly resulting in power savings - misc smaller updates and fixes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/Documentation: Remove unneeded word sched/wait: Introduce wait_on_bit_timeout() sched: Pull resched loop to __schedule() callers sched/deadline: Remove cpu_active_mask from cpudl_find() sched: Fix hrtick_start() on UP sched/deadline: Avoid pointless __setscheduler() sched/deadline: Fix stale yield state sched/deadline: Fix hrtick for a non-leftmost task sched/deadline: Modify cpudl::free_cpus to reflect rd->online sched/idle: Add missing checks to the exit condition of cpu_idle_poll() sched: Fix missing preemption opportunity sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target sched/debug: Print rq->clock_task sched/core: Rework rq->clock update skips sched/core: Validate rq_clock*() serialization sched/core: Remove check of p->sched_class sched/fair: Fix sched_entity::avg::decay_count initialization sched/debug: Fix potential call to __ffs(0) in sched_show_task() sched/debug: Check for stack overflow in ___might_sleep() sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()
2015-02-09Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples in at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristicly counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ...
2015-02-09Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes are: - mutex, completions and rtmutex micro-optimizations - lock debugging fix - various cleanups in the MCS and the futex code" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Optimize setting task running after being blocked locking/rwsem: Use task->state helpers sched/completion: Add lock-free checking of the blocking case sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state locking/mutex: Explicitly mark task as running after wakeup futex: Fix argument handling in futex_lock_pi() calls doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants locking/Documentation: Update code path softirq/preempt: Add missing current->preempt_disable_ip update locking/osq: No need for load/acquire when acquire-polling locking/mcs: Better differentiate between MCS variants locking/mutex: Introduce ww_mutex_set_context_slowpath() locking/mutex: Move MCS related comments to proper location locking/mutex: Checking the stamp is WW only
2015-02-09Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...
2015-02-06Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and x86 fix from Ingo Molnar: "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix incorrect tai offset calculation for non high-res timer systems * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Return error from driver init code when loader is disabled
2015-02-06Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix deadline parameter modification handling sched/wait: Remove might_sleep() from wait_event_cmd() sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask sched/fair: Avoid using uninitialized variable in preferred_group_nid()
2015-02-06Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core kernel fixes from Ingo Molnar: "Two liblockdep fixes and a CPU hotplug race fix" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/liblockdep: don't include host headers tools/liblockdep: ignore generated .so file smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
2015-02-05resources: Move struct resource_list_entry from ACPI into resource coreJiang Liu
Currently ACPI, PCI and pnp all implement the same resource list management with different data structure. We need to transfer from one data structure into another when passing resources from one subsystem into another subsystem. So move struct resource_list_entry from ACPI into resource core and rename it as resource_entry, then it could be reused by different subystems and avoid the data structure conversion. Introduce dedicated header file resource_ext.h instead of embedding it into ioport.h to avoid header file inclusion order issues. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-05hrtimer: Fix incorrect tai offset calculation for non high-res timer systemsJohn Stultz
I noticed some CLOCK_TAI timer test failures on one of my less-frequently used configurations. And after digging in I found in 76f4108892d9 (Cleanup hrtimer accessors to the timekepeing state), the hrtimer_get_softirq_time tai offset calucation was incorrectly rewritten, as the tai offset we return shold be from CLOCK_MONOTONIC, and not CLOCK_REALTIME. This results in CLOCK_TAI timers expiring early on non-highres capable machines. This patch fixes the issue, calculating the tai time properly from the monotonic base. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> # 3.17+ Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Decouple unthrottling and rotatingMark Rutland
Currently the adjusments made as part of perf_event_task_tick() use the percpu rotation lists to iterate over any active PMU contexts, but these are not used by the context rotation code, having been replaced by separate (per-context) hrtimer callbacks. However, some manipulation of the rotation lists (i.e. removal of contexts) has remained in perf_rotate_context(). This leads to the following issues: * Contexts are not always removed from the rotation lists. Removal of PMUs which have been placed in rotation lists, but have not been removed by a hrtimer callback can result in corruption of the rotation lists (when memory backing the context is freed). This has been observed to result in hangs when PMU drivers built as modules are inserted and removed around the creation of events for said PMUs. * Contexts which do not require rotation may be removed from the rotation lists as a result of a hrtimer, and will not be considered by the unthrottling code in perf_event_task_tick. This patch fixes the issue by updating the rotation ist when events are scheduled in/out, ensuring that each rotation list stays in sync with the HW state. As each event holds a refcount on the module of its PMU, this ensures that when a PMU module is unloaded none of its CPU contexts can be in a rotation list. By maintaining a list of perf_event_contexts rather than perf_event_cpu_contexts, we don't need separate paths to handle the cpu and task contexts, which also makes the code a little simpler. As the rotation_list variables are not used for rotation, these are renamed to active_ctx_list, which better matches their current function. perf_pmu_rotate_{start,stop} are renamed to perf_pmu_ctx_{activate,deactivate}. Reported-by: Johannes Jensen <johannes.jensen@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Drop module reference on event init failureMark Rutland
When initialising an event, perf_init_event will call try_module_get() to ensure that the PMU's module cannot be removed for the lifetime of the event, with __free_event() dropping the reference when the event is finally destroyed. If something fails after the event has been initialised, but before the event is installed, perf_event_alloc will drop the reference on the module. However, if we fail to initialise an event for some reason (e.g. we ask an uncore PMU to perform sampling, and it refuses to initialise the event), we do not drop the refcount. If we try to open such a bogus event without a precise IDR type, we will loop over each PMU in the pmus list, incrementing each of their refcounts without decrementing them. This patch adds a module_put when pmu->event_init(event) fails, ensuring that the refcounts are balanced in failure cases. As the innards of the precise and search based initialisation look very similar, this logic is hoisted out into a new helper function. While the early return for the failed try_module_get is removed from the search case, this is handled by the remaining return when ret is not -ENOENT. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Use POLLIN instead of POLL_IN for perf poll data in flagJiri Olsa
Currently we flag available data (via poll syscall) on perf fd with POLL_IN macro, which is normally used for SIGIO interface. We've been lucky, because POLLIN (0x1) is subset of POLL_IN (0x20001) and sys_poll (do_pollfd function) cut the extra bit out (0x20000). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Fix put_event() ctx lockPeter Zijlstra
So what I suspect; but I'm in zombie mode today it seems; is that while I initially thought that it was impossible for ctx to change when refcount dropped to 0, I now suspect its possible. Note that until perf_remove_from_context() the event is still active and visible on the lists. So a concurrent sys_perf_event_open() from another task into this task can race. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: mark.rutland@arm.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Fix move_group() orderPeter Zijlstra (Intel)
Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over the weekend: event_sched_out.isra.79+0x2b9/0x2d0 group_sched_out+0x69/0xc0 ctx_sched_out+0x106/0x130 task_ctx_sched_out+0x37/0x70 __perf_install_in_context+0x70/0x1a0 remote_function+0x48/0x60 generic_exec_single+0x15b/0x1d0 smp_call_function_single+0x67/0xa0 task_function_call+0x53/0x80 perf_install_in_context+0x8b/0x110 I think the below should cure this; if we install a group leader it will iterate the (still intact) group list and find its siblings and try and install those too -- even though those still have the old event->ctx -- in the new ctx. Upon installing the first group sibling we'd try and schedule out the group and trigger the above warn. Fix this by installing the group leader last, installing siblings would have no effect, they're not reachable through the group lists and therefore we don't schedule them. Also delay resetting the state until we're absolutely sure the events are quiescent. Reported-by: Jiri Olsa <jolsa@redhat.com> Reported-by: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Fix event->ctx lockingPeter Zijlstra
There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04perf: Add a bit of paranoiaPeter Zijlstra
Add a few WARN()s to catch things that should never happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04locking/rtmutex: Optimize setting task running after being blockedDavidlohr Bueso
We explicitly mark the task running after returning from a __rt_mutex_slowlock() call, which does the actual sleeping via wait-wake-trylocking. As such, this patch does two things: (1) refactors the code so that setting current to TASK_RUNNING is done by __rt_mutex_slowlock(), and not by the callers. The downside to this is that it becomes a bit unclear when at what point we block. As such I've added a comment that the task blocks when calling __rt_mutex_slowlock() so readers can figure out when it is running again. (2) relaxes setting current's state through __set_current_state(), instead of it's more expensive barrier alternative. There was no need for the implied barrier as we're obviously not planning on blocking. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422857784.18096.1.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04locking/rwsem: Use task->state helpersDavidlohr Bueso
Call __set_task_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who last changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422257769-14083-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/completion: Add lock-free checking of the blocking caseNicholas Mc Guire
The "thread would block" case can be checked without grabbing ->wait.lock. [ If the check does not return early then grab the lock and recheck. A memory barrier is not needed as complete() and complete_all() imply a barrier. The ACCESS_ONCE() is needed for calls in a loop that, if inlined, could optimize out the re-fetching of x->done. ] Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422013307-13200-1-git-send-email-der.herr@hofr.at Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/completion: Remove unnecessary ->wait.lock serialization when reading ↵Nicholas Mc Guire
completion state Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421467534-22834-1-git-send-email-der.herr@hofr.at Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04locking/mutex: Explicitly mark task as running after wakeupDavidlohr Bueso
By the time we wake up and get the lock after being asleep in the slowpath, we better be running. As good practice, be explicit about this and avoid any mischief. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421717961.4903.11.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge tag 'v3.19-rc7' into locking/core, to refresh the branch before ↵Ingo Molnar
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/Documentation: Remove unneeded wordSharon Dvir
The second 'mutex' shouldn't be there, it can't be about the mutex, as the mutex can't be freed, but unlocked, the memory where the mutex resides however, can be freed. Signed-off-by: Sharon Dvir <sharon.dvir1@mail.huji.ac.il> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422827252-31363-1-git-send-email-sharon.dvir1@mail.huji.ac.il Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Pull resched loop to __schedule() callersFrederic Weisbecker
__schedule() disables preemption during its job and re-enables it afterward without doing a preemption check to avoid recursion. But if an event happens after the context switch which requires rescheduling, we need to check again if a task of a higher priority needs the CPU. A preempt irq can raise such a situation. To handle that, __schedule() loops on need_resched(). But preempt_schedule_*() functions, which call __schedule(), also loop on need_resched() to handle missed preempt irqs. Hence we end up with the same loop happening twice. Lets simplify that by attributing the need_resched() loop responsibility to all __schedule() callers. There is a risk that the outer loop now handles reschedules that used to be handled by the inner loop with the added overhead of caller details (inc/dec of PREEMPT_ACTIVE, irq save/restore) but assuming those inner rescheduling loop weren't too frequent, this shouldn't matter. Especially since the whole preemption path is now losing one loop in any case. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1422404652-29067-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Remove cpu_active_mask from cpudl_find()Xunlei Pang
cpu_active_mask is rarely changed (only on hotplug), so remove this operation to gain a little performance. If there is a change in cpu_active_mask, rq_online_dl() and rq_offline_dl() should take care of it normally, so cpudl::free_cpus carries enough information for us. For the rare case when a task is put onto a dying cpu (which rq_offline_dl() can't handle in a timely fashion), it will be handled through _cpu_down()->...->multi_cpu_stop()->migration_call() ->migrate_tasks(), preventing the task from hanging on the dead cpu. Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [peterz: changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421642980-10045-2-git-send-email-pang.xunlei@linaro.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Fix hrtick_start() on UPWanpeng Li
The commit 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") forgot to change the UP version of hrtick_start(), do so now. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") [ Fixed the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-7-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Avoid pointless __setscheduler()Wanpeng Li
There is no need to dequeue/enqueue and push/pull if there are no scheduling parameters changed for the DL class. Both fair and RT classes already check if parameters changed for them to avoid unnecessary overhead. This patch add the parameters changed test for the DL class in order to reduce overhead. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [ Fixed up the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix stale yield statePeter Zijlstra
When we fail to start the deadline timer in update_curr_dl(), we forget to clear ->dl_yielded, resulting in wrecked time keeping. Since the natural place to clear both ->dl_yielded and ->dl_throttled is in replenish_dl_entity(); both are after all waiting for that event; make it so. Luckily since 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") the task_on_rq_queued() condition in dl_task_timer() must be true, and can therefore call enqueue_task_dl() unconditionally. Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix hrtick for a non-leftmost taskWanpeng Li
After update_curr_dl() the current task might not be the leftmost task anymore. In that case do not start a new hrtick for it. In this case NEED_RESCHED will be set and the next schedule will start the hrtick for the new task if and when appropriate. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Acked-by: Juri Lelli <juri.lelli@arm.com> [ Rewrote the changelog and comment. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge branch 'sched/urgent' into sched/core, to merge fixes before applying ↵Ingo Molnar
new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix deadline parameter modification handlingPeter Zijlstra
Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") removed the hrtimer_try_cancel() function call out from init_dl_task_timer(), which gets called from __setparam_dl(). The result is that we can now re-init the timer while its active -- this is bad and corrupts timer state. Furthermore; changing the parameters of an active deadline task is tricky in that you want to maintain guarantees, while immediately effective change would allow one to circumvent the CBS guarantees -- this too is bad, as one (bad) task should not be able to affect the others. Rework things to avoid both problems. We only need to initialize the timer once, so move that to __sched_fork() for new tasks. Then make sure __setparam_dl() doesn't affect the current running state but only updates the parameters used to calculate the next scheduling period -- this guarantees the CBS functions as expected (albeit slightly pessimistic). This however means we need to make sure __dl_clear_params() needs to reset the active state otherwise new (and tasks flipping between classes) will not properly (re)compute their first instance. Todo: close class flipping CBS hole. Todo: implement delayed BW release. Reported-by: Luca Abeni <luca.abeni@unitn.it> Acked-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Luca Abeni <luca.abeni@unitn.it> Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-03PM / hibernate: exclude freed pages from allocated pages printoutWonhong Kwon
hibernate_preallocate_memory() prints out that how many pages are allocated, but it doesn't take into consideration the pages freed by free_unnecessary_pages(). Therefore, it always shows the count more than actually allocated. Signed-off-by: Wonhong Kwon <wonhong.kwon@lge.com> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-01sched: don't cause task state changes in nested sleep debuggingLinus Torvalds
Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message
2015-01-30sched/deadline: Modify cpudl::free_cpus to reflect rd->onlineXunlei Pang
Currently, cpudl::free_cpus contains all CPUs during init, see cpudl_init(). When calling cpudl_find(), we have to add rd->span to avoid selecting the cpu outside the current root domain, because cpus_allowed cannot be depended on when performing clustered scheduling using the cpuset, see find_later_rq(). This patch adds cpudl_set_freecpu() and cpudl_clear_freecpu() for changing cpudl::free_cpus when doing rq_online_dl()/rq_offline_dl(), so we can avoid the rd->span operation when calling cpudl_find() in find_later_rq(). Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421642980-10045-1-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/idle: Add missing checks to the exit condition of cpu_idle_poll()Preeti U Murthy
cpu_idle_poll() is entered into when either the cpu_idle_force_poll is set or tick_check_broadcast_expired() returns true. The exit condition from cpu_idle_poll() is tif_need_resched(). However this does not take into account scenarios where cpu_idle_force_poll changes or tick_check_broadcast_expired() returns false, without setting the resched flag. So a cpu will be caught in cpu_idle_poll() needlessly, thereby wasting power. Add an explicit check on cpu_idle_force_poll and tick_check_broadcast_expired() to the exit condition of cpu_idle_poll() to avoid this. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150121105655.15279.59626.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched: Fix missing preemption opportunityFrederic Weisbecker
If an interrupt fires in cond_resched(), between the call to __schedule() and the PREEMPT_ACTIVE count decrementation, and that interrupt sets TIF_NEED_RESCHED, the call to preempt_schedule_irq() will be ignored due to the PREEMPT_ACTIVE count. This kind of scenario, with irq preemption being delayed because it's interrupting a preempt-disabled area, is usually fixed up after preemption is re-enabled back with an explicit call to preempt_schedule(). This is what preempt_enable() does but a raw preempt count decrement as performed by __preempt_count_sub(PREEMPT_ACTIVE) doesn't handle delayed preemption check. Therefore when such a race happens, the rescheduling is going to be delayed until the next scheduler or preemption entrypoint. This can be a problem for scheduler latency sensitive workloads. Lets fix that by consolidating cond_resched() with preempt_schedule() internals. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ingo Molnar <mingo@kernel.org> Original-patch-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421946484-9298-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/rt: Reduce rq lock contention by eliminating locking of non-feasible ↵Tim Chen
target This patch adds checks that prevens futile attempts to move rt tasks to a CPU with active tasks of equal or higher priority. This reduces run queue lock contention and improves the performance of a well known OLTP benchmark by 0.7%. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Cc: Suruchi Kadu <suruchi.a.kadu@intel.com> Cc: Doug Nelson<doug.nelson@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30Merge branch 'sched/urgent' into sched/coreIngo Molnar
Merge all pending fixes and refresh the tree, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30PM / sleep: export suspend_resume trace eventTodd E Brandt
Export the suspend_resume tracepoint so it can be used in loadable modules. Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-28Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>