summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)Author
2013-05-05Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'full dynticks' support from Ingo Molnar: "This tree from Frederic Weisbecker adds a new, (exciting! :-) core kernel feature to the timer and scheduler subsystems: 'full dynticks', or CONFIG_NO_HZ_FULL=y. This feature extends the nohz variable-size timer tick feature from idle to busy CPUs (running at most one task) as well, potentially reducing the number of timer interrupts significantly. This feature got motivated by real-time folks and the -rt tree, but the general utility and motivation of full-dynticks runs wider than that: - HPC workloads get faster: CPUs running a single task should be able to utilize a maximum amount of CPU power. A periodic timer tick at HZ=1000 can cause a constant overhead of up to 1.0%. This feature removes that overhead - and speeds up the system by 0.5%-1.0% on typical distro configs even on modern systems. - Real-time workload latency reduction: CPUs running critical tasks should experience as little jitter as possible. The last remaining source of kernel-related jitter was the periodic timer tick. - A single task executing on a CPU is a pretty common situation, especially with an increasing number of cores/CPUs, so this feature helps desktop and mobile workloads as well. The cost of the feature is mainly related to increased timer reprogramming overhead when a CPU switches its tick period, and thus slightly longer to-idle and from-idle latency. Configuration-wise a third mode of operation is added to the existing two NOHZ kconfig modes: - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named as a config option. This is the traditional Linux periodic tick design: there's a HZ tick going on all the time, regardless of whether a CPU is idle or not. - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the periodic tick when a CPU enters idle mode. - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the tick when a CPU is idle, also slows the tick down to 1 Hz (one timer interrupt per second) when only a single task is running on a CPU. The .config behavior is compatible: existing !CONFIG_NO_HZ and CONFIG_NO_HZ=y settings get translated to the new values, without the user having to configure anything. CONFIG_NO_HZ_FULL is turned off by default. This feature is based on a lot of infrastructure work that has been steadily going upstream in the last 2-3 cycles: related RCU support and non-periodic cputime support in particular is upstream already. This tree adds the final pieces and activates the feature. The pull request is marked RFC because: - it's marked 64-bit only at the moment - the 32-bit support patch is small but did not get ready in time. - it has a number of fresh commits that came in after the merge window. The overwhelming majority of commits are from before the merge window, but still some aspects of the tree are fresh and so I marked it RFC. - it's a pretty wide-reaching feature with lots of effects - and while the components have been in testing for some time, the full combination is still not very widely used. That it's default-off should reduce its regression abilities and obviously there are no known regressions with CONFIG_NO_HZ_FULL=y enabled either. - the feature is not completely idempotent: there is no 100% equivalent replacement for a periodic scheduler/timer tick. In particular there's ongoing work to map out and reduce its effects on scheduler load-balancing and statistics. This should not impact correctness though, there are no known regressions related to this feature at this point. - it's a pretty ambitious feature that with time will likely be enabled by most Linux distros, and we'd like you to make input on its design/implementation, if you dislike some aspect we missed. Without flaming us to crisp! :-) Future plans: - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off the periodic tick altogether when there's a single busy task on a CPU. We'd first like 1 Hz to be exposed more widely before we go for the 0 Hz target though. - once we reach 0 Hz we can remove the periodic tick assumption from nr_running>=2 as well, by essentially interrupting busy tasks only as frequently as the sched_latency constraints require us to do - once every 4-40 msecs, depending on nr_running. I am personally leaning towards biting the bullet and doing this in v3.10, like the -rt tree this effort has been going on for too long - but the final word is up to you as usual. More technical details can be found in Documentation/timers/NO_HZ.txt" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) sched: Keep at least 1 tick per second for active dynticks tasks rcu: Fix full dynticks' dependency on wide RCU nocb mode nohz: Protect smp_processor_id() in tick_nohz_task_switch() nohz_full: Add documentation. cputime_nsecs: use math64.h for nsec resolution conversion helpers nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config nohz: Reduce overhead under high-freq idling patterns nohz: Remove full dynticks' superfluous dependency on RCU tree nohz: Fix unavailable tick_stop tracepoint in dynticks idle nohz: Add basic tracing nohz: Select wide RCU nocb for full dynticks nohz: Disable the tick when irq resume in full dynticks CPU nohz: Re-evaluate the tick for the new task after a context switch nohz: Prepare to stop the tick on irq exit nohz: Implement full dynticks kick nohz: Re-evaluate the tick from the scheduler IPI sched: New helper to prevent from stopping the tick in full dynticks sched: Kick full dynticks CPU that have more than one task enqueued. perf: New helper to prevent full dynticks CPUs from stopping tick perf: Kick full dynticks CPU if events rotation is needed ...
2013-05-03Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main drm pull request for 3.10. Wierd bits: - OMAP drm changes required OMAP dss changes, in drivers/video, so I took them in here. - one more fbcon fix for font handover - VT switch avoidance in pm code - scatterlist helpers for gpu drivers - have acks from akpm Highlights: - qxl kms driver - driver for the spice qxl virtual GPU Nouveau: - fermi/kepler VRAM compression - GK110/nvf0 modesetting support. Tegra: - host1x core merged with 2D engine support i915: - vt switchless resume - more valleyview support - vblank fixes - modesetting pipe config rework radeon: - UVD engine support - SI chip tiling support - GPU registers initialisation from golden values. exynos: - device tree changes - fimc block support Otherwise: - bunches of fixes all over the place." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits) qxl: update to new idr interfaces. drm/nouveau: fix build with nv50->nvc0 drm/radeon: fix handling of v6 power tables drm/radeon: clarify family checks in pm table parsing drm/radeon: consolidate UVD clock programming drm/radeon: fix UPLL_REF_DIV_MASK definition radeon: add bo tracking debugfs drm/radeon: add new richland pci ids drm/radeon: add some new SI PCI ids drm/radeon: fix scratch reg handling for UVD fence drm/radeon: allocate SA bo in the requested domain drm/radeon: fix possible segfault when parsing pm tables drm/radeon: fix endian bugs in atom_allocate_fb_scratch() OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found OMAPDSS: VENC: Add error handling for venc_probe_pdata OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata OMAPDSS: DSI: Add error handling for dsi_probe_pdata OMAPDSS: SDI: Add error handling for sdi_probe_pdata OMAPDSS: DPI: Add error handling for dpi_probe_pdata ...
2013-05-02Merge commit '8700c95adb03' into timers/nohzFrederic Weisbecker
The full dynticks tree needs the latest RCU and sched upstream updates in order to fix some dependencies. Merge a common upstream merge point that has these updates. Conflicts: include/linux/perf_event.h kernel/rcutree.h kernel/rcutree_plugin.h Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-05-01Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Mostly performance and bug fixes, plus some cleanups. The one new feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which allows installation of a hidden inode designed for boot loaders." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) ext4: fix type-widening bug in inode table readahead code ext4: add check for inodes_count overflow in new resize ioctl ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG ext4: fix online resizing for ext3-compat file systems jbd2: trace when lock_buffer in do_get_write_access takes a long time ext4: mark metadata blocks using bh flags buffer: add BH_Prio and BH_Meta flags ext4: mark all metadata I/O with REQ_META ext4: fix readdir error in case inline_data+^dir_index. ext4: fix readdir error in the case of inline_data+dir_index jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset ext4: mext_insert_extents should update extent block checksum ext4: move quota initialization out of inode allocation transaction ext4: reserve xattr index for Rich ACL support jbd2: reduce journal_head size ext4: clear buffer_uninit flag when submitting IO ext4: use io_end for multiple bios ext4: make ext4_bio_write_page() use BH_Async_Write flags ext4: Use kstrtoul() instead of parse_strtoul() ext4: defragmentation code cleanup ...
2013-04-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...
2013-04-30Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle are mostly related to preparatory work for the full-dynticks work: - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take advantage of numbered callbacks, do callback accelerations based on numbered callbacks. Posted to LKML at https://lkml.org/lkml/2013/3/18/960 - RCU documentation updates. Posted to LKML at https://lkml.org/lkml/2013/3/18/570 - Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/3/18/594" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rcu: Make rcu_accelerate_cbs() note need for future grace periods rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp() rcu: Rename n_nocb_gp_requests to need_future_gp rcu: Push lock release to rcu_start_gp()'s callers rcu: Repurpose no-CBs event tracing to future-GP events rcu: Rearrange locking in rcu_start_gp() rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks rcu: Accelerate RCU callbacks at grace-period end rcu: Export RCU_FAST_NO_HZ parameters to sysfs rcu: Distinguish "rcuo" kthreads by RCU flavor rcu: Add event tracing for no-CBs CPUs' grace periods rcu: Add event tracing for no-CBs CPUs' callback registration rcu: Introduce proper blocking to no-CBs kthreads GP waits rcu: Provide compile-time control for no-CBs CPUs rcu: Tone down debugging during boot-up and shutdown. rcu: Add softirq-stall indications to stall-warning messages rcu: Documentation update rcu: Make bugginess of code sample more evident rcu: Fix hlist_bl_set_first_rcu() annotation rcu: Delete unused rcu_node "wakemask" field ...
2013-04-30Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge second batch of fixes from Andrew Morton: - various misc bits - some printk updates - a new "SRAM" driver. - MAINTAINERS updates - the backlight driver queue - checkpatch updates - a few init/ changes - a huge number of drivers/rtc changes - fatfs updates - some lib/idr.c work - some renaming of the random driver interfaces * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (285 commits) net: rename random32 to prandom net/core: remove duplicate statements by do-while loop net/core: rename random32() to prandom_u32() net/netfilter: rename random32() to prandom_u32() net/sched: rename random32() to prandom_u32() net/sunrpc: rename random32() to prandom_u32() scsi: rename random32() to prandom_u32() lguest: rename random32() to prandom_u32() uwb: rename random32() to prandom_u32() video/uvesafb: rename random32() to prandom_u32() mmc: rename random32() to prandom_u32() drbd: rename random32() to prandom_u32() kernel/: rename random32() to prandom_u32() mm/: rename random32() to prandom_u32() lib/: rename random32() to prandom_u32() x86: rename random32() to prandom_u32() x86: pageattr-test: remove srandom32 call uuid: use prandom_bytes() raid6test: use prandom_bytes() sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic() ...
2013-04-30printk/tracing: rework console tracingzhangwei(Jovi)
Commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") removed start and end parameters from call_console_drivers, but those parameters still exist in include/trace/events/printk.h. Without start and end parameters handling, printk tracing became more simple as: trace_console(text, len); Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge first batch of fixes from Andrew Morton: - A couple of kthread changes - A few minor audit patches - A number of fbdev patches. Florian remains AWOL so I'm picking up some of these. - A few kbuild things - ocfs2 updates - Almost all of the MM queue (And in the meantime, I already have the second big batch from Andrew pending in my mailbox ;^) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits) memcg: take reference before releasing rcu_read_lock mem hotunplug: fix kfree() of bootmem memory mmKconfig: add an option to disable bounce mm, nobootmem: do memset() after memblock_reserve() mm, nobootmem: clean-up of free_low_memory_core_early() fs/buffer.c: remove unnecessary init operation after allocating buffer_head. numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU mm: fix memory_hotplug.c printk format warning mm: swap: mark swap pages writeback before queueing for direct IO swap: redirty page if page write fails on swap file mm, memcg: give exiting processes access to memory reserves thp: fix huge zero page logic for page with pfn == 0 memcg: avoid accessing memcg after releasing reference fs: fix fsync() error reporting memblock: fix missing comment of memblock_insert_region() mm: Remove unused parameter of pages_correctly_reserved() firmware, memmap: fix firmware_map_entry leak mm/vmstat: add note on safety of drain_zonestat mm: thp: add split tail pages to shrink page list in page reclaim mm: allow for outstanding swap writeback accounting ...
2013-04-29Merge tag 'regmap-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "In user visible terms just a couple of enhancements here, though there was a moderate amount of refactoring required in order to support the register cache sync performance improvements. - Support for block and asynchronous I/O during register cache syncing; this provides a use case dependant performance improvement. - Additional debugfs information on the memory consuption and register set" * tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits) regmap: don't corrupt work buffer in _regmap_raw_write() regmap: cache: Fix format specifier in dev_dbg regmap: cache: Make regcache_sync_block_raw static regmap: cache: Write consecutive registers in a single block write regmap: cache: Split raw and non-raw syncs regmap: cache: Factor out block sync regmap: cache: Factor out reg_present support from rbtree cache regmap: cache: Use raw I/O to sync rbtrees if we can regmap: core: Provide regmap_can_raw_write() operation regmap: cache: Provide a get address of value operation regmap: Cut down on the average # of nodes in the rbtree cache regmap: core: Make raw write available to regcache regmap: core: Warn on invalid operation combinations regmap: irq: Clarify error message when we fail to request primary IRQ regmap: rbtree Expose total memory consumption in the rbtree debugfs entry regmap: debugfs: Add a registers `range' file regmap: debugfs: Simplify calculation of `c->max_reg' regmap: cache: Store caches in native register format where possible regmap: core: Split out in place value parsing regmap: cache: Use regcache_get_value() to check if we updated ...
2013-04-29mm: trace filemap add and delRobert Jarzmik
Use the events API to trace filemap loading and unloading of file pieces into the page cache. This patch aims at tracing the eviction reload cycle of executable and shared libraries pages in a memory constrained environment. The typical usage is to spot a specific device and inode (for example /lib/libc.so) to see the eviction cycles, and find out if frequently used code is rather spread across many pages (bad) or coallesced (good). Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29Merge tag 'trace-3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Along with the usual minor fixes and clean ups there are a few major changes with this pull request. 1) Multiple buffers for the ftrace facility This feature has been requested by many people over the last few years. I even heard that Google was about to implement it themselves. I finally had time and cleaned up the code such that you can now create multiple instances of the ftrace buffer and have different events go to different buffers. This way, a low frequency event will not be lost in the noise of a high frequency event. Note, currently only events can go to different buffers, the tracers (ie function, function_graph and the latency tracers) still can only be written to the main buffer. 2) The function tracer triggers have now been extended. The function tracer had two triggers. One to enable tracing when a function is hit, and one to disable tracing. Now you can record a stack trace on a single (or many) function(s), take a snapshot of the buffer (copy it to the snapshot buffer), and you can enable or disable an event to be traced when a function is hit. 3) A perf clock has been added. A "perf" clock can be chosen to be used when tracing. This will cause ftrace to use the same clock as perf uses, and hopefully this will make it easier to interleave the perf and ftrace data for analysis." * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits) tracepoints: Prevent null probe from being added tracing: Compare to 1 instead of zero for is_signed_type() tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT ftrace: Get rid of ftrace_profile_bits tracing: Check return value of tracing_init_dentry() tracing: Get rid of unneeded key calculation in ftrace_hash_move() tracing: Reset ftrace_graph_filter_enabled if count is zero tracing: Fix off-by-one on allocating stat->pages kernel: tracing: Use strlcpy instead of strncpy tracing: Update debugfs README file tracing: Fix ftrace_dump() tracing: Rename trace_event_mutex to trace_event_sem tracing: Fix comment about prefix in arch_syscall_match_sym_name() tracing: Convert trace_destroy_fields() to static tracing: Move find_event_field() into trace_events.c tracing: Use TRACE_MAX_PRINT instead of constant tracing: Use pr_warn_once instead of open coded implementation ring-buffer: Add ring buffer startup selftest tracing: Bring Documentation/trace/ftrace.txt up to date tracing: Add "perf" trace_clock ... Conflicts: kernel/trace/ftrace.c kernel/trace/trace.c
2013-04-24nohz: Fix unavailable tick_stop tracepoint in dynticks idleFrederic Weisbecker
The trace_tick_stop() tracepoint is only available in full dynticks. But it's also used by dynticks-idle so let's build it for the latter config as well. This fixes: kernel/time/tick-sched.c: In function tick_nohz_stop_sched_tick: kernel/time/tick-sched.c:644: error: implicit declaration of function trace_tick_stop make[2]: *** [kernel/time/tick-sched.o] Erreur 1 Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-22nohz: Add basic tracingFrederic Weisbecker
It's not obvious to find out why the full dynticks subsystem doesn't always stop the tick: whether this is due to kthreads, posix timers, perf events, etc... These new tracepoints are here to help the user diagnose the failures and test this feature. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-22gpu: host1x: Add channel supportTerje Bergstrom
Add support for host1x client modules, and host1x channels to submit work to the clients. Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
2013-04-22gpu: host1x: Add host1x driverTerje Bergstrom
Add host1x, the driver for host1x and its client unit 2D. The Tegra host1x module is the DMA engine for register access to Tegra's graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization. Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
2013-04-21jbd2: trace when lock_buffer in do_get_write_access takes a long timeTheodore Ts'o
While investigating interactivity problems it was clear that processes sometimes stall for long periods of times if an attempt is made to lock a buffer which is undergoing writeback. It would stall in a trace looking something like [<ffffffff811a39de>] __lock_buffer+0x2e/0x30 [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0 [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50 [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80 [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0 [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220 [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60 [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0 [<ffffffff8118b9b9>] update_time+0x79/0xc0 [<ffffffff8118ba98>] file_update_time+0x98/0x100 [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0 [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0 [<ffffffff811ea853>] ext4_file_write+0x83/0xd0 [<ffffffff81172b23>] do_sync_write+0xa3/0xe0 [<ffffffff811731ae>] vfs_write+0xae/0x180 [<ffffffff8117361d>] sys_write+0x4d/0x90 [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-18Revert "block: add missing block_bio_complete() tracepoint"Linus Torvalds
This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Requested-by: Jens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-13tracing: Remove obsolete macro guard _TRACE_PROFILE_INITzhangwei(Jovi)
The macro _TRACE_PROFILE_INIT was removed a long time ago, but an "#undef" guard was left behind. Remove it. Link: http://lkml.kernel.org/r/514684EE.6000805@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-12kthread: Prevent unpark race which puts threads on the wrong cpuThomas Gleixner
The smpboot threads rely on the park/unpark mechanism which binds per cpu threads on a particular core. Though the functionality is racy: CPU0 CPU1 CPU2 unpark(T) wake_up_process(T) clear(SHOULD_PARK) T runs leave parkme() due to !SHOULD_PARK bind_to(CPU2) BUG_ON(wrong CPU) We cannot let the tasks move themself to the target CPU as one of those tasks is actually the migration thread itself, which requires that it starts running on the target cpu right away. The solution to this problem is to prevent wakeups in park mode which are not from unpark(). That way we can guarantee that the association of the task to the target cpu is working correctly. Add a new task state (TASK_PARKED) which prevents other wakeups and use this state explicitly for the unpark wakeup. Peter noticed: Also, since the task state is visible to userspace and all the parked tasks are still in the PID space, its a good hint in ps and friends that these tasks aren't really there for the moment. The migration thread has another related issue. CPU0 CPU1 Bring up CPU2 create_thread(T) park(T) wait_for_completion() parkme() complete() sched_set_stop_task() schedule(TASK_PARKED) The sched_set_stop_task() call is issued while the task is on the runqueue of CPU1 and that confuses the hell out of the stop_task class on that cpu. So we need the same synchronizaion before sched_set_stop_task(). Reported-by: Dave Jones <davej@redhat.com> Reported-and-tested-by: Dave Hansen <dave@sr71.net> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Peter Ziljstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: dhillf@gmail.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-10ext4: fix miscellaneous big endian warningsTheodore Ts'o
None of these result in any bug, but they makes sparse complain. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-03ext4: collapse handling of data=ordered and data=writeback codepathsTheodore Ts'o
The only difference between how we handle data=ordered and data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate code duplication by factoring out redundant the code paths. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2013-03-26rcu: Repurpose no-CBs event tracing to future-GP eventsPaul E. McKenney
Dyntick-idle CPUs need to be able to pre-announce their need for grace periods. This can be done using something similar to the mechanism used by no-CB CPUs to announce their need for grace periods. This commit moves in this direction by renaming the no-CBs grace-period event tracing to suit the new future-grace-period needs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-26rcu: Add event tracing for no-CBs CPUs' grace periodsPaul E. McKenney
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-18treewide: Fix typos in printk and commentMasanari Iida
Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-15tracing: Add a way to soft disable trace eventsSteven Rostedt (Red Hat)
In order to let triggers enable or disable events, we need a 'soft' method for doing so. For example, if a function probe is added that lets a user enable or disable events when a function is called, that change must be done without taking locks or a mutex, and definitely it can't sleep. But the full enabling of a tracepoint is expensive. By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated without the protection of a mutex (using set/clear_bit()), this soft disable flag can be used to allow critical sections to enable or disable events from being traced (after the event has been placed into "SOFT_MODE"). Some caveats though: The comm recorder (to map pids with a comm) can not be soft disabled (yet). If you disable an event with with a "soft" disable and wait a while before reading the trace, the comm cache may be replaced and you'll get a bunch of <...> for comms in the trace. Reading the "enable" file for an event that is disabled will now give you "0*" where the '*' denotes that the tracepoint is still active but the event itself is "disabled". [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ] Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15tracing: Fix some section mismatch warningsLi Zefan
As we've added __init annotation to field-defining functions, we should add __refdata annotation to event_call variables, which reference those functions. Link: http://lkml.kernel.org/r/51343C1F.2050502@huawei.com Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15tracing: Annotate event field-defining functions with __initLi Zefan
Those functions are called either during kernel boot or module init. Before: $ dmesg | grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1208k freed Freeing unused kernel memory: 1360k freed Freeing unused kernel memory: 1960k freed After: $ dmesg | grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1236k freed Freeing unused kernel memory: 1388k freed Freeing unused kernel memory: 1960k freed Link: http://lkml.kernel.org/r/5125877D.5000201@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15tracing: Add a helper function for event print functionsLi Zefan
Move duplicate code in event print functions to a helper function. This shrinks the size of the kernel by ~13K. text data bss dec hex filename 6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old 6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15tracing: Pass the ftrace_file to the buffer lock reserve codeSteven Rostedt
Pass the struct ftrace_event_file *ftrace_file to the trace_event_buffer_lock_reserve() (new function that replaces the trace_current_buffer_lock_reserver()). The ftrace_file holds a pointer to the trace_array that is in use. In the case of multiple buffers with different trace_arrays, this allows different events to be recorded into different buffers. Also fixed some of the stale comments in include/trace/ftrace.h Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15tracing: Separate out trace events from global variablesSteven Rostedt
The trace events for ftrace are all defined via global variables. The arrays of events and event systems are linked to a global list. This prevents multiple users of the event system (what to enable and what not to). By adding descriptors to represent the event/file relation, as well as to which trace_array descriptor they are associated with, allows for more than one set of events to be defined. Once the trace events files have a link between the trace event and the trace_array they are associated with, we can create multiple trace_arrays that can record separate events in separate buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-04regmap: async: Add tracepoints for async I/OMark Brown
Trace when we start and complete async writes, and when we start and finish blocking for their completion. This is useful for performance analysis of the resulting I/O patterns. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-03Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Various bug fixes for ext4. The most important is a fix for the new extent cache's slab shrinker which can cause significant, user-visible pauses when the system is under memory pressure." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: enable quotas before orphan cleanup ext4: don't allow quota mount options when quota feature enabled ext4: fix a warning from sparse check for ext4_dir_llseek ext4: convert number of blocks to clusters properly ext4: fix possible memory leak in ext4_remount() jbd2: fix ERR_PTR dereference in jbd2__journal_start ext4: use percpu counter for extent cache count ext4: optimize ext4_es_shrink()
2013-03-01ext4: optimize ext4_es_shrink()Theodore Ts'o
When the system is under memory pressure, ext4_es_srhink() will get called very often. So optimize returning the number of items in the file system's extent status cache by keeping a per-filesystem count, instead of calculating it each time by scanning all of the inodes in the extent status cache. Also rename the slab used for the extent status cache to be "ext4_extent_status" so it's obviousl the slab in question is created by ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com>
2013-02-28Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO core bits from Jens Axboe: "Below are the core block IO bits for 3.9. It was delayed a few days since my workstation kept crashing every 2-8h after pulling it into current -git, but turns out it is a bug in the new pstate code (divide by zero, will report separately). In any case, it contains: - The big cfq/blkcg update from Tejun and and Vivek. - Additional block and writeback tracepoints from Tejun. - Improvement of the should sort (based on queues) logic in the plug flushing. - _io() variants of the wait_for_completion() interface, using io_schedule() instead of schedule() to contribute to io wait properly. - Various little fixes. You'll get two trivial merge conflicts, which should be easy enough to fix up" Fix up the trivial conflicts due to hlist traversal cleanups (commit b67bfe0d42ca: "hlist: drop the node parameter from iterators"). * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits) block: remove redundant check to bd_openers() block: use i_size_write() in bd_set_size() cfq: fix lock imbalance with failed allocations drivers/block/swim3.c: fix null pointer dereference block: don't select PERCPU_RWSEM block: account iowait time when waiting for completion of IO request sched: add wait_for_completion_io[_timeout] writeback: add more tracepoints block: add block_{touch|dirty}_buffer tracepoint buffer: make touch_buffer() an exported function block: add @req to bio_{front|back}_merge tracepoints block: add missing block_bio_complete() tracepoint block: Remove should_sort judgement when flush blk_plug block,elevator: use new hashtable implementation cfq-iosched: add hierarchical cfq_group statistics cfq-iosched: collect stats from dead cfqgs cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats() blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock block: RCU free request_queue blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() ...
2013-02-26Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Theodore Ts'o: "The one new feature added in this patch series is the ability to use the "punch hole" functionality for inodes that are not using extent maps. In the bug fix category, we fixed some races in the AIO and fstrim code, and some potential NULL pointer dereferences and memory leaks in error handling code paths. In the optimization category, we fixed a performance regression in the jbd2 layer introduced by commit d9b01934d56a ("jbd: fix fsync() tid wraparound bug", introduced in v3.0) which shows up in the AIM7 benchmark. We also further optimized jbd2 by minimize the amount of time that transaction handles are held active. This patch series also features some additional enhancement of the extent status tree, which is now used to cache extent information in a more efficient/compact form than what we use on-disk." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (65 commits) ext4: fix free clusters calculation in bigalloc filesystem ext4: no need to remove extent if len is 0 in ext4_es_remove_extent() ext4: fix xattr block allocation/release with bigalloc ext4: reclaim extents from extent status tree ext4: adjust some functions for reclaiming extents from extent status tree ext4: remove single extent cache ext4: lookup block mapping in extent status tree ext4: track all extent status in extent status tree ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag ext4: rename and improbe ext4_es_find_extent() ext4: add physical block and status member into extent status tree ext4: refine extent status tree ext4: use ERR_PTR() abstraction for ext4_append() ext4: refactor code to read directory blocks into ext4_read_dirblock() ext4: add debugging context for warning in ext4_da_update_reserve_space() ext4: use KERN_WARNING for warning messages jbd2: use module parameters instead of debugfs for jbd_debug ext4: use module parameters instead of debugfs for mballoc_debug ext4: start handle at the last possible moment when creating inodes ext4: fix the number of credits needed for acl ops with inline data ...
2013-02-24Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...
2013-02-20Merge tag 'pm+acpi-3.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. * tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E ACPI / hotplug: Fix concurrency issues and memory leaks ...
2013-02-20Merge branch 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue changes from Tejun Heo: "A lot of reorganization is going on mostly to prepare for worker pools with custom attributes so that workqueue can replace custom pool implementations in places including writeback and btrfs and make CPU assignment in crypto more flexible. workqueue evolved from purely per-cpu design and implementation, so there are a lot of assumptions regarding being bound to CPUs and even unbound workqueues are implemented as an extension of the model - workqueues running on the special unbound CPU. Bulk of changes this round are about promoting worker_pools as the top level abstraction replacing global_cwq (global cpu workqueue). At this point, I'm fairly confident about getting custom worker pools working pretty soon and ready for the next merge window. Lai's patches are replacing the convoluted mb() dancing workqueue has been doing with much simpler mechanism which only depends on assignment atomicity of long. For details, please read the commit message of 0b3dae68ac ("workqueue: simplify is-work-item-queued-here test"). While the change ends up adding one pointer to struct delayed_work, the inflation in percentage is less than five percent and it decouples delayed_work logic a lot more cleaner from usual work handling, removes the unusual memory barrier dancing, and allows for further simplification, so I think the trade-off is acceptable. There will be two more workqueue related pull requests and there are some shared commits among them. I'll write further pull requests assuming this pull request is pulled first." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (37 commits) workqueue: un-GPL function delayed_work_timer_fn() workqueue: rename cpu_workqueue to pool_workqueue workqueue: reimplement is_chained_work() using current_wq_worker() workqueue: fix is_chained_work() regression workqueue: pick cwq instead of pool in __queue_work() workqueue: make get_work_pool_id() cheaper workqueue: move nr_running into worker_pool workqueue: cosmetic update in try_to_grab_pending() workqueue: simplify is-work-item-queued-here test workqueue: make work->data point to pool after try_to_grab_pending() workqueue: add delayed_work->wq to simplify reentrancy handling workqueue: make work_busy() test WORK_STRUCT_PENDING first workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END workqueue: post global_cwq removal cleanups workqueue: rename nr_running variables workqueue: remove global_cwq workqueue: remove worker_pool->gcwq workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() workqueue: make freezing/thawing per-pool workqueue: make hotplug processing per-pool ...
2013-02-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "There are lots of improvements, the biggest changes are: Main kernel side changes: - Improve uprobes performance by adding 'pre-filtering' support, by Oleg Nesterov. - Make some POWER7 events available in sysfs, equivalent to what was done on x86, from Sukadev Bhattiprolu. - tracing updates by Steve Rostedt - mostly misc fixes and smaller improvements. - Use perf/event tracing to report PCI Express advanced errors, by Tony Luck. - Enable northbridge performance counters on AMD family 15h, by Jacob Shin. - This tracing commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties (PowerTop in particular) seem to agree that it's safe to do now with the introduction of libtraceevent, but the devil is in the details ... Main tooling side changes: - Add 'event group view', from Namyung Kim: To use it, 'perf record' should group events when recording. And then perf report parses the saved group relation from file header and prints them together if --group option is provided. You can use the 'perf evlist' command to see event group information: $ perf record -e '{ref-cycles,cycles}' noploop 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ] $ perf evlist --group {ref-cycles,cycles} With this example, default perf report will show you each event separately. You can use --group option to enable event group view: $ perf report --group ... # group: {ref-cycles,cycles} # ======== # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time As you can see the Overhead column now contains both of ref-cycles and cycles and header line shows group information also - 'anon group { ref-cycles, cycles }'. The output is sorted by period of group leader first. - Initial GTK+ annotate browser, from Namhyung Kim. - Add option for runtime switching perf data file in perf report, just press 's' and a menu with the valid files found in the current directory will be presented, from Feng Tang. - Add support to display whole group data for raw columns, from Jiri Olsa. - Add per processor socket count aggregation in perf stat, from Stephane Eranian. - Add interval printing in 'perf stat', from Stephane Eranian. - 'perf test' improvements - Add support for wildcards in tracepoint system name, from Jiri Olsa. - Add anonymous huge page recognition, from Joshua Zhu. - perf build-id cache now can show DSOs present in a perf.data file that are not in the cache, to integrate with build-id servers being put in place by organizations such as Fedora. - perf top now shares more of the evsel config/creation routines with 'record', paving the way for further integration like 'top' snapshots, etc. - perf top now supports DWARF callchains. - Fix mmap limitations on 32-bit, fix from David Miller. - 'perf bench numa mem' NUMA performance measurement suite - ... and lots of fixes, performance improvements, cleanups and other improvements I failed to list - see the shortlog and git log for details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits) perf/x86/amd: Enable northbridge performance counters on AMD family 15h perf/hwbp: Fix cleanup in case of kzalloc failure perf tools: Fix build with bison 2.3 and older. perf tools: Limit unwind support to x86 archs perf annotate: Make it to be able to skip unannotatable symbols perf gtk/annotate: Fail early if it can't annotate perf gtk/annotate: Show source lines with gray color perf gtk/annotate: Support multiple event annotation perf ui/gtk: Implement basic GTK2 annotation browser perf annotate: Fix warning message on a missing vmlinux perf buildid-cache: Add --update option uprobes/perf: Avoid uprobe_apply() whenever possible uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE uprobes/perf: Teach trace_uprobe/perf code to pre-filter uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's uprobes: Introduce uprobe_apply() perf: Introduce hw_perf_event->tp_target and ->tp_list uprobes/perf: Always increment trace_uprobe->nhit uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe uprobes/tracing: Introduce is_trace_uprobe_enabled() ...
2013-02-18ext4: reclaim extents from extent status treeZheng Liu
Although extent status is loaded on-demand, we also need to reclaim extent from the tree when we are under a heavy memory pressure because in some cases fragmented extent tree causes status tree costs too much memory. Here we maintain a lru list in super_block. When the extent status of an inode is accessed and changed, this inode will be move to the tail of the list. The inode will be dropped from this list when it is cleared. In the inode, a counter is added to count the number of cached objects in extent status tree. Here only written/unwritten/hole extent is counted because delayed extent doesn't be reclaimed due to fiemap, bigalloc and seek_data/hole need it. The counter will be increased as a new extent is allocated, and it will be decreased as a extent is freed. In this commit we use normal shrinker framework to reclaim memory from the status tree. ext4_es_reclaim_extents_count() traverses the lru list to count the number of reclaimable extents. ext4_es_shrink() tries to reclaim written/unwritten/hole extents from extent status tree. The inode that has been shrunk is moved to the tail of lru list. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: lookup block mapping in extent status treeZheng Liu
After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: rename and improbe ext4_es_find_extent()Zheng Liu
This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent and improve this function. First, we split input and output parameter. Second, this function never return the first block of the next delayed extent after 'es'. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
2013-02-18ext4: add physical block and status member into extent status treeZheng Liu
This commit adds two members in extent_status structure to let it record physical block and extent status. Here es_pblk is used to record both of them because physical block only has 48 bits. So extent status could be stashed into it so that we can save some memory. Now written, unwritten, delayed and hole are defined as status. Due to new member is added into extent status tree, all interfaces need to be adjusted. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18ext4: refine extent status treeZheng Liu
This commit refines the extent status tree code. 1) A prefix 'es_' is added to to the extent status tree structure members. 2) Refactored es_remove_extent() so that __es_remove_extent() can be used by es_insert_extent() to remove the old extent entry(-ies) before inserting a new one. 3) Rename extent_status_end() to ext4_es_end() 4) ext4_es_can_be_merged() is define to check whether two extents can be merged or not. 5) Update and clarified comments. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-14workqueue: rename cpu_workqueue to pool_workqueueTejun Heo
workqueue has moved away from global_cwqs to worker_pools and with the scheduled custom worker pools, wforkqueues will be associated with pools which don't have anything to do with CPUs. The workqueue code went through significant amount of changes recently and mass renaming isn't likely to hurt much additionally. Let's replace 'cpu' with 'pool' so that it reflects the current design. * s/struct cpu_workqueue_struct/struct pool_workqueue/ * s/cpu_wq/pool_wq/ * s/cwq/pwq/ This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-08jbd2: add tracepoints which provide per-handle statistics Theodore Ts'o
Handles which stay open a long time are problematic when it comes time to close down a transaction so it can be committed. These tracepoints will help us determine which ones are the problematic ones, and to validate whether changes makes things better or worse. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-07jbd2: track request delay statisticsTheodore Ts'o
Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-29Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', ↵Paul E. McKenney
'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation. fixes.2013.01.26a: Miscellaneous fixes. tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to simplify callback advancement. tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.
2013-01-25PM / tracing: remove deprecated power trace APIPaul Gortmaker
The text in Documentation said it would be removed in 2.6.41; the text in the Kconfig said removal in the 3.1 release. Either way you look at it, we are well past both, so push it off a cliff. Note that the POWER_CSTATE and the POWER_PSTATE are part of the legacy tracing API. Remove all tracepoints which use these flags. As can be seen from context, most already have a trace entry via trace_cpu_idle anyways. Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as compared to the CSTATE ones which all have a clear start/stop. As part of this, the trace_power_frequency also becomes orphaned, so it too is deleted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>