Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
|
|
Fix the following 'make includecheck' warning:
arch/mn10300/include/asm/pci.h: linux/mm.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
clean up type-casting twice. "size_t" is typedef as "unsigned long" in
64-bit system, and "unsigned int" in 32-bit system, and the intermediate
cast to 'long' is pointless.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This will register the ioctl range used by nilfs2 file system to the
table listed in Documentation/ioctl/ioctl-number.txt.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After aligning the ftrace raw samples, there are dead bytes storing
random data from the stack. We don't want to leak these to userspace,
then zero these out.
Before:
0x2de88 [0x50]: event: 9
.
. ... raw event: size 80 bytes
. 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........
. 0010: 68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00 h...h...,......
. 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h..
. 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0....
. 0040: 68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f h...@.F........
^ ^ ^ ^
Leak
After:
0x2d318 [0x50]: event: 9
.
. ... raw event: size 80 bytes
. 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........
. 0010: 68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00 h...h...h......
. 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h..
. 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0....
. 0040: 68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00 h.....F........
^ ^ ^ ^
Fixed
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
|
|
We compute the perf raw sample size by aligning the raw ftrace
event size plus the buffer size field itself. We do that
instead of aligning only the perf raw sample size, so that we
might economize some in some cases.
But this buffer size field is not stored in the perf raw
sample, we must then substract its size from the buffer once we
computed the alignment unless we may get a useless u32 field in
the buffer.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090810141129.GA5124@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Need to add the REQUEUE_PI checks to the compat_sys_futex API
as well to ensure 32 bit requeue's work fine on a 64 bit
system. Patch is against latest tip
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <20090810130142.GA23619@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Give waitqueue spinlocks their own lockdep classes when they
are initialised from init_waitqueue_head(). This means that
struct wait_queue::func functions can operate other waitqueues.
This is used by CacheFiles to catch the page from a backing fs
being unlocked and to wake up another thread to take a copy of
it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: linux-cachefs@redhat.com
Cc: torvalds@osdl.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.
Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.
Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.
With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
Without the initialization of vmaster NID, the dB information got
confused for ALC269 codec.
Reference: Novell bnc#527361
https://bugzilla.novell.com/show_bug.cgi?id=527361
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
|
|
Raw tracepoint data contains various kernel internals and
data from other users, so restrict this to CAP_SYS_ADMIN.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896452.17467.75.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
futex_requeue() can acquire the lock on behalf of a waiter
early on or during the requeue loop if it is uncontended or in
the event of a lock steal or owner died. On wakeup, the waiter
(in futex_wait_requeue_pi()) cleans up the pi_state owner using
the lock_ptr to protect against concurrent access to the
pi_state. The pi_state is hung off futex_q's on the requeue
target futex hash bucket so the lock_ptr needs to be updated
accordingly.
The problem manifested by triggering the WARN_ON in
lookup_pi_state() about the pid != pi_state->owner->pid. With
this patch, the pi_state is properly guarded against concurrent
access via the requeue target hb lock.
The astute reviewer may notice that there is a window of time
between when futex_requeue() unlocks the hb locks and when
futex_wait_requeue_pi() will acquire hb2->lock. During this
time the pi_state and uval are not in sync with the underlying
rtmutex owner (but the uval does indicate there are waiters, so
no atomic changes will occur in userspace). However, this is
not a problem. Should a contending thread enter
lookup_pi_state() and acquire hb2->lock before the ownership is
fixed up, it will find the pi_state hung off a waiter's
(possibly the pending owner's) futex_q and block on the
rtmutex. Once futex_wait_requeue_pi() fixes up the owner, it
will also move the pi_state from the old owner's
task->pi_state_list to its own.
v3: Fix plist lock name for application to mainline (rather
than -rt) Compile tested against tip/v2.6.31-rc5.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
LKML-Reference: <4A7F4EFF.6090903@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
https://bugzilla.redhat.com/show_bug.cgi?id=514787
We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
A recent commit:
commit 449aad3e25358812c43afc60918c5ad3819488e7
introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex.
__blkdev_get holds bd_mutex while calling md_open which takes
reconfig_mutex,
do_md_run is always called with reconfig_mutex held, and it now
takes bd_mutex in the call the revalidate_disk.
This potential deadlock was not caught by lockdep due to the
use of mutex_lock_interruptible_nexted which was introduced
by
commit d63a5a74dee87883fda6b7d170244acaac5b05e8
do avoid a warning of an impossible deadlock.
It is quite possible to split reconfig_mutex in to two locks.
One protects the array data structures while it is being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.
In particular, the second lock prevents an open from completing
between the time when do_md_stop checks if there are any active opens,
and the time when the array is either set read-only, or when ->pers is
set to NULL. So we can be certain that no IO is in flight as the
array is being destroyed.
So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.
This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d
Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
* 'for-linus' of git://git.infradead.org/ubi-2.6:
UBI: compatible fallback in absense of sequence numbers
UBI: fix double free on error path
|
|
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid redelivery of edge interrupt before next edge
KVM: MMU: limit rmap chain length
KVM: ia64: fix build failures due to ia64/unsigned long mismatches
KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
KVM: fix ack not being delivered when msi present
KVM: s390: fix wait_queue handling
KVM: VMX: Fix locking imbalance on emulation failure
KVM: VMX: Fix locking order in handle_invalid_guest_state
KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
KVM: SVM: force new asid on vcpu migration
KVM: x86: verify MTRR/PAT validity
KVM: PIT: fix kpit_elapsed division by zero
KVM: Fix KVM_GET_MSR_INDEX_LIST
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: silence vblank warnings
drm: silence pointless vblank warning.
drm: When adding probed modes, preserve duplicate mode types
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix/complete ftrace event records sampling
perf_counter, ftrace: Fix perf_counter integration
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
tracing/filters: Don't use pred on alloc failure
ring-buffer: Fix memleak in ring_buffer_free()
tracing: Fix recordmcount.pl to handle sections with only weak functions
ring-buffer: Fix advance of reader in rb_buffer_peek()
tracing: do not use functions starting with .L in recordmcount.pl
ring-buffer: do not disable ring buffer on oops_in_progress
ring-buffer: fix check of try_to_discard result
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix buffer overflow in efi_init()
x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
x86: Fix VMI && stack protector
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix typos in documentation
lockdep: Fix file mode of lock_stat
rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
|
|
Sometimes we get callchain branches that have a rate under the
limit given by the user.
Say you launched:
perf record -f -g -a ./hackbench 10
perf report -g fractal,10.0
And you got:
2.33% hackbench [kernel] [k] _spin_lock_irqsave
|
|--78.57%-- remove_wait_queue
| poll_freewait
| do_sys_poll
| sys_poll
| sysenter_dispatch
| 0xf7ffa430
| 0x1ffadea3c
|
|--7.14%-- __up_read
| up_read
| do_page_fault
| page_fault
| 0xf7ffa430
| 0xa0df710000000a
...
It is abnormal to get a 7.14% branch whereas we passed a 10%
filter.
The problem is that we round down the minimum threshold. This
happens mostly when we have very low number of events. If the
total amount of your branch is 4 and you have a subranch of 3
events, filtering to 90% will be computed like follows:
limit = 4 * 0.9;
The result is about 3.6, but the cast to integer will round
down to 3. It means that our filter is actually of 75%
We must then explicitly round up the minimum threshold.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: acme@redhat.com
Cc: peterz@infradead.org
Cc: efault@gmx.de
LKML-Reference: <20090809024235.GA10146@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Due to a libz dependency in some distro's binutils package,
C++ demangle support isn't compiled in despite the necessary
libraries being available.
Fix this by adding a -lz link test to the dependency detection
rules.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1249733655.6929.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
A few examples of how 'perf' can be used, from an e-mail by
Ingo Molnar http://lkml.org/lkml/2009/8/4/346.
Signed-off-by: Carlos R. Mafra <crmafra2@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Valdis.Kletnieks@vt.edu
LKML-Reference: <20090805185334.GA4535@Pilar.aei.mpg.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:
[ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
triggers because commit
9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) removed clearing of tsk->perf_counter_ctxp out
from under ctx->lock which introduced a race (against
perf_lock_task_context).
Move it back and deal with the exit notification by explicitly
passing along the former task context.
Reported-by: Markus T Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249667341.17467.5.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:
- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record
We want the system in place that transport tracepoints raw
samples events into the perf ring buffer to be generalized and
usable by any type of counter.
Reported-by; Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
unconditionally
Despite that the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
thinking it might not be initialized:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
Then, initialize it to NULL and always check if it's not NULL
before dereference it.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
amount of ignored chains in fractal mode
When we filter the callchains below a given percentage, we
ignore them and the end result only shows entries that have an
upper percentage than the filter threshold.
It seems to users then that we have an imbalance in the
percentage, as if the sum inside a profiled branch doesn't
reach 100%.
Since in the past there have been real perf report bugs that
showed the same sypmtom, it would be nice to assure the user
that the data is perfect and trustable and it all sums up to
100.00%.
So fix this by displaying the remaining hits that have been
filtered but without more detail than their amount in each
branches. Example while filtering below 50%:
7.73% [k] delay_tsc
|
|--98.22%-- __const_udelay
| |
| |--86.37%-- ath5k_hw_register_timeout
| | ath5k_hw_noise_floor_calibration
| | ath5k_hw_reset
| | ath5k_reset
| | ath5k_config
| | ieee80211_hw_config
| | |
| | |--88.53%-- ieee80211_scan_work
| | | worker_thread
| | | kthread
| | | child_rip
| | --11.47%-- [...]
| --13.63%-- [...]
--1.78%-- [...]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If we recorded with -g option to record the callchain, right now
we require a -g option to perf report as well - and people reported
this as unnecessary complication: the user already specified -g
once, no need to require it a second time.
So if the recording includes call-chains, display the callchain by
default from perf report.
( The user can override this default using "-g none" option from
perf report. )
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
callchains
When the callchain tree comes to insert an empty backtrace, it
raises a spurious warning about the fact we are inserting an
empty. This is spurious because the radix tree assumes it did
something wrong to reach this state. But it didn't, we just met
an empty callchain that has to be ignored.
This happens occasionally with certain types of call-chain
recordings. If it happens it's a big nuisance as perf report
output starts with thousands of warning lines.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
1. Ignore the -A argument if there is no perf.data file
2. Treat an empty file like a non existent file.
Else, perf will try to read the perf.data header, and fail with
an error.
Treating an empty file like a non-existent file makes sense,
since an interupted (as in SIGKILLed) perf could leave such
files around, and you don't want to annoy the user with errors
for files with no data in it.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
While toying with perf, I've noticed that perf record can
easily enter a busy loop when doing something as silly as:
$ perf record -A ls
Yeah, do_read here really wants to read a known size, not being
able to should die(), not busy-loop ;)
That was the cause for the bug.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Stop perf list from displaying tracepoints without an id file,
those are special tracepoints that are not interfaced to
perfcounters so listing them is erroneous and passing them as
events will produce no output.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.
This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
origin (DSO, build-id, kernel, etc)
Used with perf report --verbose:
[acme@doppio linux-2.6-tip]$ perf report -v | head -16
5.17% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
2.56% firefox /lib64/libpthread-2.10.1.so 0x0000000000008e02 d [.] __pthread_mutex_lock_internal
1.94% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x0000000000d0af8f f [.] SearchTable
1.75% firefox [kernel] 0xffffffffff60013b k [.] vread_hpet
1.63% firefox /lib64/libpthread-2.10.1.so 0x000000000000a404 d [.] __pthread_mutex_unlock
1.47% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
1.42% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
1.24% firefox [kernel] 0xffffffff8102ca4a k [k] read_hpet
1.16% firefox [kernel] 0xffffffff810f3dd4 k [k] fget_light
1.11% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
0.98% firefox /usr/lib64/firefox-3.5.2/firefox 0x000000000000dd23 b [.] arena_ralloc
[acme@doppio linux-2.6-tip]$
The new field is just after the symbol address. To help in
figuring out symbol resolution bugs.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Brice Goglin reported:
> I can easily sort them by thread id, but I don't know how to match
> my 4 events with each group of 4 lines.
Also report the counter id and the time running/enabled
stats (in case the counter got time-shared).
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
perf.data file header
Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the
rest looks bogus.
A silly mistake made us re-read the first attribute for
every recorded attribute.
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The callchain fractal mode builds each new total hits in a new
branch of profiling by using the parent's hits of the current
branch plus the hits of the children.
This is wrong, the total hits of a branch should be made of the
sum of every children hits, we must ignore the parent hits in
this scope.
This patch also fixes another mistake with the hit counting.
Now the rates are correct.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
perf_counter tools: update perf top manual page to reflect
current implementation.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Pressing any key which is not currently mapped to
functionality, based on startup command line options, displays
currently mapped keys, and prompts for input.
Pressing any unmapped key at the prompt returns the user to
display mode with variables unchanged. eg, pressing ? <SPACE>
<ESC> etc displays currently available keys, the value of the
variable associated with that key, and prompts.
Pressing same again aborts input.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Reimplement the software counters to deal with fast moving
event sources (such as tracepoints). This means being able
to generate multiple overflows from a single 'event' as well
as support throttling.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
individual counter display
Add [w]eighted hotkey. Pressing [w] toggles between displaying
weighted total of all counters, and the counter selected via
[E]vent select key.
------------------------------------------------------------------------------
PerfTop: 90395 irqs/sec kernel:16.1% [cache-misses/cache-references/instructions], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
1275408.6 10881 - 5.3% - ffffffff81146f70 : copy_page_c
553683.4 43569 - 21.3% - ffffffff81146f20 : clear_page_c
74075.0 6768 - 3.3% - ffffffff81147190 : copy_user_generic_string
40602.9 7538 - 3.7% - ffffffff81284ba2 : _spin_lock
26882.1 965 - 0.5% - ffffffff8109d280 : file_ra_state_init
[w]
------------------------------------------------------------------------------
PerfTop: 91221 irqs/sec kernel:14.5% [10000Hz cache-misses], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
47320.00 - 22.3% - ffffffff81146f20 : clear_page_c
14261.00 - 6.7% - ffffffff810992f5 : __rmqueue
11046.00 - 5.2% - ffffffff81146f70 : copy_page_c
7842.00 - 3.7% - ffffffff81284ba2 : _spin_lock
7234.00 - 3.4% - ffffffff810aa1d6 : unmap_vmas
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
interactive form
perf top used to have annotation support, but it has bitrotted and
removed.
This patch restores that: it allows the user to select any symbol
in kernel space for source level annotation on the fly, switch
between event counters and alter display variables. When symbol
details are being displayed, stopping annotation reverts to normal.
known keys:
[d] select display delay.
[e] select display entries (lines).
[E] select annotation event counter.
[f] select normal display count filter.
[F] select annotation display count filter (percentage).
[qQ] quit.
[s] select annotation symbol and start annotation.
[S] stop annotation, revert to normal display.
[z] toggle event count zeroing.
Sample:
------------------------------------------------------------------------------
PerfTop: 16719 irqs/sec kernel:78.7% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing cache-misses for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% /* adjust length to remove Ethernet CRC */
0 0.0% if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
0 0.0% length -= 4;
436 5.0% f039: 41 f6 84 24 5c 29 00 testb $0x1,0x295c(%r12)
0 0.0% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
0 0.0% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
811 9.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
0 0.0% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
7226 82.6% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Available events:
0 cache-misses
1 cache-references
2 instructions
3 cycles
Enter details event counter: 2
------------------------------------------------------------------------------
PerfTop: 15035 irqs/sec kernel:79.0% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing instructions for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% int *work_done, int work_to_do)
0 0.0% {
175 0.9% eebf: 55 push %rbp
1898 9.8% eec0: 48 89 e5 mov %rsp,%rbp
0 0.0%
0 0.0% i = rx_ring->next_to_clean;
140 0.7% ef0a: 0f b7 41 1a movzwl 0x1a(%rcx),%eax
670 3.4% ef0e: 89 45 ac mov %eax,-0x54(%rbp)
0 0.0% {
0 0.0% memcpy(skb->data + offset, from, len);
91 0.5% f07b: 49 8b b6 e8 00 00 00 mov 0xe8(%r14),%rsi
1153 5.9% f082: 48 8b b8 e8 00 00 00 mov 0xe8(%rax),%rdi
42 0.2% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
14 0.1% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
1618 8.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% /* return some buffers to hardware, one at a time is too slow */
0 0.0% if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
867 4.5% f0e7: 83 7d b0 0f cmpl $0xf,-0x50(%rbp)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
37 0.2% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
4047 20.8% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The raw ftrace binary record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback brings
some costs however if someone wants no such sampling to
occur, and needs to be fixed in the future. For that we need
to have an instant access to the perf counter attribute.
This is a matter of a flag to add in the struct ftrace_event.
- Take care of the events recursivity! Don't ever try to record
a lock event for example, it seems some locking is used in
the profiling fast path and lead to a tracing recursivity.
That will be fixed using raw spinlock or recursivity
protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Adds possible second part to the assign argument of TP_EVENT().
TP_perf_assign(
__perf_count(foo);
__perf_addr(bar);
)
Which, when specified make the swcounter increment with @foo instead
of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
associated with the event) when this triggers a counter overflow.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Merge reason: Merge up to almost-rc6 to pick up latest perfcounters
(on which we'll queue up a dependent fix)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|