PFMON(1) User’s command PFMON(1)
NAME
pfmon - a hardware-based performance monitoring tool
SYNOPSIS
pfmon [OPTION] [PROGNAME]
DESCRIPTION
The pfmon tool is a command-line performance monitoring tool that uses
the perfmon interface to access the hardware performance counters of
certain processors. This version supports the following processors:
Itanium processors
Itanium, Itanium 2 (McKinley, Madison and variants), Dual-Core
Itanium 2 (Montecito). Pfmon runs with any 2.6.x kernel for
Itanium processors.
AMD X86-64 processors
You need a kernel with perfmon v2.2 or higher for pfmon to
work.
Intel Pentium M and P6 processors
You need a kernel with perfmon v2.2 or higher for pfmon to
work.
With pfmon, it is possible to monitor a single thread or the entire
system. It is also possible to monitor multi-process and
multi-threaded programs. In each case, it is possible to collect
simple counts or profiles.
The set of events that can be measured depends on the underlying
processor. Similarly, certain options are specific to a processor
model. In general, pfmon gives access to all processor-specific
monitoring features.
generic options
Pfmon provides the following options on all processors:
-h or --help
display list of available options and exit
-V or --version
print pfmon version information and exit
-l[regex] or --show-events[=regex]
If regex is not provided, pfmon lists the names of all available
events for the current processor. Otherwise only the events
matching the regular expression are printed.
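The filtering behaviour of -l can be pictured with a short sketch. This is only an illustration: the function name is hypothetical, the event names are placeholders rather than a real event table, and whether pfmon matches anywhere in the name or against the whole name is an assumption (the sketch matches anywhere).

```python
import re

def filter_events(names, regex=None):
    """Return all names when regex is None, else only the matching ones.

    Assumption: a name matches if the regex is found anywhere in it.
    """
    if regex is None:
        return list(names)
    pattern = re.compile(regex)
    return [name for name in names if pattern.search(name)]

# Placeholder event names, used for illustration only.
events = ["CPU_CYCLES", "L2_MISSES", "L2_REFERENCES"]
print(filter_events(events))
print(filter_events(events, "^L2_"))
```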
--long-show-events[=regex]
If regex is not provided, pfmon lists all available events for
the current processor with an abbreviated list of attributes, all
on one line. Otherwise only the events matching the regular
expression are printed.
-i event or --event-info=event
Display detailed information about an event. The event parameter
can either be the event code, the event name, or a regular
expression. In case multiple events match the expression, they
are all printed.
-u, -3, or --user-level
Monitor at the user level for all events. By default, this
option is turned on.
-k, -0, or --kernel-level
Monitor at the kernel level for all events. By default, this
option is turned off.
-1 Monitor execution at privilege level 1. By default, this option
is turned off.
-2 Monitor execution at privilege level 2. By default, this option
is turned off.
-e ev1,ev2,... or --events=ev1,ev2,...
Select events to monitor. The events are specified by name or
event code. If there are multiple events, they must be passed as
a comma-separated list without spaces. The maximum number of
events depends on the underlying processor. Each -e option
forms a set of events; multiple sets can be defined by specifying
the -e option multiple times. Event-related options always
apply to the last defined set. All events from a set are
measured together. Pfmon uses the perfmon interface to multiplex
the sets on the actual processors. When multiple sets are
used, pfmon scales the final counts to provide estimates of
what the actual counts would have been had all the events been
measured throughout the entire duration of the run. Pfmon does
not rearrange events between sets if they cannot be measured
together.
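The scaling of multiplexed counts can be pictured as a linear extrapolation. The sketch below shows that arithmetic under the assumption that the estimate is the raw count multiplied by the ratio of total run time to the time the set was actually loaded on the processor. The function and parameter names are hypothetical and this is not pfmon's own code.

```python
def scale_count(raw_count, active_time, total_time):
    """Estimate the full-run count for an event set that was only
    measured for part of the run (linear extrapolation, an assumption)."""
    if active_time <= 0:
        raise ValueError("event set was never scheduled")
    return raw_count * total_time / active_time

# Two sets sharing a 10-second run equally: each is active for 5 seconds.
estimate = scale_count(raw_count=1_000_000, active_time=5.0, total_time=10.0)
print(estimate)
```

Such an estimate is only as good as the assumption that the event rate is similar while the set is switched out.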
-I or --info
Print information related to the pfmon version, the supported
processor models, and the built-in sampling modules.
-t secs or --session-timeout=secs
Duration of the monitoring session expressed in seconds. Once
the timeout expires, pfmon stops monitoring and prints the final
counts or profiles.
-S format or --smpl-module-info=format
Display information about a sampling module.
--debug
Enable debug output (for experts).
--verbose
Print more information about the execution of pfmon.
--outfile=filename
Print final counts in the file called filename. By default, all
results (counts or profiles) are printed on the terminal.
--append
Append results (counts or profiles) to the current output file.
If --outfile or --smpl-outfile is not provided, results are
printed on the screen.
--overflow-block
Block the monitored thread when the sampling buffer becomes
full. This option is only available in per-thread mode. By
default, this option is turned off, meaning that the monitored
thread keeps on running, with monitoring disabled, while pfmon
is processing the sampling buffer. In other words, there may be
blind spots.
--system-wide
Create a system-wide monitoring session where pfmon measures all
threads running on a set of processors. By default, this option
is turned off, i.e., pfmon operates in per-thread mode. By
default, system-wide mode measures the same events on all
available processors. It is possible to restrict monitoring to a
subset of processors using the --cpu-list option.
--smpl-outfile=filename
Save profiles into the file called filename. By default, pro-
files are printed on the terminal.
--long-smpl-periods=val1,val2,...
Set the sampling period to reload into the overflowed counter(s)
after the last sample is recorded into the sampling buffer, i.e.
when the buffer becomes full. The values must be passed in the
same order as the events they refer to. For instance, if the
events are passed as -e ev1,ev2 then the sampling period for ev1
must come first and the one for ev2 second. It is
possible to skip a period by providing an empty element in the
list, e.g., --long-smpl-periods=,val2. Sampling periods are
expressed in the same unit as the event they refer to. If an
event counts the number of instructions retired, then the
sampling period uses the same unit, i.e., instructions retired.
To sample every 100,000 instructions, you can pass
--long-smpl-periods=100000.
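The positional matching between periods and events, including skipped elements, can be sketched as follows. This is a hypothetical parser written for illustration, not pfmon's implementation. It assumes plain decimal values and represents a skipped period as None.

```python
def parse_periods(spec):
    """Split 'val1,val2,...' into a list aligned with the event list.

    An empty element means "no sampling period for this event" and is
    returned as None (a representation chosen for this sketch).
    """
    periods = []
    for item in spec.split(","):
        item = item.strip()
        periods.append(int(item) if item else None)
    return periods

# One period for a single event.
print(parse_periods("100000"))
# First event skipped, second event sampled every 50000 occurrences.
print(parse_periods(",50000"))
```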
--short-smpl-periods=val1,val2,...
Set the sampling period to reload into the overflowed counter(s)
after a sample is recorded into the buffer and that sample is not
the last, i.e., when the buffer still has space remaining. Other
than that, this option works exactly like --long-smpl-periods.
--smpl-entries=n
Selects the number of samples that the kernel sampling buffer
can hold. The default size is determined dynamically by pfmon
based on the size of a sample and system resource limits such as
the amount of locked memory allowed for a user process (as
reported by ulimit).
--with-header
Generate a header before printing counts or profiles. The
header contains information about the configuration of the host
system and about the measurement being made.
--cpu-list=num,num1-num2,...
For system-wide mode, this option specifies the list of proces-
sors to monitor. Without this option, all available processors
are monitored. Processors can be specified individually with
their index, or by range.
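The num,num1-num2,... syntax can be expanded into individual processor indexes as in the sketch below. The function name is hypothetical, and details such as rejecting reversed ranges are assumptions, since the text above does not say how pfmon handles them.

```python
def parse_cpu_list(spec):
    """Expand a list such as '0,2-4' into sorted CPU indexes."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                # Assumption: a reversed range is treated as an error.
                raise ValueError(f"invalid range: {part}")
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

print(parse_cpu_list("0,2-4"))
```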
--aggregate-results
Aggregate counts and profiles in the output. By default, this
option is off, meaning that results are per-thread or per-CPU.
--trigger-code-start-address=addr
Start monitoring the first time code executes at address addr.
The address can be specified in hexadecimal or with a symbol.
--trigger-code-stop-address=addr
Stop monitoring the first time code executes at address addr.
The address can be specified in hexadecimal or with a symbol.
--trigger-data-start-address=addr
Start monitoring when the data at address addr is
accessed. By default, this is for any read or write access.
--trigger-data-stop-address=addr
Stop monitoring when the data at address addr is accessed.
By default, this is for any read or write access.
--trigger-code-repeat
By default, the start and stop code triggers are activated only
the first time they are reached. With this option, it is possi-
ble to repeat the start/stop behavior each time the execution
crosses the trigger address.
--trigger-code-follow
Apply the start/stop code triggers to all monitored threads. By
default, triggers are only applied to the first thread. This
option has no effect on system-wide measurements.
--trigger-data-repeat
By default, the start and stop data triggers are activated only
the first time they are reached. With this option, it is possi-
ble to repeat the start/stop behavior each time the data address
is accessed.
--trigger-data-follow
Apply the start/stop data triggers to all monitored threads. By
default, triggers are only applied to the first thread. This
option has no effect on system-wide measurements.
--trigger-data-ro
Data triggers are activated on read access only. By default, they
are activated on read or write access.
--trigger-data-wo
Data triggers are activated on write access only. By default, they
are activated on read or write access.
--trigger-start-delay=secs
Number of seconds before activating monitoring. By default,
monitoring is activated immediately, except when code/data triggers
are used.
Set privilege level per event. The levels apply to the current
set, i.e., the last -e option. The levels are specified in the
same order as the events. Accepted values for privileges are:
u, k, 0, 1, 2, 3 or any combination thereof.
--us-counter-format
Print counts using commas, e.g., 1,024.
--eu-counter-format
Print counts using points, e.g., 1.024.
--hex-counter-format
Print counts using hexadecimal, e.g., 0x400.
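The three counter formats differ only in how the same value is rendered. The sketch below reproduces the examples given above for the value 1024. The function and the style names are hypothetical, and details such as zero padding in pfmon's real hexadecimal output are not covered.

```python
def format_count(value, style="plain"):
    """Render a count in one of the styles described above."""
    if style == "us":
        return f"{value:,}"                    # 1024 -> 1,024
    if style == "eu":
        return f"{value:,}".replace(",", ".")  # 1024 -> 1.024
    if style == "hex":
        return hex(value)                      # 1024 -> 0x400
    return str(value)

for style in ("us", "eu", "hex"):
    print(style, format_count(1024, style))
```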
--smpl-module=name
Select the sampling module. By default, the first module that
matches the PMU model is used. This is typically the detailed-*
module. To find out which modules are supported, use the -I
option.
--show-time
Show real, user, and system time for the command executed in
per-thread mode.
--symbol-file=filename
ELF image containing the symbol table for the command being mon-
itored. By default, pfmon uses the binary image on disk.
--sysmap-file=filename
System.map format file containing the kernel symbol table.
--check-events-only
Verify combination of events and exit. No measurement is per-
formed.
--smpl-periods-random=mask1:seed1,...
Apply randomization to long and short periods. For each period,
a seed and a mask value must be passed. The mask is a bitmask
representing the range of variation for randomization. The seed
can be any value.
--smpl-print-counts
When sampling, the final counts for the counters are not printed
by default. This option forces counts to be printed at the end
of a sampling measurement.
--attach-task pid
Attach to the already running thread identified by pid. The
user must have permission to attach to the thread.
--reset-non-smpl-periods
At the end of a sampling period, reset all counters.
--follow-fork
Monitoring continues across fork(). By default monitoring is not
propagated to child processes. This option has no effect in sys-
tem-wide mode.
--follow-vfork
Monitoring continues across vfork(). By default monitoring is
not propagated to child processes. This option has no effect in
system-wide mode.
--follow-pthread
Monitoring continues across pthread_create(). By default moni-
toring is not propagated to new threads. This option has no
effect in system-wide mode.
--follow-exec[=pattern]
Monitoring follows through the exec*() system call. By default,
monitoring stops at exec*(). It is possible to specify a regular
expression pattern to select which commands get monitored.
Without the pattern, all commands are monitored.
--follow-exec-exclude=pattern
Monitoring follows through the exec*() system call. By default,
monitoring stops at exec*(). This option is the counterpart of
--follow-exec in that the pattern specifies the commands which
must be excluded from monitoring. Depending on the monitored
workload, it may be easier to specify the commands to exclude
rather than the commands to include.
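The include and exclude patterns can be thought of as two filters applied to the name of each command started through exec*(). The sketch below is hypothetical: the function name, matching anywhere in the command name, and allowing both patterns at once are all assumptions made for illustration.

```python
import re

def should_monitor(command, include=None, exclude=None):
    """Decide whether a newly executed command is monitored.

    include plays the role of the --follow-exec pattern and exclude
    the role of the --follow-exec-exclude pattern; None means the
    corresponding pattern was not given.
    """
    if include is not None and not re.search(include, command):
        return False
    if exclude is not None and re.search(exclude, command):
        return False
    return True

# Monitor only commands whose name contains "gcc".
print(should_monitor("/usr/bin/gcc", include="gcc"))
# Monitor everything except shells.
print(should_monitor("/bin/sh", exclude=r"/sh$"))
```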
--follow-all
This option is equivalent to specifying all of --follow-fork,
--follow-vfork, --follow-pthread, and --follow-exec.
--no-cmd-output
Redirect all output of executed commands to /dev/null.
--exec-split-results
Generate separate results output for execution before and after
exec*().
--resolve-addresses
Resolve all code/data addresses in profiles using symbol table
information. If the symbol information is not present, the raw
address is printed. By default, only raw addresses are printed.
--extra-smpl-pmds=num,num1-num2,...
Specify a list of extra PMD registers to include in samples.
Those PMD registers are typically virtual PMD registers not tied
to counters.
--demangle-cpp
C++ symbol demangling. By default, no symbol demangling is per-
formed.
--demangle-java
Java symbol demangling. By default, no symbol demangling is per-
formed.
--saturate-smpl-buffer
Stop collecting samples the first time the sampling buffer
becomes full. In other words, simply collect the first N entries
when --smpl-entries=N. By default, this option is off.
--pin-command
Pin the executed command on the CPUs specified by --cpu-list.
This option is only relevant in system-wide mode.
--switch-timeout=milliseconds
The number of milliseconds before switching from one event set
to the next. Depending on the granularity of the underlying
operating system timer tick, the timeout may be rounded up. If
the difference from the user-provided timeout exceeds 2%, pfmon
prints a warning message.
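The rounding and the 2% check can be illustrated with a small calculation. The sketch assumes the timeout is rounded up to a whole number of timer ticks and that the difference is measured relative to the requested timeout. The function name and the 4 ms tick in the example are hypothetical.

```python
import math

def round_to_tick(timeout_ms, tick_ms):
    """Round a switch timeout up to a multiple of the timer tick.

    Returns the effective timeout and whether the relative difference
    from the requested value exceeds 2%.
    """
    ticks = max(1, math.ceil(timeout_ms / tick_ms))
    effective_ms = ticks * tick_ms
    deviation = (effective_ms - timeout_ms) / timeout_ms
    return effective_ms, deviation > 0.02

# With a 4 ms tick, 10 ms needs three ticks and 100 ms fits exactly.
print(round_to_tick(10, 4))
print(round_to_tick(100, 4))
```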
--dont-start
Do not activate monitoring. This option is useful on architec-
tures where it is possible to start/stop counters directly from
the user level.
--excl-idle
Exclude idle threads from system-wide measurement.
--cpu-set-relative
With this option, CPU identifications for --cpu-list are
relative to the cpu_set affinity. By default, they are relative to
actual CPU0.
SEE ALSO
Visit http://perfmon2.sf.net for more detailed documentation including
processor specific options.
AUTHOR
Stephane Eranian <eranian@hpl.hp.com>
pfmon 3.2 April 2006 PFMON(1)