Contents
- I. Overview
- 1. The role of perf
- 2. Commonly used tool sets
- II. Use of the perf tool
- 1. perf list
- 2. perf stat
- 3. perf top
- 4. perf record/report
- 4.1 perf record
- 4.2 perf report
- 5. perf annotate
I. Overview
1. The role of perf
perf is a performance analysis tool, built on the perf_event interface provided by the Linux kernel, for performance tuning and analysis of Linux systems. It provides detailed performance statistics by reading hardware performance counters, tracing system events, and sampling program call stacks.
perf is event-based: it counts or samples occurrences of events rather than measuring at fixed clock ticks, and which events are counted depends on the perf subcommand in use and the event types selected.
2. Commonly used tool sets
In addition to the bare perf command, there are a number of perf subcommands that can be used for more in-depth performance analysis and tuning. The most commonly used are:
- perf stat: Collects and displays performance counter statistics. Use it to monitor the overall performance metrics of a process or command, such as instruction count, cache hit rate, and branch mispredictions.
- perf record: Captures hardware performance counter data, events, and call stack information during program execution and saves them to a data file. Sampling is started with perf record, and the sampled data is then analyzed with perf report.
- perf report: Analyzes the samples collected by perf record and generates a performance report showing call stacks, time spent per function, performance hotspots, and so on.
- perf top: Monitors performance in real time and displays the current hotspots, including CPU usage per function and event counts.
- perf annotate: Displays sampled data alongside the source code, with the sample percentage labeled on each line, so that hotspots can be located at line level.
- perf diff: Compares two perf data files, for example from two versions of a program, and generates a report of the performance differences.
- perf probe: Dynamically adds and removes probes to collect performance data for specific code paths. After adding a probe with perf probe, use perf record and perf report to collect and analyze its data.
There are also a number of targeted analysis tools: lock for locks, sched for scheduling, kmem for slab allocator performance, and probe for custom probe points. The full list can be viewed by running perf or perf -h:
[projectsauron]:~/$ perf -h ## or perf
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Read (created by perf record) and display annotated code
archive Create archive with object files with build-ids found in file
bench General framework for benchmark suites
buildid-cache Manage build-id cache.
buildid-list List the buildids in a file
c2c Shared Data C2C/HITM Analyzer.
config Get and set variables in a configuration file.
daemon Run record sessions on background
data Data file related processing
diff Read files and display the differential profile
evlist List the event names in a file
ftrace simple wrapper for kernel's ftrace functionality
inject Filter to augment the events stream with additional information
iostat Show I/O performance metrics
kallsyms Searches running kernel for symbols
kmem Tool to trace/measure kernel memory properties
kvm Tool to trace/measure kvm guest os
list List all symbolic event types
lock Analyze lock events
mem Profile memory accesses
record Run a command and record its profile into
report Read (created by perf record) and display the profile
sched Tool to trace/measure scheduler properties (latencies)
script Read (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
test Runs sanity tests.
timechart Tool to visualize total system behavior during a workload
top System profiling tool.
version display the version of perf binary
probe Define new dynamic tracepoints
trace strace inspired tool
See 'perf help COMMAND' for more information on a specific command.
II. Use of the perf tool
1. perf list
perf is built on the event counting mechanism provided by the kernel. The perf list command shows the available events, which fall into three main types:
- Hardware events: Generated by the PMU (Performance Monitoring Unit), such as L1 cache events.
- Software events: Generated by the kernel, such as process switches.
- Tracepoint events: Triggered by static tracepoints in the kernel.
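As a rough illustration of the three event types, perf list accepts a category filter (hw, sw, tracepoint). The sketch below wraps that in a small function that prints a short preview of each category; list_events_by_category is a hypothetical helper name, not part of perf, and the events actually listed depend on the kernel, the CPU, and the permissions of the user.

```shell
# list_events_by_category: hypothetical helper, not a perf subcommand.
# Prints the first five lines of "perf list" output for each event category.
list_events_by_category() {
    local category
    for category in hw sw tracepoint; do
        echo "== $category =="
        # head stops reading early, so ignore the resulting pipeline status.
        perf list "$category" 2>/dev/null | head -n 5 || true
    done
}

# Usage: list_events_by_category
```

Listing tracepoints usually requires root or relaxed perf permissions, so the tracepoint section may come back empty for an unprivileged user.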
2. perf stat
perf stat counts occurrences of the supported events while a program runs and prints the totals to the screen. Run it as perf stat [options] cmd: the command cmd is executed, and when it finishes the statistics for each type of event are printed.
The options for perf stat are as follows (run perf stat -h to view the full list):
- -a: Displays statistics on all CPUs.
- -C: Displays statistics on the specified CPU.
- -e: Specifies the event to be displayed.
- -i: Prohibit child tasks from inheriting the parent task's performance counters.
- -r: Repeat the execution of the target program n times and give the range of variation of the performance metrics over the n executions.
- -p: Specifies the ID of the process to be displayed.
- -t: Specifies the ID of the thread to be displayed.
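As a sketch of how these options combine, the function below repeats a command several times with -r and restricts the counters with -e. The name stat_repeated and the choice of events are assumptions made for illustration; the -- separator simply marks where perf options end and the measured command begins.

```shell
# stat_repeated RUNS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND RUNS times under perf stat and report
# the average and variation of a fixed set of hardware counters.
stat_repeated() {
    local runs="$1"
    shift
    perf stat -r "$runs" -e cycles,instructions,branches,branch-misses -- "$@"
}

# Usage: stat_repeated 5 sleep 1
```

To measure an already running process instead, replace the command with -p followed by the process ID.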
For example, when perf stat is run on a test script, the fields in its output have the following meaning:
- task-clock: The processor time, in ms, that the task actually used (CPU utilization = task-clock / time elapsed).
- context-switches: The number of context switches.
- CPU-migrations: The number of processor migrations. Under certain conditions a task is migrated to another CPU to keep the load balanced across processors.
- page-faults: The number of page faults. A page fault is triggered when the page requested by the application has not yet been created, when the requested page is not in memory, or when the page is in memory but the mapping between its physical and virtual addresses has not yet been established. Page faults are also triggered by page access-permission mismatches.
- cycles: The number of processor cycles consumed.
- instructions: The number of instructions executed. IPC (instructions per cycle) is the average number of instructions executed per CPU cycle.
- branches: The number of branch instructions encountered.
- branch-misses: The number of branch instructions that were mispredicted.
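The derived figures mentioned in this list (CPU utilization = task-clock / time elapsed, IPC = instructions / cycles) are plain ratios, and a branch miss rate can be derived the same way from branch-misses / branches. The sketch below works them through with made-up counter values chosen only for illustration; ratio is a hypothetical helper, not a perf command.

```shell
# ratio NUMERATOR DENOMINATOR
# Hypothetical helper: print the quotient with two decimals, or n/a when
# the denominator is zero.
ratio() {
    awk -v n="$1" -v d="$2" \
        'BEGIN { if (d == 0) print "n/a"; else printf "%.2f\n", n / d }'
}

# Illustrative values only, not measured data.
task_clock_ms=800
elapsed_ms=1000
cycles=2000000000
instructions=3000000000
branches=500000000
branch_misses=10000000

echo "CPUs utilized:    $(ratio "$task_clock_ms" "$elapsed_ms")"
echo "IPC:              $(ratio "$instructions" "$cycles")"
echo "branch miss rate: $(ratio "$branch_misses" "$branches")"
```

An IPC well below 1 often points at stalls such as cache misses, while a high branch miss rate points at unpredictable control flow; both are starting points for further investigation rather than conclusions.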
3. perf top
perf top is used much like the Linux top command. It displays, in real time, sampling results sorted by a chosen event; the default event is cycles (the number of processor cycles consumed), and results are sorted in descending order. perf top covers functions in both user space and kernel space. By default it monitors all CPUs, but a specific CPU can also be selected.
perf top provides a real-time performance report that shows the bottlenecks currently occurring on the system. Analyzing these statistics helps locate and resolve issues quickly.
Common options (run perf top -h to view the full list):
- -a: Displays performance statistics on all CPUs.
- -c<n>: Specify the sampling period
- -C<cpu>: Displays performance statistics on the specified CPU.
- -e: Specify performance events
- -g: Show call graphs (move the cursor up and down and press Enter to expand an entry)
- -K: Hide kernel statistics
- -p: Specify the process PID
- -s: Specify the symbol information to be resolved
- -t: Specify the thread TID
- -U: Hide userspace statistics
4. perf record/report
perf record cmd collects profile data for the command cmd. It gathers performance events over time into a data file (a default file name is used unless one is specified), which is then analyzed with the perf report command. Events can be recorded for an individual thread, a process, or a CPU. By default the sampled event is cycles. The default sampling frequency is given here as 1000 samples per second (1000 Hz), although this depends on the perf version; recent versions default to 4000 Hz.
4.1 perf record
The perf record command collects data and writes it to a data file.
Commonly used perf record options are (run perf record -h to view the full list):
- -a: Profile the entire system (all CPUs)
- -A: Append to the output file
- -c: Sampling period for events
- -C: Collect data only from the specified CPUs
- -e: Select the performance events to record, either hardware or software events
- -f: Overwrite the output file
- -g: Record call graphs (the call relationships between functions)
- -o: Specify the output file (a default file name is used if omitted)
- -p: Specify a process ID to capture process-specific data.
- -t: Specify the ID of a thread to collect data from a specific thread.
For example, sample all CPUs at a frequency of 1000 Hz for the duration of a 5-second sleep:
root@projectsauron:~/# perf record -a -F 1000 sleep 5
[ perf record: Woken up 17 times to write data ]
[ perf record: Captured and wrote 5.204 MB (80049 samples) ]
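As a rough sanity check on the sample count, sampling at F Hz for T seconds yields at most about F × T samples per CPU. The sketch below divides the reported sample count by that per-CPU budget; cpu_equivalents is a hypothetical helper. The source does not state how many CPUs the machine has, and idle CPUs may contribute fewer samples, so the result is only an estimate.

```shell
# cpu_equivalents SAMPLES FREQ_HZ SECONDS
# Hypothetical helper: print samples / (frequency * duration), i.e. how many
# CPUs sampled at the full rate would produce that many samples.
cpu_equivalents() {
    awk -v s="$1" -v f="$2" -v t="$3" 'BEGIN { printf "%.1f\n", s / (f * t) }'
}

# Figures from the run above: 80049 samples, -F 1000, sleep 5.
cpu_equivalents 80049 1000 5
```

For the run shown this prints 16.0, which would be consistent with roughly 16 CPUs being sampled at the full rate for the whole 5 seconds.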
4.2 perf report
perf report analyzes the data file generated by perf record.
Commonly used perf report options are (run perf report -h to view the full list):
- -c <comms>: Only consider samples from the specified commands
- -C <cpu>: Display information only for the specified CPU
- -d <dsos>: Display only the symbols from the specified DSOs (shared objects)
- -g: Generate a function call graph, equivalent to -g in the perf top command
- -i: The name of the input data file (a default file is used if omitted)
- -M: Display assembly in the specified disassembler style
- --sort: Sort key used to group the statistics, such as pid, comm, or cpu
- -S: Only consider the specified symbols
- -U: Display only resolved symbols
- -v: Display the address of each symbol
The data file generated by perf record above is then analyzed as follows:
root@projectsauron:~/# perf report -i
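The record and report steps are typically run as a pair against the same data file. The sketch below names the file explicitly with -o and -i (perf record writes to perf.data when -o is omitted) and prints a plain-text report with --stdio instead of opening the interactive browser. record_and_report is a hypothetical helper, and the 1000 Hz frequency and sort keys are illustrative choices.

```shell
# record_and_report DATA_FILE COMMAND [ARGS...]
# Hypothetical helper: profile COMMAND into DATA_FILE, then print a text report.
record_and_report() {
    local data_file="$1"
    shift
    # Sample at 1000 Hz with call graphs and write to an explicit file.
    perf record -F 1000 -g -o "$data_file" -- "$@" || return
    # Group the samples by command, shared object, and symbol.
    perf report -i "$data_file" --stdio --sort comm,dso,symbol
}

# Usage: record_and_report ./main.data ./main
```

Keeping one data file per run makes it possible to compare runs later, for example with perf diff.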
5. perf annotate
perf annotate analyzes and displays the performance characteristics of a given function down to individual instructions, locating hotspots at instruction level within the recorded data file. For binaries compiled with debug information (-g), it can show the source code alongside the assembly.
However, note that the annotate command cannot resolve kernel symbols from a compressed kernel image. You must pass an uncompressed kernel image to annotate for kernel symbols to be resolved properly, for example: perf annotate -k /tmp/vmlinux -d symbol
perf annotate helps identify the hot code in a program, such as hot functions and loops, and shows how the sampled event (for example cycles or cache misses) is distributed across that code. Analyzing this distribution shows where the bottleneck of the program is so that it can be optimized.
Commonly used perf annotate options are (run perf annotate -h to view the full list):
- -C <cpu>: Only consider samples from the specified CPU
- -d: Only resolve symbols in the specified file
- -i: Specify the input file
- -k: Specify the kernel (vmlinux) file
- -s: Specify the symbol to annotate
Example:
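- First, write a C source file with the following contents:
#include <stdio.h>
#include <time.h>
/* Doubles num with a multiplication, then resets it, ten million times. */
void func_a() {
    unsigned int num = 1;
    for (int i = 0; i < 10000000; ++i) {
        num *= 2;
        num = 1;
    }
}
/* Same loop as func_a, but doubles num with a left shift. */
void func_b() {
    unsigned int num = 1;
    for (int i = 0; i < 10000000; ++i) {
        num <<= 1;
        num = 1;
    }
}
int main() {
    func_a();
    func_b();
    return 0;
}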
- Then compile it with gcc, appending the source file name to the command: gcc -g -O0 -o main (-g adds debug information and preserves symbol tables; -O0 disables optimization).
- Run the recording command: perf record -a -g ./main
root@projectsauron:~# perf record -a -g ./main
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.266 MB (2474 samples) ]
- View the results
Run perf report -i and then perf annotate -i on the recorded data file. The annotated output for func_a is:
func_a /home/projectsauron/test/main
│ void func_a() {
│ push %rbp
│ mov %rsp,%rbp
│ unsigned int num = 1;
│ movl $0x1,-0x8(%rbp)
│ int i;
│ for (i = 0;i < 10000000; i++) {
│ movl $0x0,-0x4(%rbp)
│ ↓ jmp 22
│ num *= 2;
11.11 │14:┌─→shll -0x8(%rbp)
│ │ num = 1;
│ │ movl $0x1,-0x8(%rbp)
│ │#include <stdio.h>
│ │#include <time.h>
│ │void func_a() {
│ │ unsigned int num = 1;
│ │ int i;
│ │ for (i = 0;i < 10000000; i++) {
5.56 │ │ addl $0x1,-0x4(%rbp)
33.33 │22:│ cmpl $0x98967f,-0x4(%rbp)
50.00 │ └──jle 14
│ num *= 2;
│ num = 1;
│ }
│ }
│ pop %rbp
│ ← retq
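The listing above comes from the interactive view. As a sketch of a non-interactive alternative, the function below asks perf annotate for a single symbol with -s and prints the result as plain text with --stdio. annotate_symbol is a hypothetical helper, and the fallback to perf.data assumes the recording was written to perf record's default output file.

```shell
# annotate_symbol SYMBOL [DATA_FILE]
# Hypothetical helper: print the annotated code of one function as plain text.
annotate_symbol() {
    local symbol="$1"
    local data_file="${2:-perf.data}"
    perf annotate -i "$data_file" -s "$symbol" --stdio
}

# Usage: annotate_symbol func_a
#        annotate_symbol func_b ./main.data
```

Annotating func_a and func_b in turn makes it easy to compare the instructions generated for the multiplication and for the shift.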