Using perf on Linux

Contents

I. Overview
- 1. The role of perf
- 2. Commonly used tool sets
II. Use of the perf tool
- 1. perf list
- 2. perf stat
- 3. perf top
- 4. perf record/report
  - 4.1 perf record
  - 4.2 perf report
- 5. perf annotate

I. Overview

1. The role of perf

perf is a performance analysis tool, built on the perf_event interface provided by the Linux kernel, for tuning and analyzing the performance of Linux systems. It provides detailed performance statistics by reading hardware performance counters, tracing system events, and sampling program call stacks.

perf is event-based. When profiling, it does not instrument every clock tick; instead it takes a sample each time a chosen event has occurred a set number of times. Which events are counted depends on the perf subcommand and options in use.

2. Commonly used tool sets

The perf command is a front end for a number of subcommands, each aimed at a different kind of performance analysis and tuning. The most commonly used are:

perf stat: Collects and displays performance counter statistics. Use it to monitor the overall metrics of a process or command, such as instruction count, cache misses, and branch mispredictions.
perf record: Captures hardware performance counter data, events, and call stacks while a program runs and saves them to a data file. Start sampling with perf record, then analyze the samples with perf report.
perf report: Analyzes the sample data collected by perf record and generates a profile report showing call stacks, the share of samples per function, and performance hotspots.
perf top: Monitors the system in real time and displays the current performance hotspots, including the functions consuming the most CPU and their event counts.
perf annotate: Displays sampled data alongside the assembly and source code, with the sample share labeled on each line, so that hotspots can be located within a function.
perf diff: Compares two perf data files, for example from two versions of a program, and reports the performance differences.
perf probe: Dynamically adds and removes probes on specific code paths. Add a probe with perf probe, then collect and analyze its data with perf record and perf report.

There are also more specialized subcommands: lock for lock analysis, sched for the scheduler, kmem for slab allocator performance, and probe for custom tracepoints. Run perf or perf -h to list them all:

[projectsauron]:~/$ perf -h ## or perf

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   daemon          Run record sessions on background
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   iostat          Show I/O performance metrics
   kallsyms        Searches running kernel for symbols
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read  (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   version         display the version of perf binary
   probe           Define new dynamic tracepoints
   trace           strace inspired tool

 See 'perf help COMMAND' for more information on a specific command.

II. Use of the perf tool

1. perf list

perf is built on the event-counting mechanism provided by the kernel. The perf list command shows the available events, which fall into three main types:

Hardware events: Generated by the PMU (Performance Monitoring Unit), such as L1 cache events.
Software events: Generated by the kernel, such as process context switches.
Tracepoint events: Triggered by static tracepoints in the kernel.
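
To make the three event classes more concrete, here is a minimal sketch of how a program can request one hardware event directly through the perf_event_open system call, the kernel interface that perf is built on. The helper names count_instructions and busy_loop are ours and purely illustrative. The sketch assumes that the Linux kernel headers are installed and that the process is permitted to open the counter; when it is not, the function returns -1 rather than failing.

```c
#include <assert.h>
#include <linux/perf_event.h>  /* struct perf_event_attr, PERF_TYPE_*, PERF_COUNT_* */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for this system call, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload, used only for this illustration. */
void busy_loop(void)
{
    volatile unsigned int num = 1;
    for (int i = 0; i < 1000000; ++i)
        num *= 2;
}

/*
 * Count user-space instructions executed by the calling thread while
 * work() runs. Returns the count, or -1 if the counter cannot be opened
 * or read (no PMU access, restrictive perf_event_paranoid, and so on).
 */
long long count_instructions(void (*work)(void))
{
    struct perf_event_attr attr;
    long long count = -1;

    assert(work != NULL);

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;            /* event class: hardware (PMU) */
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* event within that class */
    attr.disabled = 1;                         /* start stopped, enable below */
    attr.exclude_kernel = 1;                   /* count user space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

Calling count_instructions(busy_loop) returns the number of user-space instructions executed inside busy_loop, or -1 when the counter is unavailable. Setting attr.type to PERF_TYPE_SOFTWARE with a config such as PERF_COUNT_SW_CONTEXT_SWITCHES would count a software event instead.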

2. perf stat

perf stat counts events while a program runs and prints a summary to the screen. Invoke it as perf stat [options] cmd: it runs cmd and, when cmd exits, prints the count for each event.

Commonly used perf stat options (run perf stat -h for the full list):

-a: Collects statistics on all CPUs.
-C: Collects statistics only on the specified CPUs.
-e: Specifies the events to count.
-i: Prevents child tasks from inheriting the parent task's performance counters.
-r: Runs the target program n times and reports the mean of each metric together with its variation across the n runs.
-p: Specifies the ID of the process to monitor.
-t: Specifies the ID of the thread to monitor.

For example, when perf stat is run on a test script, the report contains the following fields:

task-clock: The amount of processor time, in ms, that the task actually used (CPU occupancy = task-clock / time elapsed).
context-switches: The number of context switches.
cpu-migrations: The number of processor migrations. Under certain conditions a task is moved to another CPU to keep the load balanced across processors.
page-faults: The number of page faults. A page fault is raised when the page the application requests has not yet been created, when the page is not in memory, or when the page is in memory but the mapping between its virtual and physical address has not yet been established. A page fault is also raised when an access violates the page's permissions.
cycles: The number of processor cycles consumed.
instructions: The number of instructions executed. The derived IPC (instructions per cycle) is the average number of instructions executed per CPU cycle.
branches: The number of branch instructions encountered.
branch-misses: The number of branch instructions that were mispredicted.
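
The derived figures mentioned in this list are simple ratios of the raw counts. The sketch below spells them out. The struct and function names are ours, not part of perf, and the numbers in the usage note are made up for illustration.

```c
#include <assert.h>

/* Raw values as perf stat reports them. Field names are illustrative. */
struct stat_counts {
    double task_clock_ms;              /* task-clock, in milliseconds */
    double elapsed_ms;                 /* wall-clock time elapsed, in milliseconds */
    unsigned long long cycles;
    unsigned long long instructions;
    unsigned long long branches;
    unsigned long long branch_misses;
};

/* CPU occupancy = task-clock / time elapsed (1.0 means one CPU kept fully busy). */
double cpus_utilized(const struct stat_counts *c)
{
    assert(c->elapsed_ms > 0.0);
    return c->task_clock_ms / c->elapsed_ms;
}

/* IPC = instructions / cycles. */
double ipc(const struct stat_counts *c)
{
    assert(c->cycles > 0);
    return (double)c->instructions / (double)c->cycles;
}

/* Mispredicted branches as a percentage of all branches. */
double branch_miss_percent(const struct stat_counts *c)
{
    assert(c->branches > 0);
    return 100.0 * (double)c->branch_misses / (double)c->branches;
}
```

For instance, 500 ms of task-clock over 1000 ms elapsed gives an occupancy of 0.5, and 3,000,000 instructions over 2,000,000 cycles gives an IPC of 1.5.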

3. perf top

perf top works much like the Linux top command: it samples an event and displays the results in real time, sorted in descending order. The default event is cycles (the number of processor cycles consumed). perf top covers both user-space and kernel functions and, by default, all CPUs; it can also be restricted to a specific CPU.

perf top gives a live view of where the system is currently spending its time, which makes it a quick way to locate performance bottlenecks as they occur.

Commonly used options (run perf top -h for the full list):

-a: Displays performance statistics for all CPUs.
-c <n>: Specifies the sampling period.
-C <cpu>: Displays performance statistics only for the specified CPU.
-e: Specifies the performance event.
-g: Shows call graphs (move the cursor up and down and press Enter to expand an entry).
-K: Hides kernel symbols.
-p: Specifies the process PID.
-s: Sorts by the given keys, such as pid, comm, dso, or symbol (in older perf versions, -s selected a symbol to annotate).
-t: Specifies the thread TID.
-U: Hides user-space symbols.

4. perf record/report

Run perf record cmd to profile the command cmd. perf record samples performance events over time and writes them to a file (perf.data by default), which is then analyzed with perf report. Recording can be restricted to a single thread, process, or CPU. The default event is again cycles. The default sampling frequency given here is 1000 samples per second (1000 Hz); recent perf releases default to 4000 Hz, so pass -F explicitly if the rate matters.

4.1 perf record

The perf record command collects data and writes it to a data file.

Commonly used perf record options (run perf record -h for the full list):

-a: Profile the entire system (all CPUs)
-A: Append to the output file instead of overwriting it (older perf releases only)
-c: Sampling period, in number of events
-C: Collect data only on the specified CPUs
-e: Select the performance event to record (hardware, software, or tracepoint)
-f: Overwrite an existing output file (older perf releases only)
-g: Record call graphs, that is, the call relationships between functions
-o: Specify the output file; the default is perf.data
-p: Specify a process ID to collect data from that process only
-t: Specify a thread ID to collect data from that thread only

For example, to sample events on all CPUs at 1000 Hz for the duration of a 5-second sleep:

root@projectsauron:~/# perf record -a -F 1000 sleep 5
[ perf record: Woken up 17 times to write data ]
[ perf record: Captured and wrote 5.204 MB perf.data (80049 samples) ]
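
As a sanity check on a sample count like this, frequency-based sampling yields roughly frequency x duration x number of CPUs samples when every CPU stays busy. The helper below is a sketch with a name of our own choosing. The CPU count of the machine is not stated, so the figure of 16 CPUs in the usage note is only an assumption that happens to fit the output.

```c
#include <assert.h>

/*
 * Rough number of samples expected from a system-wide recording at a
 * fixed frequency: samples per second per CPU x seconds x CPUs sampled.
 * Idle CPUs can produce fewer samples, so treat this as an estimate.
 */
unsigned long long expected_samples(unsigned int freq_hz,
                                    unsigned int seconds,
                                    unsigned int ncpus)
{
    assert(freq_hz > 0 && ncpus > 0);
    return (unsigned long long)freq_hz * seconds * ncpus;
}
```

expected_samples(1000, 5, 16) evaluates to 80000, which is close to the 80049 samples reported here; a count far below the estimate usually just means that most CPUs were idle during the recording.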

4.2 perf report

perf report analyzes the data file generated by perf record.

Commonly used perf report options (run perf report -h for the full list):

-c <comm>: Only consider samples from the specified commands
-C <cpu>: Display information only for the specified CPU
-d <dso>: Display only the symbols in the specified DSOs (shared objects)
-g: Display the call graph; works the same way as -g in perf top
-i: Name of the input data file; the default is perf.data
-M: Use the specified disassembler style
--sort: Group and sort the statistics by keys such as pid, comm, or cpu
-S: Only consider the specified symbols
-U: Display only resolved symbols
-v: Display the address of each symbol

To analyze the data file generated by the perf record run above:

root@projectsauron:~/# perf report -i perf.data

5. perf annotate

perf annotate reads the recorded data file and attributes the samples to individual instructions within a function. For binaries compiled with debug information (-g), it can show the source code interleaved with the assembly.

Note, however, that annotate cannot resolve symbols from a compressed kernel image. To annotate kernel symbols, pass it an uncompressed kernel image, for example: perf annotate -k /tmp/vmlinux -d symbol.

perf annotate helps us see exactly which instructions inside hot functions and loops the samples fall on. Depending on the event that was recorded, this shows where cycles are spent, where cache misses occur, and so on. From this we can tell where the program's bottleneck is and what to optimize.

Commonly used perf annotate options (run perf annotate -h for the full list):

-C <cpu>: Only consider samples from the specified CPU
-d: Only consider symbols in the specified DSO
-i: Specify the input file
-k: Specify the kernel image (vmlinux) file
-s: Specify the symbol to annotate

Example:

First, write a C source file with the following contents:

#include <stdio.h>
#include <time.h>

void func_a() {
   unsigned int num = 1;
   for (int i = 0;i < 10000000; ++i) {
      num *= 2;
      num = 1;
   }
}

void func_b() {
   unsigned int num = 1;
   for (int i = 0;i < 10000000; ++i) {
      num <<= 1;
      num = 1;
   }
}

int main() {
   func_a();
   func_b();
   return 0;
}

Then compile it with gcc, passing -g -O0 -o main followed by the source file name (-g keeps debug information such as the symbol table; -O0 disables optimization).
Next, record a profile: perf record -a -g ./main

root@projectsauron:~# perf record -a -g ./main
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.266 MB perf.data (2474 samples) ]

View Results

Run perf report -i perf.data to browse the recorded profile.

Run perf annotate -i perf.data to see the annotated code:

func_a  /home/projectsauron/test/main           
       │    void func_a() {
       │      push   %rbp
       │      mov    %rsp,%rbp
       │       unsigned int num = 1;
       │      movl   $0x1,-0x8(%rbp)
       │       int i;
       │       for (i = 0;i < 10000000; i++) {
       │      movl   $0x0,-0x4(%rbp)
       │    ↓ jmp    22
       │          num *= 2;
 11.11 │14:┌─→shll   -0x8(%rbp)
       │   │      num = 1;
       │   │  movl   $0x1,-0x8(%rbp)
       │   │#include <stdio.h>
       │   │#include <time.h>
       │   │void func_a() {
       │   │   unsigned int num = 1;
       │   │   int i;
       │   │   for (i = 0;i < 10000000; i++) {
  5.56 │   │  addl   $0x1,-0x4(%rbp)
 33.33 │22:│  cmpl   $0x98967f,-0x4(%rbp)
 50.00 │   └──jle    14
       │          num *= 2;
       │          num = 1;
       │       }
       │    }
       │      pop    %rbp
       │    ← retq
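
The left-hand column is the percentage of the function's samples that landed on each instruction. The sketch below shows the arithmetic with a helper name of our own. The raw sample counts are not part of the output, so the split used in the usage note (18 samples divided 2, 1, 6 and 9) is only an assumption that happens to reproduce the figures shown.

```c
#include <assert.h>

/* Share of a symbol's samples attributed to one instruction, in percent. */
double sample_percent(unsigned int samples_on_insn,
                      unsigned int samples_in_symbol)
{
    assert(samples_in_symbol > 0);
    return 100.0 * samples_on_insn / samples_in_symbol;
}
```

With 18 samples in func_a, sample_percent(2, 18) is about 11.11, sample_percent(1, 18) about 5.56, sample_percent(6, 18) about 33.33, and sample_percent(9, 18) exactly 50, matching the shll, addl, cmpl, and jle lines. With so few samples in a function, each percentage is coarse, so a longer run gives a more reliable picture.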