Commit 802f0d58d52e for kernel

commit 802f0d58d52e8e34e08718479475ccdff0caffa0
Merge: 4e82c87058f4 35d13f841a3d
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Mar 31 08:52:33 2025 -0700

    Merge tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

    Pull perf tools updates from Namhyung Kim:
     "perf record:

       - Introduce latency profiling using scheduler information.

         The latency profiling is to show impacts on wall-time rather than
         cpu-time. By tracking context switches, it can weight samples and
         find which part of the code contributed more to the execution
         latency.

         The value (period) of the sample is weighted by dividing it by the
         number of parallel executions at the moment. The parallelism is
         tracked in perf report with sched-switch records. This will reduce
         the portion of samples that run in parallel and in turn increase
         the portion of serial executions.
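
         The weighting described above can be sketched in a few lines of
         Python. This is only an illustration, not perf's implementation:
         the (comm, period, parallelism) tuple layout and both function
         names are assumptions made for the example.

```python
from collections import defaultdict


def overhead_shares(samples):
    """Percent of total CPU time (raw period) per command."""
    totals = defaultdict(float)
    for comm, period, _parallelism in samples:
        totals[comm] += period
    grand_total = sum(totals.values())
    return {comm: 100.0 * value / grand_total for comm, value in totals.items()}


def latency_shares(samples):
    """Percent of wall-time impact per command.

    Each sample's period is divided by the number of tasks that were
    running in parallel when the sample was taken.
    """
    totals = defaultdict(float)
    for comm, period, parallelism in samples:
        totals[comm] += period / parallelism
    grand_total = sum(totals.values())
    return {comm: 100.0 * value / grand_total for comm, value in totals.items()}


# Hypothetical samples: a compiler running 4-wide and a serial link step.
SAMPLES = [
    ("cc1", 800, 4),
    ("ld", 200, 1),
]

if __name__ == "__main__":
    print(overhead_shares(SAMPLES))
    print(latency_shares(SAMPLES))
```

         With these made-up samples, cc1 dominates the overhead but shares
         the latency evenly with ld, because the ld samples ran serially.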

         For now, it's limited to profiling processes; in other words,
         system-wide profiling is not supported. You can add the --latency
         option to enable this.

           $ perf record --latency -- make -C tools/perf

         I ran the above command on the perf build, which internally adds
         the -j option to make with the number of CPUs in the system.
         Normally perf report would show something like below:

           $ perf report -F overhead,comm
           ...
           #
           # Overhead  Command
           # ........  ...............
           #
               78.97%  cc1
                6.54%  python3
                4.21%  shellcheck
                3.28%  ld
                1.80%  as
                1.37%  cc1plus
                0.80%  sh
                0.62%  clang
                0.56%  gcc
                0.44%  perl
                0.39%  make
             ...

         cc1 takes around 80% of the overhead as it's the actual
         compiler. However, it runs in parallel, so its contribution to
         latency may be less than that. Now, perf report will show both
         overhead and latency (if --latency was given at record time) like
         below:

           $ perf report -s comm
           ...
           #
           # Overhead   Latency  Command
           # ........  ........  ...............
           #
               78.97%    48.66%  cc1
                6.54%    25.68%  python3
                4.21%     0.39%  shellcheck
                3.28%    13.70%  ld
                1.80%     2.56%  as
                1.37%     3.08%  cc1plus
                0.80%     0.98%  sh
                0.62%     0.61%  clang
                0.56%     0.33%  gcc
                0.44%     1.71%  perl
                0.39%     0.83%  make
             ...

         You can see latency of cc1 goes down to around 50% and python3 and
         ld contribute a lot more than their overhead. You can use --latency
         option in perf report to get the same result but ordered by
         latency.

           $ perf report --latency -s comm

      perf report:

       - As a side effect of the latency profiling work, it adds a new
         output field 'latency' and a sort key 'parallelism'. The below is a
         result from my system with 64 CPUs. The build was well-parallelized
         but contained some serial portions.

           $ perf report -s parallelism
           ...
           #
           # Overhead   Latency  Parallelism
           # ........  ........  ...........
           #
               16.95%     1.54%           62
               13.38%     1.24%           61
               12.50%    70.47%            1
               11.81%     1.06%           63
                7.59%     0.71%           60
                4.33%    12.20%            2
                3.41%     0.33%           59
                2.05%     0.18%           64
                1.75%     1.09%            9
                1.64%     1.85%            5
                ...
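
         As a rough consistency check of the table above, the latency
         column should be proportional to overhead divided by parallelism.
         The sketch below assumes exactly that relationship and scales the
         values so the serial (parallelism 1) row matches; the function name
         is hypothetical and the rows are copied from the report output.

```python
# (overhead %, latency %, parallelism) rows copied from the report above.
ROWS = [
    (16.95, 1.54, 62),
    (13.38, 1.24, 61),
    (12.50, 70.47, 1),
    (11.81, 1.06, 63),
    (7.59, 0.71, 60),
    (4.33, 12.20, 2),
    (3.41, 0.33, 59),
    (2.05, 0.18, 64),
    (1.75, 1.09, 9),
    (1.64, 1.85, 5),
]


def predicted_latency(rows):
    """Estimate latency % as overhead / parallelism, scaled to the serial row."""
    serial_overhead, serial_latency = next(
        (overhead, latency)
        for overhead, latency, parallelism in rows
        if parallelism == 1
    )
    scale = serial_latency / serial_overhead
    return [round(overhead / parallelism * scale, 2)
            for overhead, _latency, parallelism in rows]


if __name__ == "__main__":
    for (overhead, latency, parallelism), predicted in zip(ROWS, predicted_latency(ROWS)):
        print(f"parallelism={parallelism:2d} reported={latency:6.2f} predicted={predicted:6.2f}")
```

         The predictions agree with the reported column to within the
         rounding of the printed percentages.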

       - Support Fedora mini-debuginfo, which is an LZMA-compressed symbol
         table inside the ".gnu_debugdata" ELF section.

      perf annotate:

       - Add --code-with-type option to enable data-type profiling with the
         usual annotate output.

         Instead of focusing on data structures, it shows code annotation
         together with the data type each instruction accesses, in case the
         instruction refers to a memory location (and perf was able to
         resolve the target data type). Currently it only works with
         --stdio.

           $ perf annotate --stdio --code-with-type
           ...
            Percent |      Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
           ----------------------------------------------------------------------------------------------------------------------
                    : 0                0xffffffff81050610 <__fdget>:
               0.00 :   ffffffff81050610:        callq   0xffffffff81c01b80 <__fentry__>           # data-type: (stack operation)
               0.00 :   ffffffff81050615:        pushq   %rbp              # data-type: (stack operation)
               0.00 :   ffffffff81050616:        movq    %rsp, %rbp
               0.00 :   ffffffff81050619:        pushq   %r15              # data-type: (stack operation)
               0.00 :   ffffffff8105061b:        pushq   %r14              # data-type: (stack operation)
               0.00 :   ffffffff8105061d:        pushq   %rbx              # data-type: (stack operation)
               0.00 :   ffffffff8105061e:        subq    $0x10, %rsp
               0.00 :   ffffffff81050622:        movl    %edi, %ebx
               0.00 :   ffffffff81050624:        movq    %gs:0x7efc4814(%rip), %rax  # 0x14e40 <current_task>              # data-type: struct task_struct* +0
               0.00 :   ffffffff8105062c:        movq    0x8d0(%rax), %r14         # data-type: struct task_struct +0x8d0 (files)
               0.00 :   ffffffff81050633:        movl    (%r14), %eax              # data-type: struct files_struct +0 (count.counter)
               0.00 :   ffffffff81050636:        cmpl    $0x1, %eax
               0.00 :   ffffffff81050639:        je      0xffffffff810506a9 <__fdget+0x99>
               0.00 :   ffffffff8105063b:        movq    0x20(%r14), %rcx          # data-type: struct files_struct +0x20 (fdt)
               0.00 :   ffffffff8105063f:        movl    (%rcx), %eax              # data-type: struct fdtable +0 (max_fds)
               0.00 :   ffffffff81050641:        cmpl    %ebx, %eax
               0.00 :   ffffffff81050643:        jbe     0xffffffff810506ef <__fdget+0xdf>
               0.00 :   ffffffff81050649:        movl    %ebx, %r15d
               5.56 :   ffffffff8105064c:        movq    0x8(%rcx), %rdx           # data-type: struct fdtable +0x8 (fd)
            ...

         The "# data-type:" part was added with this change. The first few
         entries are not very interesting. But later you can see it accesses
         a couple of fields in the task_struct, files_struct and fdtable.

      perf trace:

       - Support syscall tracing for different ABIs. For example, it can
         trace system calls for 32-bit applications on a 64-bit kernel
         transparently.

       - Add --summary-mode=total option to show global syscall summary. The
         default is 'thread' to show per-thread syscall summary.

      Python support:

       - Add more interfaces to the 'perf' module to parse events and to
         configure, enable or disable the event list properly, so that
         basic functionality can be implemented purely in Python. There is
         example code for these new interfaces in python/tracepoint.py.

       - Add mypy and pylint support to enable build time checking. Fix some
         code based on the findings from these tools.

      Internals:

       - Introduce io_dir__readdir() API to make directory traversal
         (usually for proc or sysfs) more efficient with a smaller memory
         footprint.

      JSON vendor events:

       - Add events and metrics for ARM Neoverse N3 and V3

       - Update events and metrics on various Intel CPUs

       - Add/update events for a number of SiFive processors"

    * tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
      perf bpf-filter: Fix a parsing error with comma
      perf report: Fix a memory leak for perf_env on AMD
      perf trace: Fix wrong size to bpf_map__update_elem call
      perf tools: annotate asm_pure_loop.S
      perf python: Fix setup.py mypy errors
      perf test: Address attr.py mypy error
      perf build: Add pylint build tests
      perf build: Add mypy build tests
      perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
      tools/build: Don't pass test log files to linker
      perf bench sched pipe: fix enforced blocking reads in worker_thread
      perf tools: Fix is_compat_mode build break in ppc64
      perf build: filter all combinations of -flto for libperl
      perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
      perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
      perf trace: Fix evlist memory leak
      perf trace: Fix BTF memory leak
      perf trace: Make syscall table stable
      perf syscalltbl: Mask off ABI type for MIPS system calls
      perf build: Remove Makefile.syscalls
      ...

diff --cc tools/lib/perf/Makefile
index e9a7ac2c062e,478fe57bf8ce..ffcfd777c451
--- a/tools/lib/perf/Makefile
+++ b/tools/lib/perf/Makefile
@@@ -39,15 -39,21 +39,8 @@@ libdir = $(prefix)/$(libdir_relative
  libdir_SQ = $(subst ','\'',$(libdir))
  libdir_relative_SQ = $(subst ','\'',$(libdir_relative))

 -ifeq ("$(origin V)", "command line")
 -  VERBOSE = $(V)
 -endif
 -ifndef VERBOSE
 -  VERBOSE = 0
 -endif
 -
 -ifeq ($(VERBOSE),1)
 -  Q =
 -else
 -  Q = @
 -endif
 -
  TEST_ARGS := $(if $(V),-v)

- # Set compile option CFLAGS
- ifdef EXTRA_CFLAGS
-   CFLAGS := $(EXTRA_CFLAGS)
- else
-   CFLAGS := -g -Wall
- endif
-
  INCLUDES = \
  -I$(srctree)/tools/lib/perf/include \
  -I$(srctree)/tools/lib/ \
diff --cc tools/perf/Makefile.perf
index 05c083bb1122,d335151736ed..979d4691221a
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@@ -158,9 -158,50 +158,9 @@@ ifneq ($(OUTPUT),
  VPATH += $(OUTPUT)
  export VPATH
  # create symlink to the original source
- SOURCE := $(shell ln -sf $(srctree)/tools/perf $(OUTPUT)/source)
+ SOURCE := $(shell ln -sfn $(srctree)/tools/perf $(OUTPUT)/source)
  endif

 -# Beautify output
 -# ---------------------------------------------------------------------------
 -#
 -# Most of build commands in Kbuild start with "cmd_". You can optionally define
 -# "quiet_cmd_*". If defined, the short log is printed. Otherwise, no log from
 -# that command is printed by default.
 -#
 -# e.g.)
 -#    quiet_cmd_depmod = DEPMOD  $(MODLIB)
 -#          cmd_depmod = $(srctree)/scripts/depmod.sh $(DEPMOD) $(KERNELRELEASE)
 -#
 -# A simple variant is to prefix commands with $(Q) - that's useful
 -# for commands that shall be hidden in non-verbose mode.
 -#
 -#    $(Q)$(MAKE) $(build)=scripts/basic
 -#
 -# To put more focus on warnings, be less verbose as default
 -# Use 'make V=1' to see the full commands
 -
 -ifeq ($(V),1)
 -  quiet =
 -  Q =
 -else
 -  quiet=quiet_
 -  Q=@
 -endif
 -
 -# If the user is running make -s (silent mode), suppress echoing of commands
 -# make-4.0 (and later) keep single letter options in the 1st word of MAKEFLAGS.
 -ifeq ($(filter 3.%,$(MAKE_VERSION)),)
 -short-opts := $(firstword -$(MAKEFLAGS))
 -else
 -short-opts := $(filter-out --%,$(MAKEFLAGS))
 -endif
 -
 -ifneq ($(findstring s,$(short-opts)),)
 -  quiet=silent_
 -endif
 -
 -export quiet Q
 -
  # Do not use make's built-in rules
  # (this improves performance and avoids hard-to-debug behaviour);
  MAKEFLAGS += -r