Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Support instruction latency in 'perf report'. With both memory
     latency (weight) and instruction latency information, users can
     locate expensive load instructions and understand the time spent
     in different stages.

   - Extend 'perf c2c' to display the number of loads which were blocked
     by data or address conflict.

   - Add 'perf stat' support for L2 topdown events in systems such as
     Intel's Sapphire Rapids server.

   - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a
     sort key, for instance:

        perf report --stdio --sort=comm,symbol,code_page_size

   - New 'perf daemon' command to run long running sessions while
     providing a way to control the enablement of events without
     restarting a traditional 'perf record' session.

   - Enable counting events for BPF programs in 'perf stat' just like
     for other targets (tid, cgroup, cpu, etc), e.g.:

        # perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
        ^C

     The example above counts 'cycles' and 'ref-cycles' for the BPF
     program with id 254. It is similar to the 'bpftool prog profile'
     command, but more flexible.

   - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO
     build-id using infrastructure generalised from the eBPF subsystem,
     removing the need for traversing the perf.data file to collect
     build-ids at the end of 'perf record' sessions and helping with
     long running sessions where binaries can get replaced in updates,
     leading to possible mis-resolution of symbols.

   - Support filtering by hex address in 'perf script'.

   - Support DSO filter in 'perf script', like in other perf tools.

   - Add namespaces support to 'perf inject'

   - Add support for SDT (DTrace style markers) events on ARM64.
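
  As a rough illustration of the 'perf c2c' item above: the blocked-load
  information arrives in the new 3-bit mem_blk field of the sample's
  data_src word. The sketch below is not the perf c2c code. The names
  data_src_blk(), struct blk_stats and blk_stats_update() are
  hypothetical; only the PERF_MEM_BLK_* values come from the
  perf_event.h hunk in this merge:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the perf_event.h hunk in this merge. */
#define PERF_MEM_BLK_NA    0x01 /* not available */
#define PERF_MEM_BLK_DATA  0x02 /* data could not be forwarded */
#define PERF_MEM_BLK_ADDR  0x04 /* address conflict */
#define PERF_MEM_BLK_SHIFT 40

/* Hypothetical per-cacheline counters, not the real c2c stats struct. */
struct blk_stats {
	unsigned long blk_data;
	unsigned long blk_addr;
};

/* Hypothetical helper: extract the 3-bit mem_blk field from data_src. */
static unsigned int data_src_blk(uint64_t data_src)
{
	return (unsigned int)((data_src >> PERF_MEM_BLK_SHIFT) & 0x7);
}

/* Bump the counters for one sampled load. */
static void blk_stats_update(struct blk_stats *st, uint64_t data_src)
{
	unsigned int blk = data_src_blk(data_src);

	assert(st != NULL);

	if (blk & PERF_MEM_BLK_DATA)
		st->blk_data++;
	if (blk & PERF_MEM_BLK_ADDR)
		st->blk_addr++;
}
```

  The flags are tested individually rather than compared for equality,
  since a single load may report more than one blocking reason.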

  perf record:

   - Fix handling of eventfd() when draining a buffer in 'perf record'.

   - Improvements to the generation of metadata events for pre-existing
     threads (mmaps, comm, etc), speeding up the work done at the start
     of system wide or per CPU 'perf record' sessions.

  Hardware tracing:

   - Initial support for tracing KVM with Intel PT.

   - Intel PT fixes for IPC

   - Support Intel PT PSB (synchronization packets) events.

   - Automatically group aux-output events to overcome --filter syntax.

   - Enable PERF_SAMPLE_DATA_SRC on ARM's SPE.

   - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.

  perf annotate TUI:

   - Fix handling of 'k' ("show line number") hotkey

   - Fix jump parsing for C++ code.

  perf probe:

   - Add protection to avoid endless loop.

  cgroups:

   - Avoid reading cgroup mountpoint multiple times, caching it.

   - Fix handling of cgroup v1/v2 in mixed hierarchy.
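
  The v1/v2 handling above comes down to classifying each /proc/mounts
  line. The following is a minimal sketch of that step, not the code
  merged in tools/lib/api/fs/cgroup.c: classify_mount_line() and enum
  cg_match are hypothetical names, and the options are matched as whole
  comma-separated tokens rather than with strstr():

```c
#include <assert.h>
#include <string.h>

enum cg_match { CG_NONE = 0, CG_V1 = 1, CG_V2 = 2 };

/*
 * Hypothetical helper: classify one /proc/mounts line of the form
 *   <devname> <mount point> <fs type> <options> ...
 * The line is modified in place. On a match, *mnt points at the
 * mount point inside the line buffer.
 */
static enum cg_match classify_mount_line(char *line, const char *subsys,
					 char **mnt)
{
	char *path, *type, *opts, *end;
	size_t len;

	assert(line && subsys && mnt);

	path = strchr(line, ' ');	/* skip devname */
	if (!path)
		return CG_NONE;
	*path++ = '\0';

	type = strchr(path, ' ');	/* terminate mount point */
	if (!type)
		return CG_NONE;
	*type++ = '\0';

	opts = strchr(type, ' ');	/* terminate fs type */
	if (!opts)
		return CG_NONE;
	*opts++ = '\0';

	if (!strcmp(type, "cgroup2")) {
		*mnt = path;		/* caller keeps this as a fallback */
		return CG_V2;
	}
	if (strcmp(type, "cgroup"))
		return CG_NONE;

	end = strchr(opts, ' ');	/* cut off the trailing fields */
	if (end)
		*end = '\0';

	/* v1: the controller must be a whole comma-separated option */
	len = strlen(subsys);
	while (*opts) {
		size_t n = strcspn(opts, ",");

		if (n == len && !strncmp(opts, subsys, n)) {
			*mnt = path;
			return CG_V1;
		}
		opts += n;
		if (*opts == ',')
			opts++;
	}
	return CG_NONE;
}
```

  A caller would stop at the first CG_V1 result and only fall back to a
  remembered CG_V2 mount point when no v1 hierarchy has the controller.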

  Symbol resolving:

   - Add OCaml symbol demangling.

   - Further fixes for handling PE executables when using perf with Wine
     and .exe/.dll files.

   - Fix 'perf unwind' DSO handling.

   - Resolve symbols against debug file first, to deal with artifacts
     related to LTO.

   - Fix gap between kernel end and module start on powerpc.

  Reporting tools:

   - The DSO filter shouldn't show samples in unresolved maps.

   - Improve debuginfod support in various tools.

  build ids:

   - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test'
     entry for that case.
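
  Related to build-id sizes: the new PERF_RECORD_MMAP2 layout carries
  an explicit build_id_size next to a 20-byte buffer, so 16-byte and
  20-byte ids can both be represented. Below is a sketch of how a
  consumer might read it, assuming it has already decoded the record
  header. union mmap2_id and mmap2_build_id() are hypothetical names,
  and the reserved fields are renamed to avoid reserved identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the perf_event.h hunk; only meaningful on PERF_RECORD_MMAP2. */
#define PERF_RECORD_MISC_MMAP_BUILD_ID (1 << 14)

/* Mirror of the union added to the MMAP2 record (that part only). */
union mmap2_id {
	struct {
		uint32_t maj;
		uint32_t min;
		uint64_t ino;
		uint64_t ino_generation;
	};
	struct {
		uint8_t  build_id_size;
		uint8_t  reserved_1;
		uint16_t reserved_2;
		uint8_t  build_id[20];
	};
};

/*
 * Hypothetical helper: return the build-id length and set *out, or
 * return 0 when the record carries maj/min/ino instead (misc bit not
 * set) or the size field is out of range.
 */
static size_t mmap2_build_id(uint16_t misc, const union mmap2_id *id,
			     const uint8_t **out)
{
	assert(id && out);

	if (!(misc & PERF_RECORD_MISC_MMAP_BUILD_ID))
		return 0;
	if (id->build_id_size > sizeof(id->build_id))
		return 0;

	*out = id->build_id;
	return id->build_id_size;
}
```

  Both arms of the union occupy 24 bytes, which is why the build-id can
  replace maj/min/ino/ino_generation without moving the fields that
  follow.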

  perf test:

   - Support for PERF_SAMPLE_WEIGHT_STRUCT.

   - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.

   - Shell based tests for 'perf daemon's commands ('start', 'stop',
     'reconfig', 'list', etc).

   - ARM cs-etm 'perf test' fixes.

   - Add parse-metric memory bandwidth testcase.
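
  For PERF_SAMPLE_WEIGHT_STRUCT, the 64-bit weight is split into one
  32-bit and two 16-bit fields. The sketch below extracts them with
  shifts, which gives the same result as either endian arm of union
  perf_sample_weight. struct weight_fields and weight_split() are
  hypothetical names, and which latency each field carries is arch
  specific and not stated in this message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow union perf_sample_weight in perf_event.h. */
struct weight_fields {
	uint32_t var1_dw;	/* bits  0..31 of 'full' */
	uint16_t var2_w;	/* bits 32..47 of 'full' */
	uint16_t var3_w;	/* bits 48..63 of 'full' */
};

/* Hypothetical helper: split the 64-bit sample weight into its fields. */
static void weight_split(uint64_t full, struct weight_fields *f)
{
	assert(f != NULL);

	f->var1_dw = (uint32_t)(full & 0xffffffffULL);
	f->var2_w  = (uint16_t)((full >> 32) & 0xffff);
	f->var3_w  = (uint16_t)(full >> 48);
}
```

  With plain PERF_SAMPLE_WEIGHT the whole 64-bit value is the weight,
  so a reader has to check the sample type before splitting.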

  Compiler related:

   - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used
     with -fpatchable-function-entry.

   - Fix ARM64 build with gcc 11's -Wformat-overflow.

   - Fix unaligned access in sample parsing test.

   - Fix printf conversion specifier for IP addresses on arm64, s390 and
     powerpc.

  Arch specific:

   - Support exposing Performance Monitor Counter SPRs as part of
     extended regs on powerpc.

   - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn
     DDR, fix imx8mm ones.

   - Fix common and uarch events for ARM64's A76 and Ampere eMag"
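
  The powerpc extended-regs masks can be checked with a little
  arithmetic. This sketch assumes PERF_REG_POWERPC_MMCR0 has index 45
  (the value is not shown in the hunk) and that MMCR3 follows three
  slots later. The REG_* and *_MASK names and both helpers are
  hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed index of PERF_REG_POWERPC_MMCR0; not shown in the hunk. */
#define REG_MMCR0		45
#define REG_MMCR3		(REG_MMCR0 + 3)

/* Same expressions as the new perf_regs.h macros. */
#define EXCLUDE_REG_EXT_300	(7ULL << REG_MMCR3)
#define PMU_MASK_31		(0xfffULL << REG_MMCR0)
#define PMU_MASK_300		(PMU_MASK_31 - EXCLUDE_REG_EXT_300)

/* Count the registers selected by a mask. */
static unsigned int count_regs(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Test whether register index 'idx' is selected by 'mask'. */
static int mask_has_reg(uint64_t mask, unsigned int idx)
{
	assert(idx < 64);
	return (mask >> idx) & 1;
}
```

  Under these assumptions the ISA 3.1 mask selects 12 registers and the
  ISA 3.0 mask selects 9, matching the comments in the header.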

* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
  perf buildid-cache: Don't skip 16-byte build-ids
  perf buildid-cache: Add test for 16-byte build-id
  perf symbol: Remove redundant libbfd checks
  perf test: Output the sub testing result in cs-etm
  perf test: Suppress logs in cs-etm testing
  perf tools: Fix arm64 build error with gcc-11
  perf intel-pt: Add documentation for tracing virtual machines
  perf intel-pt: Split VM-Entry and VM-Exit branches
  perf intel-pt: Adjust sample flags for VM-Exit
  perf intel-pt: Allow for a guest kernel address filter
  perf intel-pt: Support decoding of guest kernel
  perf machine: Factor out machine__idle_thread()
  perf machine: Factor out machines__find_guest()
  perf intel-pt: Amend decoder to track the NR flag
  perf intel-pt: Retain the last PIP packet payload as is
  perf intel_pt: Add vmlaunch and vmresume as branches
  perf script: Add branch types for VM-Entry and VM-Exit
  perf auxtrace: Automatically group aux-output events
  perf test: Fix unaligned access in sample parsing test
  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
  ...
diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index bdf5f10..578b3ee 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -55,17 +55,33 @@
 	PERF_REG_POWERPC_MMCR3,
 	PERF_REG_POWERPC_SIER2,
 	PERF_REG_POWERPC_SIER3,
+	PERF_REG_POWERPC_PMC1,
+	PERF_REG_POWERPC_PMC2,
+	PERF_REG_POWERPC_PMC3,
+	PERF_REG_POWERPC_PMC4,
+	PERF_REG_POWERPC_PMC5,
+	PERF_REG_POWERPC_PMC6,
 	/* Max regs without the extended regs */
 	PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
 
 #define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)
 
-/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
-#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) - PERF_REG_PMU_MASK)
-/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31 */
-#define PERF_REG_PMU_MASK_31   (((1ULL << (PERF_REG_POWERPC_SIER3 + 1)) - 1) - PERF_REG_PMU_MASK)
+/* Exclude MMCR3, SIER2, SIER3 for CPU_FTR_ARCH_300 */
+#define	PERF_EXCLUDE_REG_EXT_300	(7ULL << PERF_REG_POWERPC_MMCR3)
 
-#define PERF_REG_MAX_ISA_300   (PERF_REG_POWERPC_MMCR2 + 1)
-#define PERF_REG_MAX_ISA_31    (PERF_REG_POWERPC_SIER3 + 1)
+/*
+ * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300
+ * includes 9 SPRS from MMCR0 to PMC6 excluding the
+ * unsupported SPRS in PERF_EXCLUDE_REG_EXT_300.
+ */
+#define PERF_REG_PMU_MASK_300   ((0xfffULL << PERF_REG_POWERPC_MMCR0) - PERF_EXCLUDE_REG_EXT_300)
+
+/*
+ * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31
+ * includes 12 SPRs from MMCR0 to PMC6.
+ */
+#define PERF_REG_PMU_MASK_31   (0xfffULL << PERF_REG_POWERPC_MMCR0)
+
+#define PERF_REG_EXTENDED_MAX  (PERF_REG_POWERPC_PMC6 + 1)
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 8ced1655..b3073ae 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -146,6 +146,8 @@
 		     /boot/vmlinux-$(shell uname -r)
 VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
 
+bootstrap: $(BPFTOOL_BOOTSTRAP)
+
 ifneq ($(VMLINUX_BTF)$(VMLINUX_H),)
 ifeq ($(feature-clang-bpf-co-re),1)
 
diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 97cbfb3..74e255d 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -99,7 +99,9 @@
          clang                          \
          libbpf                         \
          libpfm4                        \
-         libdebuginfod
+         libdebuginfod			\
+         clang-bpf-co-re
+
 
 FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)
 
diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c
index 1547bc2..52c790b 100644
--- a/tools/build/feature/test-libopencsd.c
+++ b/tools/build/feature/test-libopencsd.c
@@ -4,9 +4,9 @@
 /*
  * Check OpenCSD library version is sufficient to provide required features
  */
-#define OCSD_MIN_VER ((0 << 16) | (14 << 8) | (0))
+#define OCSD_MIN_VER ((1 << 16) | (0 << 8) | (0))
 #if !defined(OCSD_VER_NUM) || (OCSD_VER_NUM < OCSD_MIN_VER)
-#error "OpenCSD >= 0.14.0 is required"
+#error "OpenCSD >= 1.0.0 is required"
 #endif
 
 int main(void)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index b15e344..ad15e40 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -145,12 +145,14 @@
 	PERF_SAMPLE_CGROUP			= 1U << 21,
 	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
 	PERF_SAMPLE_CODE_PAGE_SIZE		= 1U << 23,
+	PERF_SAMPLE_WEIGHT_STRUCT		= 1U << 24,
 
-	PERF_SAMPLE_MAX = 1U << 24,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 25,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
 
+#define PERF_SAMPLE_WEIGHT_TYPE	(PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
 /*
  * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
  *
@@ -386,7 +388,8 @@
 				aux_output     :  1, /* generate AUX records instead of events */
 				cgroup         :  1, /* include cgroup events */
 				text_poke      :  1, /* include text poke events */
-				__reserved_1   : 30;
+				build_id       :  1, /* use build id in mmap2 events */
+				__reserved_1   : 29;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -659,6 +662,22 @@
 	__u64	aux_size;
 };
 
+/*
+ * The current state of perf_event_header::misc bits usage:
+ * ('|' used bit, '-' unused bit)
+ *
+ *  012         CDEF
+ *  |||---------||||
+ *
+ *  Where:
+ *    0-2     CPUMODE_MASK
+ *
+ *    C       PROC_MAP_PARSE_TIMEOUT
+ *    D       MMAP_DATA / COMM_EXEC / FORK_EXEC / SWITCH_OUT
+ *    E       MMAP_BUILD_ID / EXACT_IP / SCHED_OUT_PREEMPT
+ *    F       (reserved)
+ */
+
 #define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
 #define PERF_RECORD_MISC_CPUMODE_UNKNOWN	(0 << 0)
 #define PERF_RECORD_MISC_KERNEL			(1 << 0)
@@ -690,6 +709,7 @@
  *
  *   PERF_RECORD_MISC_EXACT_IP           - PERF_RECORD_SAMPLE of precise events
  *   PERF_RECORD_MISC_SWITCH_OUT_PREEMPT - PERF_RECORD_SWITCH* events
+ *   PERF_RECORD_MISC_MMAP_BUILD_ID      - PERF_RECORD_MMAP2 event
  *
  *
  * PERF_RECORD_MISC_EXACT_IP:
@@ -699,9 +719,13 @@
  *
  * PERF_RECORD_MISC_SWITCH_OUT_PREEMPT:
  *   Indicates that thread was preempted in TASK_RUNNING state.
+ *
+ * PERF_RECORD_MISC_MMAP_BUILD_ID:
+ *   Indicates that mmap2 event carries build id data.
  */
 #define PERF_RECORD_MISC_EXACT_IP		(1 << 14)
 #define PERF_RECORD_MISC_SWITCH_OUT_PREEMPT	(1 << 14)
+#define PERF_RECORD_MISC_MMAP_BUILD_ID		(1 << 14)
 /*
  * Reserve the last bit to indicate some extended misc field
  */
@@ -890,7 +914,24 @@
 	 * 	  char			data[size];
 	 * 	  u64			dyn_size; } && PERF_SAMPLE_STACK_USER
 	 *
-	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
+	 *	{ union perf_sample_weight
+	 *	 {
+	 *		u64		full; && PERF_SAMPLE_WEIGHT
+	 *	#if defined(__LITTLE_ENDIAN_BITFIELD)
+	 *		struct {
+	 *			u32	var1_dw;
+	 *			u16	var2_w;
+	 *			u16	var3_w;
+	 *		} && PERF_SAMPLE_WEIGHT_STRUCT
+	 *	#elif defined(__BIG_ENDIAN_BITFIELD)
+	 *		struct {
+	 *			u16	var3_w;
+	 *			u16	var2_w;
+	 *			u32	var1_dw;
+	 *		} && PERF_SAMPLE_WEIGHT_STRUCT
+	 *	#endif
+	 *	 }
+	 *	}
 	 *	{ u64			data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64			abi; # enum perf_sample_regs_abi
@@ -915,10 +956,20 @@
 	 *	u64				addr;
 	 *	u64				len;
 	 *	u64				pgoff;
-	 *	u32				maj;
-	 *	u32				min;
-	 *	u64				ino;
-	 *	u64				ino_generation;
+	 *	union {
+	 *		struct {
+	 *			u32		maj;
+	 *			u32		min;
+	 *			u64		ino;
+	 *			u64		ino_generation;
+	 *		};
+	 *		struct {
+	 *			u8		build_id_size;
+	 *			u8		__reserved_1;
+	 *			u16		__reserved_2;
+	 *			u8		build_id[20];
+	 *		};
+	 *	};
 	 *	u32				prot, flags;
 	 *	char				filename[];
 	 * 	struct sample_id		sample_id;
@@ -1127,14 +1178,16 @@
 			mem_lvl_num:4,	/* memory hierarchy level number */
 			mem_remote:1,   /* remote */
 			mem_snoopx:2,	/* snoop mode, ext */
-			mem_rsvd:24;
+			mem_blk:3,	/* access blocked */
+			mem_rsvd:21;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:24,
+		__u64	mem_rsvd:21,
+			mem_blk:3,	/* access blocked */
 			mem_snoopx:2,	/* snoop mode, ext */
 			mem_remote:1,   /* remote */
 			mem_lvl_num:4,	/* memory hierarchy level number */
@@ -1217,6 +1270,12 @@
 #define PERF_MEM_TLB_OS		0x40 /* OS fault handler */
 #define PERF_MEM_TLB_SHIFT	26
 
+/* Access blocked */
+#define PERF_MEM_BLK_NA		0x01 /* not available */
+#define PERF_MEM_BLK_DATA	0x02 /* data could not be forwarded */
+#define PERF_MEM_BLK_ADDR	0x04 /* address conflict */
+#define PERF_MEM_BLK_SHIFT	40
+
 #define PERF_MEM_S(a, s) \
 	(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
 
@@ -1248,4 +1307,23 @@
 		reserved:40;
 };
 
+union perf_sample_weight {
+	__u64		full;
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+	struct {
+		__u32	var1_dw;
+		__u16	var2_w;
+		__u16	var3_w;
+	};
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	struct {
+		__u16	var3_w;
+		__u16	var2_w;
+		__u32	var1_dw;
+	};
+#else
+#error "Unknown endianness"
+#endif
+};
+
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 90deb41c..667f1ae 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -251,5 +251,8 @@
 #define PR_SET_SYSCALL_USER_DISPATCH	59
 # define PR_SYS_DISPATCH_OFF		0
 # define PR_SYS_DISPATCH_ON		1
+/* The control values for the user space selector when dispatch is enabled */
+# define SYSCALL_DISPATCH_FILTER_ALLOW	0
+# define SYSCALL_DISPATCH_FILTER_BLOCK	1
 
 #endif /* _LINUX_PRCTL_H */
diff --git a/tools/lib/api/fs/cgroup.c b/tools/lib/api/fs/cgroup.c
index 889a6eb..1573dae 100644
--- a/tools/lib/api/fs/cgroup.c
+++ b/tools/lib/api/fs/cgroup.c
@@ -8,12 +8,29 @@
 #include <string.h>
 #include "fs.h"
 
+struct cgroupfs_cache_entry {
+	char	subsys[32];
+	char	mountpoint[PATH_MAX];
+};
+
+/* just cache last used one */
+static struct cgroupfs_cache_entry cached;
+
 int cgroupfs_find_mountpoint(char *buf, size_t maxlen, const char *subsys)
 {
 	FILE *fp;
-	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
-	char path_v1[PATH_MAX + 1], path_v2[PATH_MAX + 2], *path;
-	char *token, *saved_ptr = NULL;
+	char *line = NULL;
+	size_t len = 0;
+	char *p, *path;
+	char mountpoint[PATH_MAX];
+
+	if (!strcmp(cached.subsys, subsys)) {
+		if (strlen(cached.mountpoint) < maxlen) {
+			strcpy(buf, cached.mountpoint);
+			return 0;
+		}
+		return -1;
+	}
 
 	fp = fopen("/proc/mounts", "r");
 	if (!fp)
@@ -22,45 +39,63 @@
 	/*
 	 * in order to handle split hierarchy, we need to scan /proc/mounts
 	 * and inspect every cgroupfs mount point to find one that has
-	 * perf_event subsystem
+	 * the given subsystem.  If we found v1, just use it.  If not we can
+	 * use v2 path as a fallback.
 	 */
-	path_v1[0] = '\0';
-	path_v2[0] = '\0';
+	mountpoint[0] = '\0';
 
-	while (fscanf(fp, "%*s %"__stringify(PATH_MAX)"s %"__stringify(PATH_MAX)"s %"
-				__stringify(PATH_MAX)"s %*d %*d\n",
-				mountpoint, type, tokens) == 3) {
+	/*
+	 * The /proc/mounts has the follow format:
+	 *
+	 *   <devname> <mount point> <fs type> <options> ...
+	 *
+	 */
+	while (getline(&line, &len, fp) != -1) {
+		/* skip devname */
+		p = strchr(line, ' ');
+		if (p == NULL)
+			continue;
 
-		if (!path_v1[0] && !strcmp(type, "cgroup")) {
+		/* save the mount point */
+		path = ++p;
+		p = strchr(p, ' ');
+		if (p == NULL)
+			continue;
 
-			token = strtok_r(tokens, ",", &saved_ptr);
+		*p++ = '\0';
 
-			while (token != NULL) {
-				if (subsys && !strcmp(token, subsys)) {
-					strcpy(path_v1, mountpoint);
-					break;
-				}
-				token = strtok_r(NULL, ",", &saved_ptr);
-			}
+		/* check filesystem type */
+		if (strncmp(p, "cgroup", 6))
+			continue;
+
+		if (p[6] == '2') {
+			/* save cgroup v2 path */
+			strcpy(mountpoint, path);
+			continue;
 		}
 
-		if (!path_v2[0] && !strcmp(type, "cgroup2"))
-			strcpy(path_v2, mountpoint);
+		/* now we have cgroup v1, check the options for subsystem */
+		p += 7;
 
-		if (path_v1[0] && path_v2[0])
-			break;
+		p = strstr(p, subsys);
+		if (p == NULL)
+			continue;
+
+		/* sanity check: it should be separated by a space or a comma */
+		if (!strchr(" ,", p[-1]) || !strchr(" ,", p[strlen(subsys)]))
+			continue;
+
+		strcpy(mountpoint, path);
+		break;
 	}
+	free(line);
 	fclose(fp);
 
-	if (path_v1[0])
-		path = path_v1;
-	else if (path_v2[0])
-		path = path_v2;
-	else
-		return -1;
+	strncpy(cached.subsys, subsys, sizeof(cached.subsys) - 1);
+	strcpy(cached.mountpoint, mountpoint);
 
-	if (strlen(path) < maxlen) {
-		strcpy(buf, path);
+	if (mountpoint[0] && strlen(mountpoint) < maxlen) {
+		strcpy(buf, mountpoint);
 		return 0;
 	}
 	return -1;
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 988c539..d820542 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -23,10 +23,20 @@
 	__u64			 start;
 	__u64			 len;
 	__u64			 pgoff;
-	__u32			 maj;
-	__u32			 min;
-	__u64			 ino;
-	__u64			 ino_generation;
+	union {
+		struct {
+			__u32	 maj;
+			__u32	 min;
+			__u64	 ino;
+			__u64	 ino_generation;
+		};
+		struct {
+			__u8	 build_id_size;
+			__u8	 __reserved_1;
+			__u16	 __reserved_2;
+			__u8	 build_id[20];
+		};
+	};
 	__u32			 prot;
 	__u32			 flags;
 	char			 filename[PATH_MAX];
diff --git a/tools/perf/Build b/tools/perf/Build
index 5f392db..db61dbe 100644
--- a/tools/perf/Build
+++ b/tools/perf/Build
@@ -24,6 +24,7 @@
 perf-y += builtin-data.o
 perf-y += builtin-version.o
 perf-y += builtin-c2c.o
+perf-y += builtin-daemon.o
 
 perf-$(CONFIG_TRACE) += builtin-trace.o
 perf-$(CONFIG_LIBELF) += builtin-probe.o
diff --git a/tools/perf/Documentation/examples.txt b/tools/perf/Documentation/examples.txt
index a4e3921..c0d22fb 100644
--- a/tools/perf/Documentation/examples.txt
+++ b/tools/perf/Documentation/examples.txt
@@ -3,7 +3,7 @@
 		****** perf by examples ******
 		------------------------------
 
-[ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ]
+[ From an e-mail by Ingo Molnar, https://lore.kernel.org/lkml/20090804195717.GA5998@elte.hu ]
 
 
 First, discovery/enumeration of available counters can be done via
diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 079cdfa..0f10052 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -4,7 +4,7 @@
 		r	synthesize branches events (returns only)
 		x	synthesize transactions events
 		w	synthesize ptwrite events
-		p	synthesize power events
+		p	synthesize power events (incl. PSB events for Intel PT)
 		o	synthesize other events recorded due to the use
 			of aux-output (refer to perf record)
 		e	synthesize error events
diff --git a/tools/perf/Documentation/perf-buildid-cache.txt b/tools/perf/Documentation/perf-buildid-cache.txt
index f6de095..bb167e3 100644
--- a/tools/perf/Documentation/perf-buildid-cache.txt
+++ b/tools/perf/Documentation/perf-buildid-cache.txt
@@ -74,6 +74,12 @@
 	used when creating a uprobe for a process that resides in a
 	different mount namespace from the perf(1) utility.
 
+--debuginfod=URLs::
+	Specify debuginfod URL to be used when retrieving perf.data binaries,
+	it follows the same syntax as the DEBUGINFOD_URLS variable, like:
+
+	  buildid-cache.debuginfod=http://192.168.122.174:8002
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-buildid-list[1]
diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
index 5c379ad..153bde1 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -238,6 +238,13 @@
 		cache location, or to disable it altogether. If you want to disable it,
 		set buildid.dir to /dev/null. The default is $HOME/.debug
 
+buildid-cache.*::
+	buildid-cache.debuginfod=URLs
+		Specify debuginfod URLs to be used when retrieving perf.data binaries,
+		it follows the same syntax as the DEBUGINFOD_URLS variable, like:
+
+		  buildid-cache.debuginfod=http://192.168.122.174:8002
+
 annotate.*::
 	These are in control of addresses, jump function, source code
 	in lines of assembly code from a specific program.
@@ -552,11 +559,12 @@
 
 record.*::
 	record.build-id::
-		This option can be 'cache', 'no-cache' or 'skip'.
+		This option can be 'cache', 'no-cache', 'skip' or 'mmap'.
 		'cache' is to post-process data and save/update the binaries into
 		the build-id cache (in ~/.debug). This is the default.
 		But if this option is 'no-cache', it will not update the build-id cache.
 		'skip' skips post-processing and does not update the cache.
+		'mmap' skips post-processing and reads build-ids from MMAP events.
 
 	record.call-graph::
 		This is identical to 'call-graph.record-mode', except it is
@@ -695,6 +703,20 @@
 		If the directory does not exist or has the wrong file type,
 		the current directory is used.
 
+daemon.*::
+
+	daemon.base::
+		Base path for daemon data. All sessions data are stored under
+		this path.
+
+session-<NAME>.*::
+
+	session-<NAME>.run::
+
+		Defines new record session for daemon. The value is record's
+		command line without the 'record' keyword.
+
+
 SEE ALSO
 --------
 linkperf:perf[1]
diff --git a/tools/perf/Documentation/perf-daemon.txt b/tools/perf/Documentation/perf-daemon.txt
new file mode 100644
index 0000000..f558f8e
--- /dev/null
+++ b/tools/perf/Documentation/perf-daemon.txt
@@ -0,0 +1,208 @@
+perf-daemon(1)
+==============
+
+
+NAME
+----
+perf-daemon - Run record sessions on background
+
+
+SYNOPSIS
+--------
+[verse]
+'perf daemon'
+'perf daemon' [<options>]
+'perf daemon start'  [<options>]
+'perf daemon stop'   [<options>]
+'perf daemon signal' [<options>]
+'perf daemon ping'   [<options>]
+
+
+DESCRIPTION
+-----------
+This command allows to run simple daemon process that starts and
+monitors configured record sessions.
+
+You can imagine 'perf daemon' of background process with several
+'perf record' child tasks, like:
+
+  # ps axjf
+  ...
+       1  916507 ... perf daemon start
+  916507  916508 ...  \_ perf record --control=fifo:control,ack -m 10M -e cycles --overwrite --switch-output -a
+  916507  916509 ...  \_ perf record --control=fifo:control,ack -m 20M -e sched:* --overwrite --switch-output -a
+
+Not every 'perf record' session is suitable for running under daemon.
+User need perf session that either produces data on query, like the
+flight recorder sessions in above example or session that is configured
+to produce data periodically, like with --switch-output configuration
+for time and size.
+
+Each session is started with control setup (with perf record --control
+options).
+
+Sessions are configured through config file, see CONFIG FILE section
+with EXAMPLES.
+
+
+OPTIONS
+-------
+-v::
+--verbose::
+	Be more verbose.
+
+--config=<PATH>::
+	Config file path. If not provided, perf will check system and default
+	locations (/etc/perfconfig, $HOME/.perfconfig).
+
+--base=<PATH>::
+	Base directory path. Each daemon instance is running on top
+	of base directory. Only one instance of server can run on
+	top of one directory at the time.
+
+All generic options are available also under commands.
+
+
+START COMMAND
+-------------
+The start command creates the daemon process.
+
+-f::
+--foreground::
+	Do not put the process in background.
+
+
+STOP COMMAND
+------------
+The stop command stops all the session and the daemon process.
+
+
+SIGNAL COMMAND
+--------------
+The signal command sends signal to configured sessions.
+
+--session::
+	Send signal to specific session.
+
+
+PING COMMAND
+------------
+The ping command sends control ping to configured sessions.
+
+--session::
+	Send ping to specific session.
+
+
+CONFIG FILE
+-----------
+The daemon is configured within standard perf config file by
+following new variables:
+
+daemon.base:
+	Base path for daemon data. All sessions data are
+	stored under this path.
+
+session-<NAME>.run:
+	Defines new record session. The value is record's command
+	line without the 'record' keyword.
+
+Each perf record session is run in daemon.base/<NAME> directory.
+
+
+EXAMPLES
+--------
+Example with 2 record sessions:
+
+  # cat ~/.perfconfig
+  [daemon]
+  base=/opt/perfdata
+
+  [session-cycles]
+  run = -m 10M -e cycles --overwrite --switch-output -a
+
+  [session-sched]
+  run = -m 20M -e sched:* --overwrite --switch-output -a
+
+
+Starting the daemon:
+
+  # perf daemon start
+
+
+Check sessions:
+
+  # perf daemon
+  [603349:daemon] base: /opt/perfdata
+  [603350:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
+  [603351:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
+
+First line is daemon process info with configured daemon base.
+
+
+Check sessions with more info:
+
+  # perf daemon -v
+  [603349:daemon] base: /opt/perfdata
+    output:  /opt/perfdata/output
+    lock:    /opt/perfdata/lock
+    up:      1 minutes
+  [603350:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
+    base:    /opt/perfdata/session-cycles
+    output:  /opt/perfdata/session-cycles/output
+    control: /opt/perfdata/session-cycles/control
+    ack:     /opt/perfdata/session-cycles/ack
+    up:      1 minutes
+  [603351:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
+    base:    /opt/perfdata/session-sched
+    output:  /opt/perfdata/session-sched/output
+    control: /opt/perfdata/session-sched/control
+    ack:     /opt/perfdata/session-sched/ack
+    up:      1 minutes
+
+The 'base' path is daemon/session base.
+The 'lock' file is daemon's lock file guarding that no other
+daemon is running on top of the base.
+The 'output' file is perf record output for specific session.
+The 'control' and 'ack' files are perf control files.
+The 'up' number shows minutes daemon/session is running.
+
+
+Make sure control session is online:
+
+  # perf daemon ping
+  OK   cycles
+  OK   sched
+
+
+Send USR2 signal to session 'cycles' to generate perf.data file:
+
+  # perf daemon signal --session cycles
+  signal 12 sent to session 'cycles [603452]'
+
+  # tail -2  /opt/perfdata/session-cycles/output
+  [ perf record: dump data: Woken up 1 times ]
+  [ perf record: Dump perf.data.2020123017013149 ]
+
+
+Send USR2 signal to all sessions:
+
+  # perf daemon signal
+  signal 12 sent to session 'cycles [603452]'
+  signal 12 sent to session 'sched [603453]'
+
+  # tail -2  /opt/perfdata/session-cycles/output
+  [ perf record: dump data: Woken up 1 times ]
+  [ perf record: Dump perf.data.2020123017024689 ]
+  # tail -2  /opt/perfdata/session-sched/output
+  [ perf record: dump data: Woken up 1 times ]
+  [ perf record: Dump perf.data.2020123017024713 ]
+
+
+Stop daemon:
+
+  # perf daemon stop
+
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-config[1]
diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index cd362dc..1dcec73 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -858,7 +858,7 @@
 	b	synthesize "branches" events
 	x	synthesize "transactions" events
 	w	synthesize "ptwrite" events
-	p	synthesize "power" events
+	p	synthesize "power" events (incl. PSB events)
 	c	synthesize branches events (calls only)
 	r	synthesize branches events (returns only)
 	e	synthesize tracing error events
@@ -913,6 +913,11 @@
 For more details refer to the Intel 64 and IA-32 Architectures Software
 Developer Manuals.
 
+PSB events show when a PSB+ occurred and also the byte-offset in the trace.
+Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis
+of code with Intel PT, it is useful to know if a timing bubble was caused
+by Intel PT or not.
+
 Error events show where the decoder lost the trace.  Error events
 are quite important.  Users must know if what they are seeing is a complete
 picture or not. The "e" option may be followed by flags which affect what errors
@@ -1141,6 +1146,88 @@
 
 include::build-xed.txt[]
 
+
+Tracing Virtual Machines
+------------------------
+
+Currently, only kernel tracing is supported and only with "timeless" decoding
+i.e. no TSC timestamps
+
+Other limitations and caveats
+
+ VMX controls may suppress packets needed for decoding resulting in decoding errors
+ VMX controls may block the perf NMI to the host potentially resulting in lost trace data
+ Guest kernel self-modifying code (e.g. jump labels or JIT-compiled eBPF) will result in decoding errors
+ Guest thread information is unknown
+ Guest VCPU is unknown but may be able to be inferred from the host thread
+ Callchains are not supported
+
+Example
+
+Start VM
+
+ $ sudo virsh start kubuntu20.04
+ Domain kubuntu20.04 started
+
+Mount the guest file system.  Note sshfs needs -o direct_io to enable reading of proc files.  root access is needed to read /proc/kcore.
+
+ $ mkdir vm0
+ $ sshfs -o direct_io root@vm0:/ vm0
+
+Copy the guest /proc/kallsyms, /proc/modules and /proc/kcore
+
+ $ perf buildid-cache -v --kcore vm0/proc/kcore
+ kcore added to build-id cache directory /home/user/.debug/[kernel.kcore]/9600f316a53a0f54278885e8d9710538ec5f6a08/2021021807494306
+ $ KALLSYMS=/home/user/.debug/[kernel.kcore]/9600f316a53a0f54278885e8d9710538ec5f6a08/2021021807494306/kallsyms
+
+Find the VM process
+
+ $ ps -eLl | grep 'KVM\|PID'
+ F S   UID     PID    PPID     LWP  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
+ 3 S 64055    1430       1    1440  1  80   0 - 1921718 -    ?        00:02:47 CPU 0/KVM
+ 3 S 64055    1430       1    1441  1  80   0 - 1921718 -    ?        00:02:41 CPU 1/KVM
+ 3 S 64055    1430       1    1442  1  80   0 - 1921718 -    ?        00:02:38 CPU 2/KVM
+ 3 S 64055    1430       1    1443  2  80   0 - 1921718 -    ?        00:03:18 CPU 3/KVM
+
+Start an open-ended perf record, tracing the VM process, do something on the VM, and then ctrl-C to stop.
+TSC is not supported and tsc=0 must be specified.  That means mtc is useless, so add mtc=0.
+However, IPC can still be determined, hence cyc=1 can be added.
+Only kernel decoding is supported, so 'k' must be specified.
+Intel PT traces both the host and the guest so --guest and --host need to be specified.
+Without timestamps, --per-thread must be specified to distinguish threads.
+
+ $ sudo perf kvm --guest --host --guestkallsyms $KALLSYMS record --kcore -e intel_pt/tsc=0,mtc=0,cyc=1/k -p 1430 --per-thread
+ ^C
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 5.829 MB ]
+
+perf script can be used to provide an instruction trace
+
+ $ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
+       CPU 0/KVM  1440  ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms])                movq  0x48(%rax), %r9
+       CPU 0/KVM  1440  ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms])                movq  0x50(%rax), %r10
+       CPU 0/KVM  1440  ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms])                movq  0x58(%rax), %r11
+       CPU 0/KVM  1440  ffffffff82133ce9 __vmx_vcpu_run+0x49 ([kernel.kallsyms])                movq  0x60(%rax), %r12
+       CPU 0/KVM  1440  ffffffff82133ced __vmx_vcpu_run+0x4d ([kernel.kallsyms])                movq  0x68(%rax), %r13
+       CPU 0/KVM  1440  ffffffff82133cf1 __vmx_vcpu_run+0x51 ([kernel.kallsyms])                movq  0x70(%rax), %r14
+       CPU 0/KVM  1440  ffffffff82133cf5 __vmx_vcpu_run+0x55 ([kernel.kallsyms])                movq  0x78(%rax), %r15
+       CPU 0/KVM  1440  ffffffff82133cf9 __vmx_vcpu_run+0x59 ([kernel.kallsyms])                movq  (%rax), %rax
+       CPU 0/KVM  1440  ffffffff82133cfc __vmx_vcpu_run+0x5c ([kernel.kallsyms])                callq  0xffffffff82133c40
+       CPU 0/KVM  1440  ffffffff82133c40 vmx_vmenter+0x0 ([kernel.kallsyms])            jz 0xffffffff82133c46
+       CPU 0/KVM  1440  ffffffff82133c42 vmx_vmenter+0x2 ([kernel.kallsyms])            vmresume         IPC: 0.11 (50/445)
+           :1440  1440  ffffffffbb678b06 native_write_msr+0x6 ([guest.kernel.kallsyms])                 nopl  %eax, (%rax,%rax,1)
+           :1440  1440  ffffffffbb678b0b native_write_msr+0xb ([guest.kernel.kallsyms])                 retq     IPC: 0.04 (2/41)
+           :1440  1440  ffffffffbb666646 lapic_next_deadline+0x26 ([guest.kernel.kallsyms])             data16 nop
+           :1440  1440  ffffffffbb666648 lapic_next_deadline+0x28 ([guest.kernel.kallsyms])             xor %eax, %eax
+           :1440  1440  ffffffffbb66664a lapic_next_deadline+0x2a ([guest.kernel.kallsyms])             popq  %rbp
+           :1440  1440  ffffffffbb66664b lapic_next_deadline+0x2b ([guest.kernel.kallsyms])             retq     IPC: 0.16 (4/25)
+           :1440  1440  ffffffffbb74607f clockevents_program_event+0x8f ([guest.kernel.kallsyms])               test %eax, %eax
+           :1440  1440  ffffffffbb746081 clockevents_program_event+0x91 ([guest.kernel.kallsyms])               jz 0xffffffffbb74603c    IPC: 0.06 (2/30)
+           :1440  1440  ffffffffbb74603c clockevents_program_event+0x4c ([guest.kernel.kallsyms])               popq  %rbx
+           :1440  1440  ffffffffbb74603d clockevents_program_event+0x4d ([guest.kernel.kallsyms])               popq  %r12
+
+
+
 SEE ALSO
 --------
 
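The `perf script` output in the hunk above prints IPC together with the raw instruction and cycle counts, e.g. `IPC: 0.11 (50/445)`. A minimal sketch of that arithmetic in C follows. The helper names `ipc_x100` and `format_ipc` are hypothetical, and truncating (rather than rounding) to two decimals is an assumption inferred from the sample output, where 2/41 is shown as 0.04.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: instructions per cycle scaled by 100 and
 * truncated toward zero. Returns 0 when no cycles were counted.
 */
static uint64_t ipc_x100(uint64_t insn_cnt, uint64_t cyc_cnt)
{
	if (cyc_cnt == 0)
		return 0;
	return insn_cnt * 100 / cyc_cnt;
}

/* Format in the same shape as the sample output: "IPC: 0.11 (50/445)". */
static int format_ipc(char *buf, size_t len, uint64_t insn_cnt, uint64_t cyc_cnt)
{
	uint64_t ipc = ipc_x100(insn_cnt, cyc_cnt);

	return snprintf(buf, len, "IPC: %llu.%02llu (%llu/%llu)",
			(unsigned long long)(ipc / 100),
			(unsigned long long)(ipc % 100),
			(unsigned long long)insn_cnt,
			(unsigned long long)cyc_cnt);
}
```

Under this assumption the four IPC figures in the listing (50/445, 2/41, 4/25 and 2/30) come out as 0.11, 0.04, 0.16 and 0.06.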
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 199ea0f..66177511 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -63,6 +63,9 @@
 --phys-data::
 	Record/Report sample physical addresses
 
+--data-page-size::
+	Record/Report sample data address page size
+
 RECORD OPTIONS
 --------------
 -e::
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 34cf651..f3161c9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -296,6 +296,9 @@
 --data-page-size::
 	Record the sampled data address data page size.
 
+--code-page-size::
+	Record the sampled code address (ip) page size
+
 -T::
 --timestamp::
 	Record the sample timestamps. Use it with 'perf report -D' to see the
@@ -485,6 +488,9 @@
 --buildid-all::
 Record build-id of all DSOs regardless whether it's actually hit or not.
 
+--buildid-mmap::
+Record build ids in mmap2 events, disables build id cache (implies --no-buildid).
+
 --aio[=n]::
 Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4).
 Asynchronous mode is supported only when linking Perf tool with libc library
@@ -640,9 +646,18 @@
 Listen on ctl-fd descriptor for command to control measurement.
 
 Available commands:
-  'enable'  : enable events
-  'disable' : disable events
-  'snapshot': AUX area tracing snapshot).
+  'enable'           : enable events
+  'disable'          : disable events
+  'enable name'      : enable event 'name'
+  'disable name'     : disable event 'name'
+  'snapshot'         : AUX area tracing snapshot.
+  'stop'             : stop perf record
+  'ping'             : ping
+
+  'evlist [-v|-g|-F]' : display all events
+                       -F  Show just the sample frequency used for each event.
+                       -v  Show all fields.
+                       -g  Show event group information.
 
 Measurements can be started with events disabled using --delay=-1 option. Optionally
 send control command completion ('ack\n') to ack-fd descriptor to synchronize with the
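The control commands listed in the perf-record.txt hunk above are plain text written to the ctl-fd descriptor, and completion is reported as 'ack\n' on the ack-fd descriptor. A minimal sketch of the controlling side in C follows. The helper names `ctl_send` and `ctl_wait_ack` are hypothetical, and terminating each command with a newline (as `echo` would) is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one newline-terminated command such as
 * "enable", "disable name" or "stop" to the control descriptor.
 * Returns 0 on success, -1 on error.
 */
static int ctl_send(int ctl_fd, const char *cmd)
{
	char line[128];
	int len = snprintf(line, sizeof(line), "%s\n", cmd);

	if (len < 0 || (size_t)len >= sizeof(line))
		return -1;
	return write(ctl_fd, line, (size_t)len) == (ssize_t)len ? 0 : -1;
}

/*
 * Hypothetical helper: block until the four bytes "ack\n" have been
 * read from the ack descriptor. Returns 0 on success, -1 otherwise.
 */
static int ctl_wait_ack(int ack_fd)
{
	static const char expected[] = "ack\n";
	char buf[sizeof(expected) - 1];
	size_t got = 0;

	while (got < sizeof(buf)) {
		ssize_t n = read(ack_fd, buf + got, sizeof(buf) - got);

		if (n <= 0)
			return -1;
		got += (size_t)n;
	}
	return memcmp(buf, expected, sizeof(buf)) == 0 ? 0 : -1;
}
```

A caller would typically pair the two, e.g. `ctl_send(ctl_fd, "enable")` followed by `ctl_wait_ack(ack_fd)`, so that the workload is only started once the events are known to be enabled.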
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8f7f4e9..f546b5e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -108,6 +108,10 @@
 	- period: Raw number of event count of sample
 	- time: Separate the samples by time stamp with the resolution specified by
 	--time-quantum (default 100ms). Specify with overhead and before it.
+	- code_page_size: the code page size of sampled code address (ip)
+	- ins_lat: Instruction latency in core cycles. This is the global instruction
+	  latency
+	- local_ins_lat: Local instruction latency version
 
 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)
@@ -139,7 +143,7 @@
 
 	If the --mem-mode option is used, the following sort keys are also available
 	(incompatible with --branch-stack):
-	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
+	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline, blocked.
 
 	- symbol_daddr: name of data symbol being executed on at the time of sample
 	- dso_daddr: name of library or module containing the data being executed
@@ -151,9 +155,11 @@
 	- dcacheline: the cacheline the data address is on at the time of the sample
 	- phys_daddr: physical address of data being executed on at the time of sample
 	- data_page_size: the data page size of data being executed on at the time of sample
+	- blocked: reason for the blocked load access for the data at the time of the sample
 
 	And the default sort keys are changed to local_weight, mem, sym, dso,
-	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
+	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat,
+	see '--mem-mode'.
 
 	If the data file has tracepoint event(s), following (dynamic) sort keys
 	are also available:
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 44d3721..5b8b610 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -118,7 +118,7 @@
         comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
         srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
         brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
-        metric, misc, srccode, ipc, data_page_size.
+        metric, misc, srccode, ipc, data_page_size, code_page_size.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
@@ -422,9 +422,32 @@
 	Only consider the listed symbols. Symbols are typically a name
 	but they may also be hexadecimal address.
 
+	The hexadecimal address may be the start address of a symbol or
+	any other address to filter the trace records.
+
 	For example, to select the symbol noploop or the address 0x4007a0:
 	perf script --symbols=noploop,0x4007a0
 
+	Support filtering trace records by symbol name, start address of
+	symbol, any hexadecimal address and address range.
+
+	The comparison order is:
+
+	1. symbol name comparison
+	2. symbol start address comparison.
+	3. any hexadecimal address comparison.
+	4. address range comparison (see --addr-range).
+
+--addr-range::
+       Use with -S or --symbols to list traced records within address range.
+
+       For example, to list the traced records within the address range
+       [0x4007a0, 0x4007a9]:
+       perf script -S 0x4007a0 --addr-range 10
+
+--dsos=::
+	Only consider symbols in these DSOs.
+
 --call-trace::
 	Show call stream for intel_pt traces. The CPUs are interleaved, but
 	can be filtered with -C.
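In the --addr-range example above, `-S 0x4007a0 --addr-range 10` selects the inclusive range [0x4007a0, 0x4007a9], i.e. ten addresses starting at the given one. A minimal sketch of such a membership test in C follows. The function name `addr_in_range` is hypothetical, and treating a range of 0 as "match the start address only" is an assumption, not something the documentation states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr lies in [start, start + range - 1].
 * The subtraction form avoids overflow when start + range would wrap.
 * Assumption: range == 0 degenerates to an exact match on start.
 */
static bool addr_in_range(uint64_t addr, uint64_t start, uint64_t range)
{
	if (range == 0)
		return addr == start;
	return addr >= start && addr - start < range;
}
```

With `start = 0x4007a0` and `range = 10`, the address 0x4007a9 is accepted and 0x4007aa is rejected, which matches the interval given in the documentation.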
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5d4a673d..08a1714 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -75,6 +75,24 @@
 --tid=<tid>::
         stat events on existing thread id (comma separated list)
 
+-b::
+--bpf-prog::
+        stat events on existing bpf program id (comma separated list),
+        requiring root rights. bpftool-prog could be used to find the
+        program ids of all bpf programs in the system. For example:
+
+  # bpftool prog | head -n 1
+  17247: tracepoint  name sys_enter  tag 192d548b9d754067  gpl
+
+  # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
+
+   Performance counter stats for 'BPF program(s) 17247':
+
+             85,967      cycles
+             28,982      instructions              #    0.34  insn per cycle
+
+        1.102235068 seconds time elapsed
+
 ifdef::HAVE_LIBPFM[]
 --pfm-events events::
 Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
@@ -358,7 +376,7 @@
 Do not aggregate counts across all monitored CPUs.
 
 --topdown::
-Print top down level 1 metrics if supported by the CPU. This allows to
+Print complete top-down metrics supported by the CPU. This allows to
 determine bottle necks in the CPU pipeline for CPU bound workloads,
 by breaking the cycles consumed down into frontend bound, backend bound,
 bad speculation and retiring.
@@ -393,6 +411,18 @@
 CPUs the workload runs on. If needed the CPUs can be forced using
 taskset.
 
+--td-level::
+Print the top-down statistics that are equal to or lower than the input level.
+It allows users to print only the top-down metrics levels of interest instead
+of the complete top-down metrics.
+
+The availability of the top-down metrics level depends on the hardware. For
+example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
+supports both L1 and L2 top-down metrics.
+
+Default: 0 means the max level that the current hardware supports.
+Error out if the input is higher than the supported max level.
+
 --no-merge::
 Do not merge results from same PMUs.
 
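The --td-level rules in the perf-stat.txt hunk above (0 selects the highest level the hardware supports, and a request above that level is an error) can be sketched as a small C function. The name `resolve_td_level` and its return convention are hypothetical and only illustrate the documented behaviour.

```c
/*
 * Hypothetical helper: map the user-requested top-down level to the
 * level that is actually printed.
 * Returns the effective level, or -1 if the request is negative or
 * higher than what the hardware supports.
 */
static int resolve_td_level(int requested, int hw_max_level)
{
	if (requested < 0 || requested > hw_max_level)
		return -1;
	return requested == 0 ? hw_max_level : requested;
}
```

For a platform that only has L1 metrics (`hw_max_level = 1`), a request for level 2 is rejected, while the default of 0 resolves to level 1.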
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index 3c39bb3..10f07f9 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -121,7 +121,7 @@
 #define RDPMC_METRIC	(1 << 29)	/* return metric counters */
 
 #define FIXED_COUNTER_SLOTS		3
-#define METRIC_COUNTER_TOPDOWN_L1	0
+#define METRIC_COUNTER_TOPDOWN_L1_L2	0
 
 static inline uint64_t read_slots(void)
 {
@@ -130,7 +130,7 @@
 
 static inline uint64_t read_metrics(void)
 {
-	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1_L2);
 }
 
 Then the program can be instrumented to read these metrics at different
@@ -152,11 +152,21 @@
 
 #define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
 
+/* L1 Topdown metric events */
 #define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
 #define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
 #define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
 #define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
 
+/*
+ * L2 Topdown metric events.
+ * Available on Sapphire Rapids and later platforms.
+ */
+#define TOPDOWN_HEAVY_OPS(val)		((float)GET_METRIC(val, 4) / 0xff)
+#define TOPDOWN_BR_MISPREDICT(val)	((float)GET_METRIC(val, 5) / 0xff)
+#define TOPDOWN_FETCH_LAT(val)		((float)GET_METRIC(val, 6) / 0xff)
+#define TOPDOWN_MEM_BOUND(val)		((float)GET_METRIC(val, 7) / 0xff)
+
 and then converted to percent for printing.
 
 The ratios in the metric accumulate for the time when the counter
@@ -190,8 +200,8 @@
 	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
 	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
 
-Later the individual ratios for the measurement period can be recreated
-from these counts.
+Later the individual ratios of L1 metric events for the measurement period can
+be recreated from these counts.
 
 	slots_delta = slots_b - slots_a
 	retiring_ratio = (float)retiring_slots / slots_delta
@@ -205,6 +215,48 @@
 		fe_bound_ratio * 100.,
 		be_bound_ratio * 100.);
 
+The individual ratios of L2 metric events for the measurement period can be
+recreated from L1 and L2 metric counters. (Available on Sapphire Rapids and
+later platforms)
+
+	# compute scaled metrics for measurement a
+	heavy_ops_slots_a = GET_METRIC(metric_a, 4) * slots_a
+	br_mispredict_slots_a = GET_METRIC(metric_a, 5) * slots_a
+	fetch_lat_slots_a = GET_METRIC(metric_a, 6) * slots_a
+	mem_bound_slots_a = GET_METRIC(metric_a, 7) * slots_a
+
+	# compute delta scaled metrics between b and a
+	heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a
+	br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a
+	fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a
+	mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a
+
+	slots_delta = slots_b - slots_a
+	heavy_ops_ratio = (float)heavy_ops_slots / slots_delta
+	light_ops_ratio = retiring_ratio - heavy_ops_ratio;
+
+	br_mispredict_ratio = (float)br_mispredict_slots / slots_delta
+	machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio;
+
+	fetch_lat_ratio = (float)fetch_lat_slots / slots_delta
+	fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio;
+
+	mem_bound_ratio = (float)mem_bound_slots / slots_delta
+	core_bound_ratio = be_bound_ratio - mem_bound_ratio;
+
+	printf("Heavy Operations %.2f%% Light Operations %.2f%% "
+	       "Branch Mispredict %.2f%% Machine Clears %.2f%% "
+	       "Fetch Latency %.2f%% Fetch Bandwidth %.2f%% "
+	       "Mem Bound %.2f%% Core Bound %.2f%%\n",
+		heavy_ops_ratio * 100.,
+		light_ops_ratio * 100.,
+		br_mispredict_ratio * 100.,
+		machine_clears_ratio * 100.,
+		fetch_lat_ratio * 100.,
+		fetch_bw_ratio * 100.,
+		mem_bound_ratio * 100.,
+		core_bound_ratio * 100.);
+
 Resetting metrics counters
 ==========================
 
@@ -248,6 +300,24 @@
 group, the second event of the group is the sampling event.
 For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
 
+Extension on Sapphire Rapids Server
+===================================
+The metrics counter is extended to support TMA method level 2 metrics.
+The lower half of the register is the TMA level 1 metrics (legacy).
+The upper half is also divided into four 8-bit fields for the new level 2
+metrics. Four more TopDown metric events are exposed for the end-users,
+topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
+topdown-mem-bound.
+
+Each of the new level 2 metrics in the upper half is a subset of the
+corresponding level 1 metric in the lower half. Software can deduce the
+other four level 2 metrics by subtracting corresponding metrics as below.
+
+    Light_Operations = Retiring - Heavy_Operations
+    Machine_Clears = Bad_Speculation - Branch_Mispredicts
+    Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
+    Core_Bound = Backend_Bound - Memory_Bound
+
 
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
 [2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
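The L2 derivation in the topdown.txt hunks above (scale each 8-bit field by the slots count, take the delta between two snapshots, then subtract from the parent L1 ratio) can be sketched in plain C as follows. The struct and function names are hypothetical. The division by 0xff, which the TOPDOWN_* macros apply but the pseudocode leaves implicit, is made explicit here as an assumption about the intended normalisation.

```c
#include <stdint.h>

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* Hypothetical container for the eight level 2 ratios (0.0 .. 1.0). */
struct td_l2 {
	double heavy_ops, light_ops;
	double br_mispredict, machine_clears;
	double fetch_lat, fetch_bw;
	double mem_bound, core_bound;
};

/* Ratio of metric field i over the interval between snapshots a and b. */
static double td_ratio(uint64_t metric_a, uint64_t slots_a,
		       uint64_t metric_b, uint64_t slots_b, int i)
{
	double scaled_a = (double)GET_METRIC(metric_a, i) * (double)slots_a;
	double scaled_b = (double)GET_METRIC(metric_b, i) * (double)slots_b;

	if (slots_b == slots_a)
		return 0.0;	/* empty interval, nothing to attribute */
	return (scaled_b - scaled_a) / 0xff / (double)(slots_b - slots_a);
}

/* Fields 0-3 are the L1 metrics, fields 4-7 the hardware L2 metrics. */
static struct td_l2 td_l2_ratios(uint64_t metric_a, uint64_t slots_a,
				 uint64_t metric_b, uint64_t slots_b)
{
	struct td_l2 r;
	double retiring = td_ratio(metric_a, slots_a, metric_b, slots_b, 0);
	double bad_spec = td_ratio(metric_a, slots_a, metric_b, slots_b, 1);
	double fe_bound = td_ratio(metric_a, slots_a, metric_b, slots_b, 2);
	double be_bound = td_ratio(metric_a, slots_a, metric_b, slots_b, 3);

	r.heavy_ops = td_ratio(metric_a, slots_a, metric_b, slots_b, 4);
	r.br_mispredict = td_ratio(metric_a, slots_a, metric_b, slots_b, 5);
	r.fetch_lat = td_ratio(metric_a, slots_a, metric_b, slots_b, 6);
	r.mem_bound = td_ratio(metric_a, slots_a, metric_b, slots_b, 7);

	/* The remaining four L2 metrics are deduced by subtraction. */
	r.light_ops = retiring - r.heavy_ops;
	r.machine_clears = bad_spec - r.br_mispredict;
	r.fetch_bw = fe_bound - r.fetch_lat;
	r.core_bound = be_bound - r.mem_bound;
	return r;
}
```

The inputs would come from `read_metrics()` and `read_slots()` taken before and after the region of interest; multiplying each field of the result by 100 gives the percentages printed in the example.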
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index ce8516e..d8e59d3 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -621,6 +621,15 @@
   endif
 endif
 
+ifdef BUILD_BPF_SKEL
+  $(call feature_check,clang-bpf-co-re)
+  ifeq ($(feature-clang-bpf-co-re), 0)
+    dummy := $(error Error: clang too old. Please install recent clang)
+  endif
+  $(call detected,CONFIG_PERF_BPF_SKEL)
+  CFLAGS += -DHAVE_BPF_SKEL
+endif
+
 dwarf-post-unwind := 1
 dwarf-post-unwind-text := BUG
 
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index f4df753..5345ac7 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -126,6 +126,8 @@
 #
 # Define NO_LIBDEBUGINFOD if you do not want support debuginfod
 #
+# Define BUILD_BPF_SKEL to enable BPF skeletons
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
@@ -175,6 +177,12 @@
 
 LD += $(EXTRA_LDFLAGS)
 
+HOSTCC  ?= gcc
+HOSTLD  ?= ld
+HOSTAR  ?= ar
+CLANG   ?= clang
+LLVM_STRIP ?= llvm-strip
+
 PKG_CONFIG = $(CROSS_COMPILE)pkg-config
 
 RM      = rm -f
@@ -730,7 +738,8 @@
 	$(x86_arch_prctl_code_array) \
 	$(rename_flags_array) \
 	$(arch_errno_name_array) \
-	$(sync_file_range_arrays)
+	$(sync_file_range_arrays) \
+	bpf-skel
 
 $(OUTPUT)%.o: %.c prepare FORCE
 	$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
@@ -1003,7 +1012,43 @@
 python-clean:
 	$(python-clean)
 
-clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean
+SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
+SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
+SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
+
+ifdef BUILD_BPF_SKEL
+BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
+LIBBPF_SRC := $(abspath ../lib/bpf)
+BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
+
+$(SKEL_TMP_OUT):
+	$(Q)$(MKDIR) -p $@
+
+$(BPFTOOL): | $(SKEL_TMP_OUT)
+	CFLAGS= $(MAKE) -C ../bpf/bpftool \
+		OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
+
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
+	$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
+	  -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
+
+$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
+	$(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
+
+bpf-skel: $(SKELETONS)
+
+.PRECIOUS: $(SKEL_TMP_OUT)/%.bpf.o
+
+else # BUILD_BPF_SKEL
+
+bpf-skel:
+
+endif # BUILD_BPF_SKEL
+
+bpf-skel-clean:
+	$(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
+
+clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean bpf-skel-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
 	$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) $(OUTPUT).config-detected
diff --git a/tools/perf/arch/arm/include/perf_regs.h b/tools/perf/arch/arm/include/perf_regs.h
index ed20e02..4085419 100644
--- a/tools/perf/arch/arm/include/perf_regs.h
+++ b/tools/perf/arch/arm/include/perf_regs.h
@@ -15,7 +15,7 @@
 #define PERF_REG_IP	PERF_REG_ARM_PC
 #define PERF_REG_SP	PERF_REG_ARM_SP
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_ARM_R0:
diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h
index baaa5e6..fa3e0745 100644
--- a/tools/perf/arch/arm64/include/perf_regs.h
+++ b/tools/perf/arch/arm64/include/perf_regs.h
@@ -15,7 +15,7 @@
 #define PERF_REG_IP	PERF_REG_ARM64_PC
 #define PERF_REG_SP	PERF_REG_ARM64_SP
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_ARM64_X0:
diff --git a/tools/perf/arch/arm64/util/machine.c b/tools/perf/arch/arm64/util/machine.c
index d41b27e..40c5e0b 100644
--- a/tools/perf/arch/arm64/util/machine.c
+++ b/tools/perf/arch/arm64/util/machine.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include <inttypes.h>
 #include <stdio.h>
 #include <string.h>
 #include "debug.h"
@@ -23,5 +24,5 @@
 		p->end += SYMBOL_LIMIT;
 	else
 		p->end = c->start;
-	pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
+	pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
 }
diff --git a/tools/perf/arch/arm64/util/perf_regs.c b/tools/perf/arch/arm64/util/perf_regs.c
index 54efa12..2518cde 100644
--- a/tools/perf/arch/arm64/util/perf_regs.c
+++ b/tools/perf/arch/arm64/util/perf_regs.c
@@ -1,4 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <regex.h>
+#include <string.h>
+#include <linux/kernel.h>
+#include <linux/zalloc.h>
+
+#include "../../../util/debug.h"
+#include "../../../util/event.h"
 #include "../../../util/perf_regs.h"
 
 const struct sample_reg sample_reg_masks[] = {
@@ -37,3 +45,89 @@
 	SMPL_REG(pc, PERF_REG_ARM64_PC),
 	SMPL_REG_END
 };
+
+/* %xNUM */
+#define SDT_OP_REGEX1  "^(x([1-2]?[0-9]|3[0-1]))$"
+
+/* [sp], [sp, NUM] */
+#define SDT_OP_REGEX2  "^\\[sp(, )?([0-9]+)?\\]$"
+
+static regex_t sdt_op_regex1, sdt_op_regex2;
+
+static int sdt_init_op_regex(void)
+{
+	static int initialized;
+	int ret = 0;
+
+	if (initialized)
+		return 0;
+
+	ret = regcomp(&sdt_op_regex1, SDT_OP_REGEX1, REG_EXTENDED);
+	if (ret)
+		goto error;
+
+	ret = regcomp(&sdt_op_regex2, SDT_OP_REGEX2, REG_EXTENDED);
+	if (ret)
+		goto free_regex1;
+
+	initialized = 1;
+	return 0;
+
+free_regex1:
+	regfree(&sdt_op_regex1);
+error:
+	pr_debug4("Regex compilation error.\n");
+	return ret;
+}
+
+/*
+ * SDT marker arguments on Arm64 use %xREG or [sp, NUM]; currently
+ * only these two formats are supported.
+ */
+int arch_sdt_arg_parse_op(char *old_op, char **new_op)
+{
+	int ret, new_len;
+	regmatch_t rm[5];
+
+	ret = sdt_init_op_regex();
+	if (ret < 0)
+		return ret;
+
+	if (!regexec(&sdt_op_regex1, old_op, 3, rm, 0)) {
+		/* Extract xNUM */
+		new_len = 2;	/* % NULL */
+		new_len += (int)(rm[1].rm_eo - rm[1].rm_so);
+
+		*new_op = zalloc(new_len);
+		if (!*new_op)
+			return -ENOMEM;
+
+		scnprintf(*new_op, new_len, "%%%.*s",
+			(int)(rm[1].rm_eo - rm[1].rm_so), old_op + rm[1].rm_so);
+	} else if (!regexec(&sdt_op_regex2, old_op, 5, rm, 0)) {
+		/* [sp], [sp, NUM] or [sp,NUM] */
+		new_len = 7;	/* + ( % s p ) NULL */
+
+		/* If the argument is [sp], need to fill offset '0' */
+		if (rm[2].rm_so == -1)
+			new_len += 1;
+		else
+			new_len += (int)(rm[2].rm_eo - rm[2].rm_so);
+
+		*new_op = zalloc(new_len);
+		if (!*new_op)
+			return -ENOMEM;
+
+		if (rm[2].rm_so == -1)
+			scnprintf(*new_op, new_len, "+0(%%sp)");
+		else
+			scnprintf(*new_op, new_len, "+%.*s(%%sp)",
+				  (int)(rm[2].rm_eo - rm[2].rm_so),
+				  old_op + rm[2].rm_so);
+	} else {
+		pr_debug4("Skipping unsupported SDT argument: %s\n", old_op);
+		return SDT_ARG_SKIP;
+	}
+
+	return SDT_ARG_VALID;
+}
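The operand rewriting done by `arch_sdt_arg_parse_op()` above (a register `xN` becomes `%xN`, and a stack slot `[sp]` or `[sp, NUM]` becomes `+NUM(%sp)` with offset 0 filled in when omitted) can be tried outside perf with a small standalone sketch. The function name `sdt_op_to_uprobe` is hypothetical. The sketch uses its own patterns, which are an assumption rather than a copy of the patch: the register alternation is grouped under the leading `x` so that `x30` matches, and the comma in the stack form may or may not be followed by a space.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: convert an Arm64 SDT operand to uprobe syntax.
 * Returns 0 on success, -1 if the operand is not one of the two
 * supported forms. Patterns are compiled on every call for brevity;
 * real code would cache them.
 */
static int sdt_op_to_uprobe(const char *op, char *out, size_t len)
{
	regex_t re_reg, re_sp;
	regmatch_t rm[3];
	int ret = -1;

	if (regcomp(&re_reg, "^x([1-2]?[0-9]|3[0-1])$", REG_EXTENDED))
		return -1;
	if (regcomp(&re_sp, "^\\[sp(, ?([0-9]+))?\\]$", REG_EXTENDED)) {
		regfree(&re_reg);
		return -1;
	}

	if (!regexec(&re_reg, op, 0, NULL, 0)) {
		/* xNUM -> %xNUM */
		snprintf(out, len, "%%%s", op);
		ret = 0;
	} else if (!regexec(&re_sp, op, 3, rm, 0)) {
		if (rm[2].rm_so == -1)
			/* [sp] -> +0(%sp) */
			snprintf(out, len, "+0(%%sp)");
		else
			/* [sp, NUM] -> +NUM(%sp) */
			snprintf(out, len, "+%.*s(%%sp)",
				 (int)(rm[2].rm_eo - rm[2].rm_so),
				 op + rm[2].rm_so);
		ret = 0;
	}

	regfree(&re_reg);
	regfree(&re_sp);
	return ret;
}
```

Unlike the patch, which allocates the result with `zalloc()` and returns `SDT_ARG_VALID` or `SDT_ARG_SKIP`, this sketch writes into a caller-supplied buffer to stay self-contained.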
diff --git a/tools/perf/arch/csky/include/perf_regs.h b/tools/perf/arch/csky/include/perf_regs.h
index 8f336ea..25ac3bd 100644
--- a/tools/perf/arch/csky/include/perf_regs.h
+++ b/tools/perf/arch/csky/include/perf_regs.h
@@ -15,7 +15,7 @@
 #define PERF_REG_IP	PERF_REG_CSKY_PC
 #define PERF_REG_SP	PERF_REG_CSKY_SP
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_CSKY_A0:
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
index 63f3ac9..04e5dc0 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -71,9 +71,15 @@
 	[PERF_REG_POWERPC_MMCR3] = "mmcr3",
 	[PERF_REG_POWERPC_SIER2] = "sier2",
 	[PERF_REG_POWERPC_SIER3] = "sier3",
+	[PERF_REG_POWERPC_PMC1] = "pmc1",
+	[PERF_REG_POWERPC_PMC2] = "pmc2",
+	[PERF_REG_POWERPC_PMC3] = "pmc3",
+	[PERF_REG_POWERPC_PMC4] = "pmc4",
+	[PERF_REG_POWERPC_PMC5] = "pmc5",
+	[PERF_REG_POWERPC_PMC6] = "pmc6",
 };
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	return reg_names[id];
 }
diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index e86e210..b7945e5 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,4 +1,5 @@
 perf-y += header.o
+perf-y += machine.o
 perf-y += kvm-stat.o
 perf-y += perf_regs.o
 perf-y += mem-events.o
diff --git a/tools/perf/arch/powerpc/util/machine.c b/tools/perf/arch/powerpc/util/machine.c
new file mode 100644
index 0000000..e652a1a
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/machine.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+#include <internal/lib.h> // page_size
+#include "debug.h"
+#include "symbol.h"
+
+/* On powerpc the kernel text segment starts at address 0xc000000000000000,
+ * whereas the modules are located at very high memory addresses,
+ * for example 0xc00800000xxxxxxx. The gap between end of kernel text segment
+ * and beginning of first module's text segment is very high.
+ * Therefore do not fill this gap and do not assign it to the kernel dso map.
+ */
+
+void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
+{
+	if (strchr(p->name, '[') == NULL && strchr(c->name, '['))
+		/* Limit the range of last kernel symbol */
+		p->end += page_size;
+	else
+		p->end = c->start;
+	pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
+}
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index 2b6d470..8116a25 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -68,6 +68,12 @@
 	SMPL_REG(mmcr3, PERF_REG_POWERPC_MMCR3),
 	SMPL_REG(sier2, PERF_REG_POWERPC_SIER2),
 	SMPL_REG(sier3, PERF_REG_POWERPC_SIER3),
+	SMPL_REG(pmc1, PERF_REG_POWERPC_PMC1),
+	SMPL_REG(pmc2, PERF_REG_POWERPC_PMC2),
+	SMPL_REG(pmc3, PERF_REG_POWERPC_PMC3),
+	SMPL_REG(pmc4, PERF_REG_POWERPC_PMC4),
+	SMPL_REG(pmc5, PERF_REG_POWERPC_PMC5),
+	SMPL_REG(pmc6, PERF_REG_POWERPC_PMC6),
 	SMPL_REG_END
 };
 
diff --git a/tools/perf/arch/riscv/include/perf_regs.h b/tools/perf/arch/riscv/include/perf_regs.h
index 7a8bcde..6b02a76 100644
--- a/tools/perf/arch/riscv/include/perf_regs.h
+++ b/tools/perf/arch/riscv/include/perf_regs.h
@@ -19,7 +19,7 @@
 #define PERF_REG_IP	PERF_REG_RISCV_PC
 #define PERF_REG_SP	PERF_REG_RISCV_SP
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_RISCV_PC:
diff --git a/tools/perf/arch/s390/include/perf_regs.h b/tools/perf/arch/s390/include/perf_regs.h
index bcfbaed..ce30315 100644
--- a/tools/perf/arch/s390/include/perf_regs.h
+++ b/tools/perf/arch/s390/include/perf_regs.h
@@ -14,7 +14,7 @@
 #define PERF_REG_IP PERF_REG_S390_PC
 #define PERF_REG_SP PERF_REG_S390_R15
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_S390_R0:
diff --git a/tools/perf/arch/s390/util/machine.c b/tools/perf/arch/s390/util/machine.c
index 724efb2..7644a4f 100644
--- a/tools/perf/arch/s390/util/machine.c
+++ b/tools/perf/arch/s390/util/machine.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <inttypes.h>
 #include <unistd.h>
 #include <stdio.h>
 #include <string.h>
@@ -48,5 +49,5 @@
 		p->end = roundup(p->end, page_size);
 	else
 		p->end = c->start;
-	pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
+	pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
 }
diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h
index b732133..cddc4cd 100644
--- a/tools/perf/arch/x86/include/perf_regs.h
+++ b/tools/perf/arch/x86/include/perf_regs.h
@@ -23,7 +23,7 @@
 #define PERF_REG_IP PERF_REG_X86_IP
 #define PERF_REG_SP PERF_REG_X86_SP
 
-static inline const char *perf_reg_name(int id)
+static inline const char *__perf_reg_name(int id)
 {
 	switch (id) {
 	case PERF_REG_X86_AX:
diff --git a/tools/perf/arch/x86/tests/insn-x86.c b/tools/perf/arch/x86/tests/insn-x86.c
index 745f29a..f782ef8 100644
--- a/tools/perf/arch/x86/tests/insn-x86.c
+++ b/tools/perf/arch/x86/tests/insn-x86.c
@@ -48,6 +48,7 @@
 		{"int",     INTEL_PT_OP_INT},
 		{"syscall", INTEL_PT_OP_SYSCALL},
 		{"sysret",  INTEL_PT_OP_SYSRET},
+		{"vmentry",  INTEL_PT_OP_VMENTRY},
 		{NULL, 0},
 	};
 	struct val_data *val;
diff --git a/tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c b/tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c
index 901bf1f..c933e3d 100644
--- a/tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c
+++ b/tools/perf/arch/x86/tests/intel-pt-pkt-decoder-test.c
@@ -66,8 +66,8 @@
 	{7, {0x9d, 1, 2, 3, 4, 5, 6}, 0, {INTEL_PT_FUP, 4, 0x60504030201}, 0, 0 },
 	{9, {0xdd, 1, 2, 3, 4, 5, 6, 7, 8}, 0, {INTEL_PT_FUP, 6, 0x807060504030201}, 0, 0 },
 	/* Paging Information Packet */
-	{8, {0x02, 0x43, 2, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0x60504030201}, 0, 0 },
-	{8, {0x02, 0x43, 3, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0x60504030201 | (1ULL << 63)}, 0, 0 },
+	{8, {0x02, 0x43, 2, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0xC0A08060402}, 0, 0 },
+	{8, {0x02, 0x43, 3, 4, 6, 8, 10, 12}, 0, {INTEL_PT_PIP, 0, 0xC0A08060403}, 0, 0 },
 	/* Mode Exec Packet */
 	{2, {0x99, 0x00}, 0, {INTEL_PT_MODE_EXEC, 0, 16}, 0, 0 },
 	{2, {0x99, 0x01}, 0, {INTEL_PT_MODE_EXEC, 0, 64}, 0, 0 },
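The updated expected values in the Paging Information Packet test vectors above equal the six payload bytes read as a little-endian integer. The previous expectations correspond to that integer shifted right by one, with bit 0 moved to bit 63. The following C sketch only reproduces this arithmetic; the helper name `pip_raw_payload` is hypothetical and the sketch makes no claim about how the decoder itself is written.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble the 6 payload bytes that follow the
 * 2-byte PIP header (0x02 0x43), least significant byte first.
 */
static uint64_t pip_raw_payload(const uint8_t *pkt)
{
	uint64_t payload = 0;
	int i;

	for (i = 0; i < 6; i++)
		payload |= (uint64_t)pkt[2 + i] << (8 * i);
	return payload;
}

/* The older expectation: payload >> 1, with bit 0 relocated to bit 63. */
static uint64_t pip_shifted_payload(const uint8_t *pkt)
{
	uint64_t raw = pip_raw_payload(pkt);

	return (raw >> 1) | ((raw & 1) << 63);
}
```

For the bytes `{0x02, 0x43, 2, 4, 6, 8, 10, 12}` the raw form is 0xC0A08060402 and the shifted form is 0x60504030201, matching the new and old test values respectively.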
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 347c39b..0c72d41 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,6 +6,9 @@
 perf-y += topdown.o
 perf-y += machine.o
 perf-y += event.o
+perf-y += evlist.o
+perf-y += mem-events.o
+perf-y += evsel.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index 047dc00..9b31734 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -75,3 +75,28 @@
 }
 
 #endif
+
+void arch_perf_parse_sample_weight(struct perf_sample *data,
+				   const __u64 *array, u64 type)
+{
+	union perf_sample_weight weight;
+
+	weight.full = *array;
+	if (type & PERF_SAMPLE_WEIGHT)
+		data->weight = weight.full;
+	else {
+		data->weight = weight.var1_dw;
+		data->ins_lat = weight.var2_w;
+	}
+}
+
+void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+					__u64 *array, u64 type)
+{
+	*array = data->weight;
+
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		*array &= 0xffffffff;
+		*array |= ((u64)data->ins_lat << 32);
+	}
+}
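The two helpers above split one 64-bit sample word into a memory latency (weight) and an instruction latency. A minimal standalone sketch of that packing in C follows, using shifts and masks instead of `union perf_sample_weight`. The function names are hypothetical, and the layout (weight in bits 0-31, instruction latency in bits 32-47) is an assumption taken from the mask and shift in `arch_perf_synthesize_sample_weight()`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a weight and an instruction latency into
 * one 64-bit word. Only the low 32 bits of weight are kept.
 */
static uint64_t pack_weight_struct(uint64_t weight, uint16_t ins_lat)
{
	return (weight & 0xffffffffULL) | ((uint64_t)ins_lat << 32);
}

/* Hypothetical helper: the inverse of pack_weight_struct(). */
static void unpack_weight_struct(uint64_t raw, uint32_t *weight,
				 uint16_t *ins_lat)
{
	*weight = (uint32_t)(raw & 0xffffffffULL);
	*ins_lat = (uint16_t)((raw >> 32) & 0xffff);
}
```

Packing truncates any weight above 32 bits, so a round trip through these two functions is lossless only for weights that fit in 32 bits.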
diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
new file mode 100644
index 0000000..8c6732c
--- /dev/null
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include "util/pmu.h"
+#include "util/evlist.h"
+#include "util/parse-events.h"
+
+#define TOPDOWN_L1_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
+
+int arch_evlist__add_default_attrs(struct evlist *evlist)
+{
+	if (!pmu_have_event("cpu", "slots"))
+		return 0;
+
+	return parse_events(evlist, TOPDOWN_L1_EVENTS, NULL);
+}
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
new file mode 100644
index 0000000..2f733cd
--- /dev/null
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include "util/evsel.h"
+
+void arch_evsel__set_sample_weight(struct evsel *evsel)
+{
+	evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
+}
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
new file mode 100644
index 0000000..588110f
--- /dev/null
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "util/pmu.h"
+#include "map_symbol.h"
+#include "mem-events.h"
+
+static char mem_loads_name[100];
+static bool mem_loads_name__init;
+
+#define MEM_LOADS_AUX		0x8203
+#define MEM_LOADS_AUX_NAME	"{cpu/mem-loads-aux/,cpu/mem-loads,ldlat=%u/pp}:S"
+
+bool is_mem_loads_aux_event(struct evsel *leader)
+{
+	if (!pmu_have_event("cpu", "mem-loads-aux"))
+		return false;
+
+	return leader->core.attr.config == MEM_LOADS_AUX;
+}
+
+char *perf_mem_events__name(int i)
+{
+	struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+	if (!e)
+		return NULL;
+
+	if (i == PERF_MEM_EVENTS__LOAD) {
+		if (mem_loads_name__init)
+			return mem_loads_name;
+
+		mem_loads_name__init = true;
+
+		if (pmu_have_event("cpu", "mem-loads-aux")) {
+			scnprintf(mem_loads_name, sizeof(mem_loads_name),
+				  MEM_LOADS_AUX_NAME, perf_mem_events__loads_ldlat);
+		} else {
+			scnprintf(mem_loads_name, sizeof(mem_loads_name),
+				  e->name, perf_mem_events__loads_ldlat);
+		}
+		return mem_loads_name;
+	}
+
+	return (char *)e->name;
+}
diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
index ca2d591..ddaca75 100644
--- a/tools/perf/bench/epoll-ctl.c
+++ b/tools/perf/bench/epoll-ctl.c
@@ -21,7 +21,6 @@
 #include <sys/resource.h>
 #include <sys/epoll.h>
 #include <sys/eventfd.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 
 #include "../util/stat.h"
diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
index 75dca97..0a0ff12 100644
--- a/tools/perf/bench/epoll-wait.c
+++ b/tools/perf/bench/epoll-wait.c
@@ -76,7 +76,6 @@
 #include <sys/epoll.h>
 #include <sys/eventfd.h>
 #include <sys/types.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 
 #include "../util/stat.h"
diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 915bf3d..b65373c 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -20,7 +20,6 @@
 #include <linux/kernel.h>
 #include <linux/zalloc.h>
 #include <sys/time.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 
 #include "../util/stat.h"
diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index bb25d8b..89c6d16 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -14,7 +14,6 @@
 #include <linux/kernel.h>
 #include <linux/zalloc.h>
 #include <errno.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 #include "bench.h"
 #include "futex.h"
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index 7a15c2e..5fa2329 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -20,7 +20,6 @@
 #include <linux/kernel.h>
 #include <linux/time64.h>
 #include <errno.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 #include "bench.h"
 #include "futex.h"
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index cd2b81a..6e6f524 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -29,7 +29,6 @@
 #include <linux/time64.h>
 #include <errno.h>
 #include "futex.h"
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 
 #include <err.h>
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index 2dfcef3..6d21786 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -20,7 +20,6 @@
 #include <linux/kernel.h>
 #include <linux/time64.h>
 #include <errno.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 #include "bench.h"
 #include "futex.h"
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index a254119..ecd0d3c 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -27,6 +27,7 @@
 #include "util/time-utils.h"
 #include "util/util.h"
 #include "util/probe-file.h"
+#include "util/config.h"
 #include <linux/string.h>
 #include <linux/err.h>
 
@@ -348,12 +349,21 @@
 	return 0;
 }
 
+static int perf_buildid_cache_config(const char *var, const char *value, void *cb)
+{
+	const char **debuginfod = cb;
+
+	if (!strcmp(var, "buildid-cache.debuginfod"))
+		*debuginfod = strdup(value);
+
+	return 0;
+}
+
 int cmd_buildid_cache(int argc, const char **argv)
 {
 	struct strlist *list;
 	struct str_node *pos;
-	int ret = 0;
-	int ns_id = -1;
+	int ret, ns_id = -1;
 	bool force = false;
 	bool list_files = false;
 	bool opts_flag = false;
@@ -363,7 +373,8 @@
 		   *purge_name_list_str = NULL,
 		   *missing_filename = NULL,
 		   *update_name_list_str = NULL,
-		   *kcore_filename = NULL;
+		   *kcore_filename = NULL,
+		   *debuginfod = NULL;
 	char sbuf[STRERR_BUFSIZE];
 
 	struct perf_data data = {
@@ -388,6 +399,8 @@
 	OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
 	OPT_STRING('u', "update", &update_name_list_str, "file list",
 		    "file(s) to update"),
+	OPT_STRING(0, "debuginfod", &debuginfod, "debuginfod url",
+		    "set debuginfod url"),
 	OPT_INCR('v', "verbose", &verbose, "be more verbose"),
 	OPT_INTEGER(0, "target-ns", &ns_id, "target pid for namespace context"),
 	OPT_END()
@@ -397,6 +410,10 @@
 		NULL
 	};
 
+	ret = perf_config(perf_buildid_cache_config, &debuginfod);
+	if (ret)
+		return ret;
+
 	argc = parse_options(argc, argv, buildid_cache_options,
 			     buildid_cache_usage, 0);
 
@@ -408,6 +425,11 @@
 	if (argc || !(list_files || opts_flag))
 		usage_with_options(buildid_cache_usage, buildid_cache_options);
 
+	if (debuginfod) {
+		pr_debug("DEBUGINFOD_URLS=%s\n", debuginfod);
+		setenv("DEBUGINFOD_URLS", debuginfod, 1);
+	}
+
 	/* -l is exclusive. It can not be used with other options. */
 	if (list_files && opts_flag) {
 		usage_with_options_msg(buildid_cache_usage,
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index e3ef755..87f5b1a 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -77,6 +77,9 @@
 	    perf_header__has_feat(&session->header, HEADER_AUXTRACE))
 		with_hits = false;
 
+	if (!perf_header__has_feat(&session->header, HEADER_BUILD_ID))
+		with_hits = true;
+
 	/*
 	 * in pipe-mode, the only way to get the buildids is to parse
 	 * the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c5babea..e3b9d63 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@
 	bool			 symbol_full;
 	bool			 stitch_lbr;
 
-	/* HITM shared clines stats */
-	struct c2c_stats	hitm_stats;
+	/* Shared cache line stats */
+	struct c2c_stats	shared_clines_stats;
 	int			shared_clines;
 
 	int			 display;
@@ -876,7 +876,7 @@
 	return &hists->stats;
 }
 
-static double percent(int st, int tot)
+static double percent(u32 st, u32 tot)
 {
 	return tot ? 100. * (double) st / (double) tot : 0;
 }
@@ -1048,6 +1048,19 @@
 	return 0;
 }
 
+static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
+{
+	int ret;
+
+	if (sum != 0)
+		ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
+				percent(val, sum));
+	else
+		ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");
+
+	return ret;
+}
+
 static int
 node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 	   struct hist_entry *he)
@@ -1091,29 +1104,23 @@
 			ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
 			advance_hpp(hpp, ret);
 
-		#define DISPLAY_HITM(__h)						\
-			if (c2c_he->stats.__h> 0) {					\
-				ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",	\
-						percent(stats->__h, c2c_he->stats.__h));\
-			} else {							\
-				ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");	\
-			}
-
 			switch (c2c.display) {
 			case DISPLAY_RMT:
-				DISPLAY_HITM(rmt_hitm);
+				ret = display_metrics(hpp, stats->rmt_hitm,
+						      c2c_he->stats.rmt_hitm);
 				break;
 			case DISPLAY_LCL:
-				DISPLAY_HITM(lcl_hitm);
+				ret = display_metrics(hpp, stats->lcl_hitm,
+						      c2c_he->stats.lcl_hitm);
 				break;
 			case DISPLAY_TOT:
-				DISPLAY_HITM(tot_hitm);
+				ret = display_metrics(hpp, stats->tot_hitm,
+						      c2c_he->stats.tot_hitm);
+				break;
 			default:
 				break;
 			}
 
-		#undef DISPLAY_HITM
-
 			advance_hpp(hpp, ret);
 
 			if (c2c_he->stats.store > 0) {
@@ -1851,53 +1858,69 @@
 
 #define DISPLAY_LINE_LIMIT  0.001
 
+static u8 filter_display(u32 val, u32 sum)
+{
+	if (sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT)
+		return HIST_FILTER__C2C;
+
+	return 0;
+}
+
 static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 {
 	struct c2c_hist_entry *c2c_he;
-	double ld_dist;
 
 	if (c2c.show_all)
 		return true;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-#define FILTER_HITM(__h)						\
-	if (stats->__h) {						\
-		ld_dist = ((double)c2c_he->stats.__h / stats->__h);	\
-		if (ld_dist < DISPLAY_LINE_LIMIT)			\
-			he->filtered = HIST_FILTER__C2C;		\
-	} else {							\
-		he->filtered = HIST_FILTER__C2C;			\
-	}
-
 	switch (c2c.display) {
 	case DISPLAY_LCL:
-		FILTER_HITM(lcl_hitm);
+		he->filtered = filter_display(c2c_he->stats.lcl_hitm,
+					      stats->lcl_hitm);
 		break;
 	case DISPLAY_RMT:
-		FILTER_HITM(rmt_hitm);
+		he->filtered = filter_display(c2c_he->stats.rmt_hitm,
+					      stats->rmt_hitm);
 		break;
 	case DISPLAY_TOT:
-		FILTER_HITM(tot_hitm);
+		he->filtered = filter_display(c2c_he->stats.tot_hitm,
+					      stats->tot_hitm);
+		break;
 	default:
 		break;
 	}
 
-#undef FILTER_HITM
-
 	return he->filtered == 0;
 }
 
-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
-	bool has_hitm;
+	bool has_record = false;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
-	has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
-		   c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
-						c2c_he->stats.rmt_hitm;
-	return has_hitm || c2c_he->stats.store;
+
+	/* It's a valid entry if it contains stores */
+	if (c2c_he->stats.store)
+		return true;
+
+	switch (c2c.display) {
+	case DISPLAY_LCL:
+		has_record = !!c2c_he->stats.lcl_hitm;
+		break;
+	case DISPLAY_RMT:
+		has_record = !!c2c_he->stats.rmt_hitm;
+		break;
+	case DISPLAY_TOT:
+		has_record = !!c2c_he->stats.tot_hitm;
+		break;
+	default:
+		break;
+	}
+
+	return has_record;
 }
 
 static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -1951,7 +1974,7 @@
 
 	calc_width(c2c_he);
 
-	if (!valid_hitm_or_store(he))
+	if (!is_valid_hist_entry(he))
 		he->filtered = HIST_FILTER__C2C;
 
 	return 0;
@@ -1961,7 +1984,7 @@
 {
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *c2c_hists;
-	bool display = he__display(he, &c2c.hitm_stats);
+	bool display = he__display(he, &c2c.shared_clines_stats);
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 	c2c_hists = c2c_he->hists;
@@ -2048,14 +2071,14 @@
 
 #define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
 
-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
 	if (HAS_HITMS(c2c_he)) {
 		c2c.shared_clines++;
-		c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
 
 	return 0;
@@ -2111,6 +2134,8 @@
 	fprintf(out, "  Load MESI State Exclusive         : %10d\n", stats->ld_excl);
 	fprintf(out, "  Load MESI State Shared            : %10d\n", stats->ld_shared);
 	fprintf(out, "  Load LLC Misses                   : %10d\n", llc_misses);
+	fprintf(out, "  Load access blocked by data       : %10d\n", stats->blk_data);
+	fprintf(out, "  Load access blocked by address    : %10d\n", stats->blk_addr);
 	fprintf(out, "  LLC Misses to Local DRAM          : %10.1f%%\n", ((double)stats->lcl_dram/(double)llc_misses) * 100.);
 	fprintf(out, "  LLC Misses to Remote DRAM         : %10.1f%%\n", ((double)stats->rmt_dram/(double)llc_misses) * 100.);
 	fprintf(out, "  LLC Misses to Remote cache (HIT)  : %10.1f%%\n", ((double)stats->rmt_hit /(double)llc_misses) * 100.);
@@ -2126,7 +2151,7 @@
 
 static void print_shared_cacheline_info(FILE *out)
 {
-	struct c2c_stats *stats = &c2c.hitm_stats;
+	struct c2c_stats *stats = &c2c.shared_clines_stats;
 	int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
 
 	fprintf(out, "=================================================\n");
@@ -2139,6 +2164,7 @@
 	fprintf(out, "  L2D hits on shared lines          : %10d\n", stats->ld_l2hit);
 	fprintf(out, "  LLC hits on shared lines          : %10d\n", stats->ld_llchit + stats->lcl_hitm);
 	fprintf(out, "  Locked Access on shared lines     : %10d\n", stats->locks);
+	fprintf(out, "  Blocked Access on shared lines    : %10d\n", stats->blk_data + stats->blk_addr);
 	fprintf(out, "  Store HITs on shared lines        : %10d\n", stats->store);
 	fprintf(out, "  Store L1D hits on shared lines    : %10d\n", stats->st_l1hit);
 	fprintf(out, "  Total Merged records              : %10d\n", hitm_cnt + stats->store);
@@ -2176,16 +2202,17 @@
 	struct perf_hpp_list hpp_list;
 	struct rb_node *nd;
 	int ret;
+	const char *cl_output;
+
+	cl_output = "cl_num,"
+		    "cl_rmt_hitm,"
+		    "cl_lcl_hitm,"
+		    "cl_stores_l1hit,"
+		    "cl_stores_l1miss,"
+		    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
-	ret = hpp_list__parse(&hpp_list,
-				"cl_num,"
-				"cl_rmt_hitm,"
-				"cl_lcl_hitm,"
-				"cl_stores_l1hit,"
-				"cl_stores_l1miss,"
-				"dcacheline",
-				NULL);
+	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
 
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
@@ -2729,6 +2756,7 @@
 	OPT_END()
 	};
 	int err = 0;
+	const char *output_str, *sort_str = NULL;
 
 	argc = parse_options(argc, argv, options, report_c2c_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
@@ -2805,29 +2833,34 @@
 		goto out_mem2node;
 	}
 
-	c2c_hists__reinit(&c2c.hists,
-			"cl_idx,"
-			"dcacheline,"
-			"dcacheline_node,"
-			"dcacheline_count,"
-			"percent_hitm,"
-			"tot_hitm,lcl_hitm,rmt_hitm,"
-			"tot_recs,"
-			"tot_loads,"
-			"tot_stores,"
-			"stores_l1hit,stores_l1miss,"
-			"ld_fbhit,ld_l1hit,ld_l2hit,"
-			"ld_lclhit,lcl_hitm,"
-			"ld_rmthit,rmt_hitm,"
-			"dram_lcl,dram_rmt",
-			c2c.display == DISPLAY_TOT ? "tot_hitm" :
-			c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
-			);
+	output_str = "cl_idx,"
+		     "dcacheline,"
+		     "dcacheline_node,"
+		     "dcacheline_count,"
+		     "percent_hitm,"
+		     "tot_hitm,lcl_hitm,rmt_hitm,"
+		     "tot_recs,"
+		     "tot_loads,"
+		     "tot_stores,"
+		     "stores_l1hit,stores_l1miss,"
+		     "ld_fbhit,ld_l1hit,ld_l2hit,"
+		     "ld_lclhit,lcl_hitm,"
+		     "ld_rmthit,rmt_hitm,"
+		     "dram_lcl,dram_rmt";
+
+	if (c2c.display == DISPLAY_TOT)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT)
+		sort_str = "rmt_hitm";
+	else if (c2c.display == DISPLAY_LCL)
+		sort_str = "lcl_hitm";
+
+	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
 
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
 	hists__collapse_resort(&c2c.hists.hists, NULL);
-	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
 	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
 
 	ui_progress__finish();
diff --git a/tools/perf/builtin-daemon.c b/tools/perf/builtin-daemon.c
new file mode 100644
index 0000000..617feaf
--- /dev/null
+++ b/tools/perf/builtin-daemon.c
@@ -0,0 +1,1521 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <internal/lib.h>
+#include <subcmd/parse-options.h>
+#include <api/fd/array.h>
+#include <api/fs/fs.h>
+#include <linux/zalloc.h>
+#include <linux/string.h>
+#include <linux/limits.h>
+#include <linux/string.h>
+#include <string.h>
+#include <sys/file.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <time.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/inotify.h>
+#include <libgen.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/stat.h>
+#include <sys/signalfd.h>
+#include <sys/wait.h>
+#include <poll.h>
+#include <sys/stat.h>
+#include <time.h>
+#include "builtin.h"
+#include "perf.h"
+#include "debug.h"
+#include "config.h"
+#include "util.h"
+
+#define SESSION_OUTPUT  "output"
+#define SESSION_CONTROL "control"
+#define SESSION_ACK     "ack"
+
+/*
+ * Session states:
+ *
+ *   OK       - session is up and running
+ *   RECONFIG - session is pending for reconfiguration,
+ *              new values are already loaded in session object
+ *   KILL     - session is pending to be killed
+ *
+ * Session object lifetime and state are maintained by
+ * the following functions:
+ *
+ *  setup_server_config
+ *    - reads the config file and sets up session objects
+ *      with the following states:
+ *
+ *      OK       - no change needed
+ *      RECONFIG - session needs to be changed
+ *                 (run variable changed)
+ *      KILL     - session needs to be killed
+ *                 (session is no longer in config file)
+ *
+ *  daemon__reconfig
+ *    - scans session objects and takes the following
+ *      actions for each state:
+ *
+ *      OK       - skip
+ *      RECONFIG - session is killed and re-run with new config
+ *      KILL     - session is killed
+ *
+ *    - all sessions have OK state on the function exit
+ */
+enum daemon_session_state {
+	OK,
+	RECONFIG,
+	KILL,
+};
+
+struct daemon_session {
+	char				*base;
+	char				*name;
+	char				*run;
+	char				*control;
+	int				 pid;
+	struct list_head		 list;
+	enum daemon_session_state	 state;
+	time_t				 start;
+};
+
+struct daemon {
+	const char		*config;
+	char			*config_real;
+	char			*config_base;
+	const char		*csv_sep;
+	const char		*base_user;
+	char			*base;
+	struct list_head	 sessions;
+	FILE			*out;
+	char			 perf[PATH_MAX];
+	int			 signal_fd;
+	time_t			 start;
+};
+
+static struct daemon __daemon = {
+	.sessions = LIST_HEAD_INIT(__daemon.sessions),
+};
+
+static const char * const daemon_usage[] = {
+	"perf daemon start [<options>]",
+	"perf daemon [<options>]",
+	NULL
+};
+
+static bool done;
+
+static void sig_handler(int sig __maybe_unused)
+{
+	done = true;
+}
+
+static struct daemon_session *daemon__add_session(struct daemon *config, char *name)
+{
+	struct daemon_session *session = zalloc(sizeof(*session));
+
+	if (!session)
+		return NULL;
+
+	session->name = strdup(name);
+	if (!session->name) {
+		free(session);
+		return NULL;
+	}
+
+	session->pid = -1;
+	list_add_tail(&session->list, &config->sessions);
+	return session;
+}
+
+static struct daemon_session *daemon__find_session(struct daemon *daemon, char *name)
+{
+	struct daemon_session *session;
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+		if (!strcmp(session->name, name))
+			return session;
+	}
+
+	return NULL;
+}
+
+static int get_session_name(const char *var, char *session, int len)
+{
+	const char *p = var + sizeof("session-") - 1;
+
+	while (*p != '.' && *p != 0x0 && --len > 0)
+		*session++ = *p++;
+
+	*session = 0;
+	return *p == '.' ? 0 : -EINVAL;
+}
+
+static int session_config(struct daemon *daemon, const char *var, const char *value)
+{
+	struct daemon_session *session;
+	char name[100];
+
+	if (get_session_name(var, name, sizeof(name)))
+		return -EINVAL;
+
+	var = strchr(var, '.');
+	if (!var)
+		return -EINVAL;
+
+	var++;
+
+	session = daemon__find_session(daemon, name);
+
+	if (!session) {
+		/* New session is defined. */
+		session = daemon__add_session(daemon, name);
+		if (!session)
+			return -ENOMEM;
+
+		pr_debug("reconfig: found new session %s\n", name);
+
+		/* Trigger reconfig to start it. */
+		session->state = RECONFIG;
+	} else if (session->state == KILL) {
+		/* Current session is defined, no action needed. */
+		pr_debug("reconfig: found current session %s\n", name);
+		session->state = OK;
+	}
+
+	if (!strcmp(var, "run")) {
+		bool same = false;
+
+		if (session->run)
+			same = !strcmp(session->run, value);
+
+		if (!same) {
+			if (session->run) {
+				free(session->run);
+				pr_debug("reconfig: session %s is changed\n", name);
+			}
+
+			session->run = strdup(value);
+			if (!session->run)
+				return -ENOMEM;
+
+			/*
+			 * Either new or changed run value is defined,
+			 * trigger reconfig for the session.
+			 */
+			session->state = RECONFIG;
+		}
+	}
+
+	return 0;
+}
+
+static int server_config(const char *var, const char *value, void *cb)
+{
+	struct daemon *daemon = cb;
+
+	if (strstarts(var, "session-")) {
+		return session_config(daemon, var, value);
+	} else if (!strcmp(var, "daemon.base") && !daemon->base_user) {
+		if (daemon->base && strcmp(daemon->base, value)) {
+			pr_err("failed: can't redefine base, bailing out\n");
+			return -EINVAL;
+		}
+		daemon->base = strdup(value);
+		if (!daemon->base)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int client_config(const char *var, const char *value, void *cb)
+{
+	struct daemon *daemon = cb;
+
+	if (!strcmp(var, "daemon.base") && !daemon->base_user) {
+		daemon->base = strdup(value);
+		if (!daemon->base)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int check_base(struct daemon *daemon)
+{
+	struct stat st;
+
+	if (!daemon->base) {
+		pr_err("failed: base not defined\n");
+		return -EINVAL;
+	}
+
+	if (stat(daemon->base, &st)) {
+		switch (errno) {
+		case EACCES:
+			pr_err("failed: permission denied for '%s' base\n",
+			       daemon->base);
+			return -EACCES;
+		case ENOENT:
+			pr_err("failed: base '%s' does not exist\n",
+			       daemon->base);
+			return -ENOENT;
+		default:
+			pr_err("failed: can't access base '%s': %s\n",
+			       daemon->base, strerror(errno));
+			return -errno;
+		}
+	}
+
+	if ((st.st_mode & S_IFMT) != S_IFDIR) {
+		pr_err("failed: base '%s' is not a directory\n",
+		       daemon->base);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int setup_client_config(struct daemon *daemon)
+{
+	struct perf_config_set *set = perf_config_set__load_file(daemon->config_real);
+	int err = -ENOMEM;
+
+	if (set) {
+		err = perf_config_set(set, client_config, daemon);
+		perf_config_set__delete(set);
+	}
+
+	return err ?: check_base(daemon);
+}
+
+static int setup_server_config(struct daemon *daemon)
+{
+	struct perf_config_set *set;
+	struct daemon_session *session;
+	int err = -ENOMEM;
+
+	pr_debug("reconfig: started\n");
+
+	/*
+	 * Mark all sessions for kill, the server config
+	 * will set following states, see explanation at
+	 * enum daemon_session_state declaration.
+	 */
+	list_for_each_entry(session, &daemon->sessions, list)
+		session->state = KILL;
+
+	set = perf_config_set__load_file(daemon->config_real);
+	if (set) {
+		err = perf_config_set(set, server_config, daemon);
+		perf_config_set__delete(set);
+	}
+
+	return err ?: check_base(daemon);
+}
+
+static int daemon_session__run(struct daemon_session *session,
+			       struct daemon *daemon)
+{
+	char buf[PATH_MAX];
+	char **argv;
+	int argc, fd;
+
+	if (asprintf(&session->base, "%s/session-%s",
+		     daemon->base, session->name) < 0) {
+		perror("failed: asprintf");
+		return -1;
+	}
+
+	if (mkdir(session->base, 0755) && errno != EEXIST) {
+		perror("failed: mkdir");
+		return -1;
+	}
+
+	session->start = time(NULL);
+
+	session->pid = fork();
+	if (session->pid < 0)
+		return -1;
+	if (session->pid > 0) {
+		pr_info("reconfig: running session [%s:%d]: %s\n",
+			session->name, session->pid, session->run);
+		return 0;
+	}
+
+	if (chdir(session->base)) {
+		perror("failed: chdir");
+		return -1;
+	}
+
+	fd = open("/dev/null", O_RDONLY);
+	if (fd < 0) {
+		perror("failed: open /dev/null");
+		return -1;
+	}
+
+	dup2(fd, 0);
+	close(fd);
+
+	fd = open(SESSION_OUTPUT, O_RDWR|O_CREAT|O_TRUNC, 0644);
+	if (fd < 0) {
+		perror("failed: open session output");
+		return -1;
+	}
+
+	dup2(fd, 1);
+	dup2(fd, 2);
+	close(fd);
+
+	if (mkfifo(SESSION_CONTROL, 0600) && errno != EEXIST) {
+		perror("failed: create control fifo");
+		return -1;
+	}
+
+	if (mkfifo(SESSION_ACK, 0600) && errno != EEXIST) {
+		perror("failed: create ack fifo");
+		return -1;
+	}
+
+	scnprintf(buf, sizeof(buf), "%s record --control=fifo:%s,%s %s",
+		  daemon->perf, SESSION_CONTROL, SESSION_ACK, session->run);
+
+	argv = argv_split(buf, &argc);
+	if (!argv)
+		exit(-1);
+
+	exit(execve(daemon->perf, argv, NULL));
+	return -1;
+}
+
+static pid_t handle_signalfd(struct daemon *daemon)
+{
+	struct daemon_session *session;
+	struct signalfd_siginfo si;
+	ssize_t err;
+	int status;
+	pid_t pid;
+
+	err = read(daemon->signal_fd, &si, sizeof(struct signalfd_siginfo));
+	if (err != sizeof(struct signalfd_siginfo))
+		return -1;
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+
+		if (session->pid != (int) si.ssi_pid)
+			continue;
+
+		pid = waitpid(session->pid, &status, 0);
+		if (pid == session->pid) {
+			if (WIFEXITED(status)) {
+				pr_info("session '%s' exited, status=%d\n",
+					session->name, WEXITSTATUS(status));
+			} else if (WIFSIGNALED(status)) {
+				pr_info("session '%s' killed (signal %d)\n",
+					session->name, WTERMSIG(status));
+			} else if (WIFSTOPPED(status)) {
+				pr_info("session '%s' stopped (signal %d)\n",
+					session->name, WSTOPSIG(status));
+			} else {
+				pr_info("session '%s' Unexpected status (0x%x)\n",
+					session->name, status);
+			}
+		}
+
+		session->state = KILL;
+		session->pid = -1;
+		return pid;
+	}
+
+	return 0;
+}
+
+static int daemon_session__wait(struct daemon_session *session, struct daemon *daemon,
+				int secs)
+{
+	struct pollfd pollfd = {
+		.fd	= daemon->signal_fd,
+		.events	= POLLIN,
+	};
+	pid_t wpid = 0, pid = session->pid;
+	time_t start;
+
+	start = time(NULL);
+
+	do {
+		int err = poll(&pollfd, 1, 1000);
+
+		if (err > 0) {
+			wpid = handle_signalfd(daemon);
+		} else if (err < 0) {
+			perror("failed: poll");
+			return -1;
+		}
+
+		if (start + secs < time(NULL))
+			return -1;
+	} while (wpid != pid);
+
+	return 0;
+}
+
+static bool daemon__has_alive_session(struct daemon *daemon)
+{
+	struct daemon_session *session;
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+		if (session->pid != -1)
+			return true;
+	}
+
+	return false;
+}
+
+static int daemon__wait(struct daemon *daemon, int secs)
+{
+	struct pollfd pollfd = {
+		.fd	= daemon->signal_fd,
+		.events	= POLLIN,
+	};
+	time_t start;
+
+	start = time(NULL);
+
+	do {
+		int err = poll(&pollfd, 1, 1000);
+
+		if (err > 0) {
+			handle_signalfd(daemon);
+		} else if (err < 0) {
+			perror("failed: poll");
+			return -1;
+		}
+
+		if (start + secs < time(NULL))
+			return -1;
+	} while (daemon__has_alive_session(daemon));
+
+	return 0;
+}
+
+static int daemon_session__control(struct daemon_session *session,
+				   const char *msg, bool do_ack)
+{
+	struct pollfd pollfd = { .events = POLLIN, };
+	char control_path[PATH_MAX];
+	char ack_path[PATH_MAX];
+	int control, ack = -1, len;
+	char buf[20];
+	int ret = -1;
+	ssize_t err;
+
+	/* open the control file */
+	scnprintf(control_path, sizeof(control_path), "%s/%s",
+		  session->base, SESSION_CONTROL);
+
+	control = open(control_path, O_WRONLY|O_NONBLOCK);
+	if (control < 0)
+		return -1;
+
+	if (do_ack) {
+		/* open the ack file */
+		scnprintf(ack_path, sizeof(ack_path), "%s/%s",
+			  session->base, SESSION_ACK);
+
+		ack = open(ack_path, O_RDONLY|O_NONBLOCK);
+		if (ack < 0) {
+			close(control);
+			return -1;
+		}
+	}
+
+	/* write the command */
+	len = strlen(msg);
+
+	err = writen(control, msg, len);
+	if (err != len) {
+		pr_err("failed: write to control pipe: %d (%s)\n",
+		       errno, control_path);
+		goto out;
+	}
+
+	if (!do_ack)
+		goto out;
+
+	/* wait for an ack */
+	pollfd.fd = ack;
+
+	if (!poll(&pollfd, 1, 2000)) {
+		pr_err("failed: control ack timeout\n");
+		goto out;
+	}
+
+	if (!(pollfd.revents & POLLIN)) {
+		pr_err("failed: did not receive an ack\n");
+		goto out;
+	}
+
+	err = read(ack, buf, sizeof(buf));
+	if (err > 0)
+		ret = strcmp(buf, "ack\n");
+	else
+		perror("failed: read ack");
+
+out:
+	if (ack != -1)
+		close(ack);
+
+	close(control);
+	return ret;
+}
+
+static int setup_server_socket(struct daemon *daemon)
+{
+	struct sockaddr_un addr;
+	char path[PATH_MAX];
+	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
+
+	if (fd < 0) {
+		fprintf(stderr, "socket: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (fcntl(fd, F_SETFD, FD_CLOEXEC)) {
+		perror("failed: fcntl FD_CLOEXEC");
+		close(fd);
+		return -1;
+	}
+
+	scnprintf(path, sizeof(path), "%s/control", daemon->base);
+
+	if (strlen(path) + 1 >= sizeof(addr.sun_path)) {
+		pr_err("failed: control path too long '%s'\n", path);
+		close(fd);
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.sun_family = AF_UNIX;
+
+	strlcpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
+	unlink(path);
+
+	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
+		perror("failed: bind");
+		close(fd);
+		return -1;
+	}
+
+	if (listen(fd, 1) == -1) {
+		perror("failed: listen");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+enum {
+	CMD_LIST   = 0,
+	CMD_SIGNAL = 1,
+	CMD_STOP   = 2,
+	CMD_PING   = 3,
+	CMD_MAX,
+};
+
+#define SESSION_MAX 64
+
+union cmd {
+	int cmd;
+
+	/* CMD_LIST */
+	struct {
+		int	cmd;
+		int	verbose;
+		char	csv_sep;
+	} list;
+
+	/* CMD_SIGNAL */
+	struct {
+		int	cmd;
+		int	sig;
+		char	name[SESSION_MAX];
+	} signal;
+
+	/* CMD_PING */
+	struct {
+		int	cmd;
+		char	name[SESSION_MAX];
+	} ping;
+};
+
+enum {
+	PING_OK	  = 0,
+	PING_FAIL = 1,
+	PING_MAX,
+};
+
+static int daemon_session__ping(struct daemon_session *session)
+{
+	return daemon_session__control(session, "ping", true) ?  PING_FAIL : PING_OK;
+}
+
+static int cmd_session_list(struct daemon *daemon, union cmd *cmd, FILE *out)
+{
+	char csv_sep = cmd->list.csv_sep;
+	struct daemon_session *session;
+	time_t curr = time(NULL);
+
+	if (csv_sep) {
+		fprintf(out, "%d%c%s%c%s%c%s/%s",
+			/* pid daemon  */
+			getpid(), csv_sep, "daemon",
+			/* base */
+			csv_sep, daemon->base,
+			/* output */
+			csv_sep, daemon->base, SESSION_OUTPUT);
+
+		fprintf(out, "%c%s/%s",
+			/* lock */
+			csv_sep, daemon->base, "lock");
+
+		fprintf(out, "%c%lu",
+			/* daemon up time */
+			csv_sep, (curr - daemon->start) / 60);
+
+		fprintf(out, "\n");
+	} else {
+		fprintf(out, "[%d:daemon] base: %s\n", getpid(), daemon->base);
+		if (cmd->list.verbose) {
+			fprintf(out, "  output:  %s/%s\n",
+				daemon->base, SESSION_OUTPUT);
+			fprintf(out, "  lock:    %s/lock\n",
+				daemon->base);
+			fprintf(out, "  up:      %lu minutes\n",
+				(curr - daemon->start) / 60);
+		}
+	}
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+		if (csv_sep) {
+			fprintf(out, "%d%c%s%c%s",
+				/* pid */
+				session->pid,
+				/* name */
+				csv_sep, session->name,
+				/* run */
+				csv_sep, session->run);
+
+			fprintf(out, "%c%s%c%s/%s",
+				/* session dir */
+				csv_sep, session->base,
+				/* session output */
+				csv_sep, session->base, SESSION_OUTPUT);
+
+			fprintf(out, "%c%s/%s%c%s/%s",
+				/* session control */
+				csv_sep, session->base, SESSION_CONTROL,
+				/* session ack */
+				csv_sep, session->base, SESSION_ACK);
+
+			fprintf(out, "%c%lu",
+				/* session up time */
+				csv_sep, (curr - session->start) / 60);
+
+			fprintf(out, "\n");
+		} else {
+			fprintf(out, "[%d:%s] perf record %s\n",
+				session->pid, session->name, session->run);
+			if (!cmd->list.verbose)
+				continue;
+			fprintf(out, "  base:    %s\n",
+				session->base);
+			fprintf(out, "  output:  %s/%s\n",
+				session->base, SESSION_OUTPUT);
+			fprintf(out, "  control: %s/%s\n",
+				session->base, SESSION_CONTROL);
+			fprintf(out, "  ack:     %s/%s\n",
+				session->base, SESSION_ACK);
+			fprintf(out, "  up:      %lu minutes\n",
+				(curr - session->start) / 60);
+		}
+	}
+
+	return 0;
+}
+
+static int daemon_session__signal(struct daemon_session *session, int sig)
+{
+	if (session->pid < 0)
+		return -1;
+	return kill(session->pid, sig);
+}
+
+static int cmd_session_kill(struct daemon *daemon, union cmd *cmd, FILE *out)
+{
+	struct daemon_session *session;
+	bool all = false;
+
+	all = !strcmp(cmd->signal.name, "all");
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+		if (all || !strcmp(cmd->signal.name, session->name)) {
+			daemon_session__signal(session, cmd->signal.sig);
+			fprintf(out, "signal %d sent to session '%s [%d]'\n",
+				cmd->signal.sig, session->name, session->pid);
+		}
+	}
+
+	return 0;
+}
+
+static const char *ping_str[PING_MAX] = {
+	[PING_OK]   = "OK",
+	[PING_FAIL] = "FAIL",
+};
+
+static int cmd_session_ping(struct daemon *daemon, union cmd *cmd, FILE *out)
+{
+	struct daemon_session *session;
+	bool all = false, found = false;
+
+	all = !strcmp(cmd->ping.name, "all");
+
+	list_for_each_entry(session, &daemon->sessions, list) {
+		if (all || !strcmp(cmd->ping.name, session->name)) {
+			int state = daemon_session__ping(session);
+
+			fprintf(out, "%-4s %s\n", ping_str[state], session->name);
+			found = true;
+		}
+	}
+
+	if (!found && !all) {
+		fprintf(out, "%-4s %s (not found)\n",
+			ping_str[PING_FAIL], cmd->ping.name);
+	}
+	return 0;
+}
+
+static int handle_server_socket(struct daemon *daemon, int sock_fd)
+{
+	int ret = -1, fd;
+	FILE *out = NULL;
+	union cmd cmd;
+
+	fd = accept(sock_fd, NULL, NULL);
+	if (fd < 0) {
+		perror("failed: accept");
+		return -1;
+	}
+
+	if (sizeof(cmd) != readn(fd, &cmd, sizeof(cmd))) {
+		perror("failed: read");
+		goto out;
+	}
+
+	out = fdopen(fd, "w");
+	if (!out) {
+		perror("failed: fdopen");
+		goto out;
+	}
+
+	switch (cmd.cmd) {
+	case CMD_LIST:
+		ret = cmd_session_list(daemon, &cmd, out);
+		break;
+	case CMD_SIGNAL:
+		ret = cmd_session_kill(daemon, &cmd, out);
+		break;
+	case CMD_STOP:
+		done = 1;
+		ret = 0;
+		pr_debug("perf daemon is exiting\n");
+		break;
+	case CMD_PING:
+		ret = cmd_session_ping(daemon, &cmd, out);
+		break;
+	default:
+		break;
+	}
+
+	fclose(out);
+out:
+	/* If out is defined, then fd is closed via fclose. */
+	if (!out)
+		close(fd);
+	return ret;
+}
+
+static int setup_client_socket(struct daemon *daemon)
+{
+	struct sockaddr_un addr;
+	char path[PATH_MAX];
+	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
+
+	if (fd == -1) {
+		perror("failed: socket");
+		return -1;
+	}
+
+	scnprintf(path, sizeof(path), "%s/control", daemon->base);
+
+	if (strlen(path) + 1 >= sizeof(addr.sun_path)) {
+		pr_err("failed: control path too long '%s'\n", path);
+		close(fd);
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.sun_family = AF_UNIX;
+	strlcpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
+
+	if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
+		perror("failed: connect");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static void daemon_session__kill(struct daemon_session *session,
+				 struct daemon *daemon)
+{
+	int how = 0;
+
+	do {
+		switch (how) {
+		case 0:
+			daemon_session__control(session, "stop", false);
+			break;
+		case 1:
+			daemon_session__signal(session, SIGTERM);
+			break;
+		case 2:
+			daemon_session__signal(session, SIGKILL);
+			break;
+		default:
+			break;
+		}
+		how++;
+
+	} while (daemon_session__wait(session, daemon, 10));
+}
+
+static void daemon__signal(struct daemon *daemon, int sig)
+{
+	struct daemon_session *session;
+
+	list_for_each_entry(session, &daemon->sessions, list)
+		daemon_session__signal(session, sig);
+}
+
+static void daemon_session__delete(struct daemon_session *session)
+{
+	free(session->base);
+	free(session->name);
+	free(session->run);
+	free(session);
+}
+
+static void daemon_session__remove(struct daemon_session *session)
+{
+	list_del(&session->list);
+	daemon_session__delete(session);
+}
+
+static void daemon__stop(struct daemon *daemon)
+{
+	struct daemon_session *session;
+
+	list_for_each_entry(session, &daemon->sessions, list)
+		daemon_session__control(session, "stop", false);
+}
+
+static void daemon__kill(struct daemon *daemon)
+{
+	int how = 0;
+
+	do {
+		switch (how) {
+		case 0:
+			daemon__stop(daemon);
+			break;
+		case 1:
+			daemon__signal(daemon, SIGTERM);
+			break;
+		case 2:
+			daemon__signal(daemon, SIGKILL);
+			break;
+		default:
+			break;
+		}
+		how++;
+
+	} while (daemon__wait(daemon, 10));
+}
+
+static void daemon__exit(struct daemon *daemon)
+{
+	struct daemon_session *session, *h;
+
+	list_for_each_entry_safe(session, h, &daemon->sessions, list)
+		daemon_session__remove(session);
+
+	free(daemon->config_real);
+	free(daemon->config_base);
+	free(daemon->base);
+}
+
+static int daemon__reconfig(struct daemon *daemon)
+{
+	struct daemon_session *session, *n;
+
+	list_for_each_entry_safe(session, n, &daemon->sessions, list) {
+		/* No change. */
+		if (session->state == OK)
+			continue;
+
+		/* Remove session. */
+		if (session->state == KILL) {
+			if (session->pid > 0) {
+				daemon_session__kill(session, daemon);
+				pr_info("reconfig: session '%s' killed\n", session->name);
+			}
+			daemon_session__remove(session);
+			continue;
+		}
+
+		/* Reconfig session. */
+		if (session->pid > 0) {
+			daemon_session__kill(session, daemon);
+			pr_info("reconfig: session '%s' killed\n", session->name);
+		}
+		if (daemon_session__run(session, daemon))
+			return -1;
+
+		session->state = OK;
+	}
+
+	return 0;
+}
+
+static int setup_config_changes(struct daemon *daemon)
+{
+	char *basen = strdup(daemon->config_real);
+	char *dirn  = strdup(daemon->config_real);
+	char *base, *dir;
+	int fd, wd = -1;
+
+	if (!dirn || !basen)
+		goto out;
+
+	fd = inotify_init1(IN_NONBLOCK|IN_CLOEXEC);
+	if (fd < 0) {
+		perror("failed: inotify_init");
+		goto out;
+	}
+
+	dir = dirname(dirn);
+	base = basename(basen);
+	pr_debug("config file: %s, dir: %s\n", base, dir);
+
+	wd = inotify_add_watch(fd, dir, IN_CLOSE_WRITE);
+	if (wd >= 0) {
+		daemon->config_base = strdup(base);
+		if (!daemon->config_base) {
+			close(fd);
+			wd = -1;
+		}
+	} else {
+		perror("failed: inotify_add_watch");
+	}
+
+out:
+	free(basen);
+	free(dirn);
+	return wd < 0 ? -1 : fd;
+}
+
+static bool process_inotify_event(struct daemon *daemon, char *buf, ssize_t len)
+{
+	char *p = buf;
+
+	while (p < (buf + len)) {
+		struct inotify_event *event = (struct inotify_event *) p;
+
+		/*
+		 * We monitor config directory, check if our
+		 * config file was changed.
+		 */
+		if ((event->mask & IN_CLOSE_WRITE) &&
+		    !(event->mask & IN_ISDIR)) {
+			if (!strcmp(event->name, daemon->config_base))
+				return true;
+		}
+		p += sizeof(*event) + event->len;
+	}
+	return false;
+}
+
+static int handle_config_changes(struct daemon *daemon, int conf_fd,
+				 bool *config_changed)
+{
+	char buf[4096];
+	ssize_t len;
+
+	while (!(*config_changed)) {
+		len = read(conf_fd, buf, sizeof(buf));
+		if (len == -1) {
+			if (errno != EAGAIN) {
+				perror("failed: read");
+				return -1;
+			}
+			return 0;
+		}
+		*config_changed = process_inotify_event(daemon, buf, len);
+	}
+	return 0;
+}
+
+static int setup_config(struct daemon *daemon)
+{
+	if (daemon->base_user) {
+		daemon->base = strdup(daemon->base_user);
+		if (!daemon->base)
+			return -ENOMEM;
+	}
+
+	if (daemon->config) {
+		char *real = realpath(daemon->config, NULL);
+
+		if (!real) {
+			perror("failed: realpath");
+			return -1;
+		}
+		daemon->config_real = real;
+		return 0;
+	}
+
+	if (perf_config_system() && !access(perf_etc_perfconfig(), R_OK))
+		daemon->config_real = strdup(perf_etc_perfconfig());
+	else if (perf_config_global() && perf_home_perfconfig())
+		daemon->config_real = strdup(perf_home_perfconfig());
+
+	return daemon->config_real ? 0 : -1;
+}
+
+#ifndef F_TLOCK
+#define F_TLOCK 2
+
+#include <sys/file.h>
+
+static int lockf(int fd, int cmd, off_t len)
+{
+	if (cmd != F_TLOCK || len != 0)
+		return -1;
+
+	return flock(fd, LOCK_EX | LOCK_NB);
+}
+#endif // F_TLOCK
+
+/*
+ * Each daemon tries to create and lock BASE/lock file,
+ * if it's successful we are sure we're the only daemon
+ * running over the BASE.
+ *
+ * Once daemon is finished, file descriptor to lock file
+ * is closed and lock is released.
+ */
+static int check_lock(struct daemon *daemon)
+{
+	char path[PATH_MAX];
+	char buf[20];
+	int fd, pid;
+	ssize_t len;
+
+	scnprintf(path, sizeof(path), "%s/lock", daemon->base);
+
+	fd = open(path, O_RDWR|O_CREAT|O_CLOEXEC, 0640);
+	if (fd < 0)
+		return -1;
+
+	if (lockf(fd, F_TLOCK, 0) < 0) {
+		filename__read_int(path, &pid);
+		fprintf(stderr, "failed: another perf daemon (pid %d) owns %s\n",
+			pid, daemon->base);
+		close(fd);
+		return -1;
+	}
+
+	scnprintf(buf, sizeof(buf), "%d", getpid());
+	len = strlen(buf);
+
+	if (write(fd, buf, len) != len) {
+		perror("failed: write");
+		close(fd);
+		return -1;
+	}
+
+	if (ftruncate(fd, len)) {
+		perror("failed: ftruncate");
+		close(fd);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int go_background(struct daemon *daemon)
+{
+	int pid, fd;
+
+	pid = fork();
+	if (pid < 0)
+		return -1;
+
+	if (pid > 0)
+		return 1;
+
+	if (setsid() < 0)
+		return -1;
+
+	if (check_lock(daemon))
+		return -1;
+
+	umask(0);
+
+	if (chdir(daemon->base)) {
+		perror("failed: chdir");
+		return -1;
+	}
+
+	fd = open("output", O_RDWR|O_CREAT|O_TRUNC, 0644);
+	if (fd < 0) {
+		perror("failed: open");
+		return -1;
+	}
+
+	if (fcntl(fd, F_SETFD, FD_CLOEXEC)) {
+		perror("failed: fcntl FD_CLOEXEC");
+		close(fd);
+		return -1;
+	}
+
+	close(0);
+	dup2(fd, 1);
+	dup2(fd, 2);
+	close(fd);
+
+	daemon->out = fdopen(1, "w");
+	if (!daemon->out) {
+		close(1);
+		close(2);
+		return -1;
+	}
+
+	setbuf(daemon->out, NULL);
+	return 0;
+}
+
+static int setup_signalfd(struct daemon *daemon)
+{
+	sigset_t mask;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGCHLD);
+
+	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
+		return -1;
+
+	daemon->signal_fd = signalfd(-1, &mask, SFD_NONBLOCK|SFD_CLOEXEC);
+	return daemon->signal_fd;
+}
+
+static int __cmd_start(struct daemon *daemon, struct option parent_options[],
+		       int argc, const char **argv)
+{
+	bool foreground = false;
+	struct option start_options[] = {
+		OPT_BOOLEAN('f', "foreground", &foreground, "stay on console"),
+		OPT_PARENT(parent_options),
+		OPT_END()
+	};
+	int sock_fd = -1, conf_fd = -1, signal_fd = -1;
+	int sock_pos, file_pos, signal_pos;
+	struct fdarray fda;
+	int err = 0;
+
+	argc = parse_options(argc, argv, start_options, daemon_usage, 0);
+	if (argc)
+		usage_with_options(daemon_usage, start_options);
+
+	daemon->start = time(NULL);
+
+	if (setup_config(daemon)) {
+		pr_err("failed: config not found\n");
+		return -1;
+	}
+
+	if (setup_server_config(daemon))
+		return -1;
+
+	if (foreground && check_lock(daemon))
+		return -1;
+
+	if (!foreground) {
+		err = go_background(daemon);
+		if (err) {
+			/* original process, exit normally */
+			if (err == 1)
+				err = 0;
+			daemon__exit(daemon);
+			return err;
+		}
+	}
+
+	debug_set_file(daemon->out);
+	debug_set_display_time(true);
+
+	pr_info("daemon started (pid %d)\n", getpid());
+
+	fdarray__init(&fda, 3);
+
+	sock_fd = setup_server_socket(daemon);
+	if (sock_fd < 0)
+		goto out;
+
+	conf_fd = setup_config_changes(daemon);
+	if (conf_fd < 0)
+		goto out;
+
+	signal_fd = setup_signalfd(daemon);
+	if (signal_fd < 0)
+		goto out;
+
+	sock_pos = fdarray__add(&fda, sock_fd, POLLIN|POLLERR|POLLHUP, 0);
+	if (sock_pos < 0)
+		goto out;
+
+	file_pos = fdarray__add(&fda, conf_fd, POLLIN|POLLERR|POLLHUP, 0);
+	if (file_pos < 0)
+		goto out;
+
+	signal_pos = fdarray__add(&fda, signal_fd, POLLIN|POLLERR|POLLHUP, 0);
+	if (signal_pos < 0)
+		goto out;
+
+	signal(SIGINT, sig_handler);
+	signal(SIGTERM, sig_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	while (!done && !err) {
+		err = daemon__reconfig(daemon);
+
+		if (!err && fdarray__poll(&fda, -1)) {
+			bool reconfig = false;
+
+			if (fda.entries[sock_pos].revents & POLLIN)
+				err = handle_server_socket(daemon, sock_fd);
+			if (fda.entries[file_pos].revents & POLLIN)
+				err = handle_config_changes(daemon, conf_fd, &reconfig);
+			if (fda.entries[signal_pos].revents & POLLIN)
+				err = handle_signalfd(daemon) < 0;
+
+			if (reconfig)
+				err = setup_server_config(daemon);
+		}
+	}
+
+out:
+	fdarray__exit(&fda);
+
+	daemon__kill(daemon);
+	daemon__exit(daemon);
+
+	if (sock_fd != -1)
+		close(sock_fd);
+	if (conf_fd != -1)
+		close(conf_fd);
+	if (signal_fd != -1)
+		close(signal_fd);
+
+	pr_info("daemon exited\n");
+	fclose(daemon->out);
+	return err;
+}
+
+static int send_cmd(struct daemon *daemon, union cmd *cmd)
+{
+	int ret = -1, fd;
+	char *line = NULL;
+	size_t len = 0;
+	ssize_t nread;
+	FILE *in = NULL;
+
+	if (setup_client_config(daemon))
+		return -1;
+
+	fd = setup_client_socket(daemon);
+	if (fd < 0)
+		return -1;
+
+	if (sizeof(*cmd) != writen(fd, cmd, sizeof(*cmd))) {
+		perror("failed: write");
+		goto out;
+	}
+
+	in = fdopen(fd, "r");
+	if (!in) {
+		perror("failed: fdopen");
+		goto out;
+	}
+
+	while ((nread = getline(&line, &len, in)) != -1) {
+		if (fwrite(line, nread, 1, stdout) != 1)
+			goto out_fclose;
+		fflush(stdout);
+	}
+
+	ret = 0;
+out_fclose:
+	fclose(in);
+	free(line);
+out:
+	/* If in is defined, then fd is closed via fclose. */
+	if (!in)
+		close(fd);
+	return ret;
+}
+
+static int send_cmd_list(struct daemon *daemon)
+{
+	union cmd cmd = { .cmd = CMD_LIST, };
+
+	cmd.list.verbose = verbose;
+	cmd.list.csv_sep = daemon->csv_sep ? *daemon->csv_sep : 0;
+
+	return send_cmd(daemon, &cmd);
+}
+
+static int __cmd_signal(struct daemon *daemon, struct option parent_options[],
+			int argc, const char **argv)
+{
+	const char *name = "all";
+	struct option start_options[] = {
+		OPT_STRING(0, "session", &name, "session",
+			"Send signal to specific session"),
+		OPT_PARENT(parent_options),
+		OPT_END()
+	};
+	union cmd cmd;
+
+	argc = parse_options(argc, argv, start_options, daemon_usage, 0);
+	if (argc)
+		usage_with_options(daemon_usage, start_options);
+
+	if (setup_config(daemon)) {
+		pr_err("failed: config not found\n");
+		return -1;
+	}
+
+	cmd.signal.cmd = CMD_SIGNAL;
+	cmd.signal.sig = SIGUSR2;
+	scnprintf(cmd.signal.name, sizeof(cmd.signal.name), "%s", name);
+
+	return send_cmd(daemon, &cmd);
+}
+
+static int __cmd_stop(struct daemon *daemon, struct option parent_options[],
+			int argc, const char **argv)
+{
+	struct option start_options[] = {
+		OPT_PARENT(parent_options),
+		OPT_END()
+	};
+	union cmd cmd = { .cmd = CMD_STOP, };
+
+	argc = parse_options(argc, argv, start_options, daemon_usage, 0);
+	if (argc)
+		usage_with_options(daemon_usage, start_options);
+
+	if (setup_config(daemon)) {
+		pr_err("failed: config not found\n");
+		return -1;
+	}
+
+	return send_cmd(daemon, &cmd);
+}
+
+static int __cmd_ping(struct daemon *daemon, struct option parent_options[],
+		      int argc, const char **argv)
+{
+	const char *name = "all";
+	struct option ping_options[] = {
+		OPT_STRING(0, "session", &name, "session",
+			"Ping to specific session"),
+		OPT_PARENT(parent_options),
+		OPT_END()
+	};
+	union cmd cmd = { .cmd = CMD_PING, };
+
+	argc = parse_options(argc, argv, ping_options, daemon_usage, 0);
+	if (argc)
+		usage_with_options(daemon_usage, ping_options);
+
+	if (setup_config(daemon)) {
+		pr_err("failed: config not found\n");
+		return -1;
+	}
+
+	scnprintf(cmd.ping.name, sizeof(cmd.ping.name), "%s", name);
+	return send_cmd(daemon, &cmd);
+}
+
+int cmd_daemon(int argc, const char **argv)
+{
+	struct option daemon_options[] = {
+		OPT_INCR('v', "verbose", &verbose, "be more verbose"),
+		OPT_STRING(0, "config", &__daemon.config,
+			"config file", "config file path"),
+		OPT_STRING(0, "base", &__daemon.base_user,
+			"directory", "base directory"),
+		OPT_STRING_OPTARG('x', "field-separator", &__daemon.csv_sep,
+			"field separator", "print counts with custom separator", ","),
+		OPT_END()
+	};
+
+	perf_exe(__daemon.perf, sizeof(__daemon.perf));
+	__daemon.out = stdout;
+
+	argc = parse_options(argc, argv, daemon_options, daemon_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc) {
+		if (!strcmp(argv[0], "start"))
+			return __cmd_start(&__daemon, daemon_options, argc, argv);
+		if (!strcmp(argv[0], "signal"))
+			return __cmd_signal(&__daemon, daemon_options, argc, argv);
+		else if (!strcmp(argv[0], "stop"))
+			return __cmd_stop(&__daemon, daemon_options, argc, argv);
+		else if (!strcmp(argv[0], "ping"))
+			return __cmd_ping(&__daemon, daemon_options, argc, argv);
+
+		pr_err("failed: unknown command '%s'\n", argv[0]);
+		return -1;
+	}
+
+	if (setup_config(&__daemon)) {
+		pr_err("failed: config not found\n");
+		return -1;
+	}
+
+	return send_cmd_list(&__daemon);
+}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 43937f4..6fe44d9 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -313,7 +313,7 @@
 	 * if jit marker, then inject jit mmaps and generate ELF images
 	 */
 	ret = jit_process(inject->session, &inject->output, machine,
-			  event->mmap.filename, event->mmap.pid, &n);
+			  event->mmap.filename, event->mmap.pid, event->mmap.tid, &n);
 	if (ret < 0)
 		return ret;
 	if (ret) {
@@ -413,7 +413,7 @@
 	 * if jit marker, then inject jit mmaps and generate ELF images
 	 */
 	ret = jit_process(inject->session, &inject->output, machine,
-			  event->mmap2.filename, event->mmap2.pid, &n);
+			  event->mmap2.filename, event->mmap2.pid, event->mmap2.tid, &n);
 	if (ret < 0)
 		return ret;
 	if (ret) {
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 8237420..cdd2b9f 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -30,6 +30,7 @@
 	bool			dump_raw;
 	bool			force;
 	bool			phys_addr;
+	bool			data_page_size;
 	int			operation;
 	const char		*cpu_list;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
@@ -124,6 +125,9 @@
 	if (mem->phys_addr)
 		rec_argv[i++] = "--phys-data";
 
+	if (mem->data_page_size)
+		rec_argv[i++] = "--data-page-size";
+
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		e = perf_mem_events__ptr(j);
 		if (!e->record)
@@ -172,7 +176,8 @@
 {
 	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
 	struct addr_location al;
-	const char *fmt;
+	const char *fmt, *field_sep;
+	char str[PAGE_SIZE_NAME_LEN];
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		fprintf(stderr, "problem processing %d event, skipping it.\n",
@@ -186,60 +191,47 @@
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	if (mem->phys_addr) {
-		if (symbol_conf.field_sep) {
-			fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s0x%016"PRIx64
-			      "%s%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
-		} else {
-			fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
-			      "%s0x%016"PRIx64"%s%5"PRIu64"%s0x%06"PRIx64
-			      "%s%s:%s\n";
-			symbol_conf.field_sep = " ";
-		}
-
-		printf(fmt,
-			sample->pid,
-			symbol_conf.field_sep,
-			sample->tid,
-			symbol_conf.field_sep,
-			sample->ip,
-			symbol_conf.field_sep,
-			sample->addr,
-			symbol_conf.field_sep,
-			sample->phys_addr,
-			symbol_conf.field_sep,
-			sample->weight,
-			symbol_conf.field_sep,
-			sample->data_src,
-			symbol_conf.field_sep,
-			al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
-			al.sym ? al.sym->name : "???");
+	field_sep = symbol_conf.field_sep;
+	if (field_sep) {
+		fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s";
 	} else {
-		if (symbol_conf.field_sep) {
-			fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64
-			      "%s0x%"PRIx64"%s%s:%s\n";
-		} else {
-			fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
-			      "%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
-			symbol_conf.field_sep = " ";
-		}
-
-		printf(fmt,
-			sample->pid,
-			symbol_conf.field_sep,
-			sample->tid,
-			symbol_conf.field_sep,
-			sample->ip,
-			symbol_conf.field_sep,
-			sample->addr,
-			symbol_conf.field_sep,
-			sample->weight,
-			symbol_conf.field_sep,
-			sample->data_src,
-			symbol_conf.field_sep,
-			al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
-			al.sym ? al.sym->name : "???");
+		fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x%016"PRIx64"%s";
+		symbol_conf.field_sep = " ";
 	}
+	printf(fmt,
+		sample->pid,
+		symbol_conf.field_sep,
+		sample->tid,
+		symbol_conf.field_sep,
+		sample->ip,
+		symbol_conf.field_sep,
+		sample->addr,
+		symbol_conf.field_sep);
+
+	if (mem->phys_addr) {
+		printf("0x%016"PRIx64"%s",
+			sample->phys_addr,
+			symbol_conf.field_sep);
+	}
+
+	if (mem->data_page_size) {
+		printf("%s%s",
+			get_page_size_name(sample->data_page_size, str),
+			symbol_conf.field_sep);
+	}
+
+	if (field_sep)
+		fmt = "%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
+	else
+		fmt = "%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
+
+	printf(fmt,
+		sample->weight,
+		symbol_conf.field_sep,
+		sample->data_src,
+		symbol_conf.field_sep,
+		al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
+		al.sym ? al.sym->name : "???");
 out_put:
 	addr_location__put(&al);
 	return 0;
@@ -287,10 +279,15 @@
 	if (ret < 0)
 		goto out_delete;
 
+	printf("# PID, TID, IP, ADDR, ");
+
 	if (mem->phys_addr)
-		printf("# PID, TID, IP, ADDR, PHYS ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
-	else
-		printf("# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
+		printf("PHYS ADDR, ");
+
+	if (mem->data_page_size)
+		printf("DATA PAGE SIZE, ");
+
+	printf("LOCAL WEIGHT, DSRC, SYMBOL\n");
 
 	ret = perf_session__process_events(session);
 
@@ -300,7 +297,7 @@
 }
 static char *get_sort_order(struct perf_mem *mem)
 {
-	bool has_extra_options = mem->phys_addr ? true : false;
+	bool has_extra_options = mem->phys_addr || mem->data_page_size;
 	char sort[128];
 
 	/*
@@ -312,13 +309,16 @@
 			     "dso_daddr,tlb,locked");
 	} else if (has_extra_options) {
 		strcpy(sort, "--sort=local_weight,mem,sym,dso,symbol_daddr,"
-			     "dso_daddr,snoop,tlb,locked");
+			     "dso_daddr,snoop,tlb,locked,blocked");
 	} else
 		return NULL;
 
 	if (mem->phys_addr)
 		strcat(sort, ",phys_daddr");
 
+	if (mem->data_page_size)
+		strcat(sort, ",data_page_size");
+
 	return strdup(sort);
 }
 
@@ -464,6 +464,7 @@
 		   " between columns '.' is reserved."),
 	OPT_BOOLEAN('f', "force", &mem.force, "don't complain, do it"),
 	OPT_BOOLEAN('p', "phys-data", &mem.phys_addr, "Record/Report sample physical addresses"),
+	OPT_BOOLEAN(0, "data-page-size", &mem.data_page_size, "Record/Report sample data address page size"),
 	OPT_END()
 	};
 	const char *const mem_subcommands[] = { "record", "report", NULL };
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fd39116..35465d1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -102,6 +102,7 @@
 	bool			no_buildid_cache;
 	bool			no_buildid_cache_set;
 	bool			buildid_all;
+	bool			buildid_mmap;
 	bool			timestamp_filename;
 	bool			timestamp_boundary;
 	struct switch_output	switch_output;
@@ -730,6 +731,8 @@
 	if (err)
 		return err;
 
+	auxtrace_regroup_aux_output(rec->evlist);
+
 	return auxtrace_parse_filters(rec->evlist);
 }
 
@@ -1663,7 +1666,7 @@
 		status = -1;
 		goto out_delete_session;
 	}
-	err = evlist__add_pollfd(rec->evlist, done_fd);
+	err = evlist__add_wakeup_eventfd(rec->evlist, done_fd);
 	if (err < 0) {
 		pr_err("Failed to add wakeup eventfd to poll list\n");
 		status = err;
@@ -1937,18 +1940,19 @@
 
 		if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
 			switch (cmd) {
-			case EVLIST_CTL_CMD_ENABLE:
-				pr_info(EVLIST_ENABLED_MSG);
-				break;
-			case EVLIST_CTL_CMD_DISABLE:
-				pr_info(EVLIST_DISABLED_MSG);
-				break;
 			case EVLIST_CTL_CMD_SNAPSHOT:
 				hit_auxtrace_snapshot_trigger(rec);
 				evlist__ctlfd_ack(rec->evlist);
 				break;
+			case EVLIST_CTL_CMD_STOP:
+				done = 1;
+				break;
 			case EVLIST_CTL_CMD_ACK:
 			case EVLIST_CTL_CMD_UNSUPPORTED:
+			case EVLIST_CTL_CMD_ENABLE:
+			case EVLIST_CTL_CMD_DISABLE:
+			case EVLIST_CTL_CMD_EVLIST:
+			case EVLIST_CTL_CMD_PING:
 			default:
 				break;
 			}
@@ -2135,6 +2139,8 @@
 			rec->no_buildid_cache = true;
 		else if (!strcmp(value, "skip"))
 			rec->no_buildid = true;
+		else if (!strcmp(value, "mmap"))
+			rec->buildid_mmap = true;
 		else
 			return -1;
 		return 0;
@@ -2474,6 +2480,8 @@
 		    "Record the sample physical addresses"),
 	OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size,
 		    "Record the sampled data address data page size"),
+	OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size,
+		    "Record the sampled code address (ip) page size"),
 	OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
 	OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
 			&record.opts.sample_time_set,
@@ -2552,6 +2560,8 @@
 		   "file", "vmlinux pathname"),
 	OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
 		    "Record build-id of all DSOs regardless of hits"),
+	OPT_BOOLEAN(0, "buildid-mmap", &record.buildid_mmap,
+		    "Record build-id in map events"),
 	OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
 		    "append timestamp to output filename"),
 	OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
@@ -2655,6 +2665,21 @@
 
 	}
 
+	if (rec->buildid_mmap) {
+		if (!perf_can_record_build_id()) {
+			pr_err("Failed: no support to record build id in mmap events, update your kernel.\n");
+			err = -EINVAL;
+			goto out_opts;
+		}
+		pr_debug("Enabling build id in mmap2 events.\n");
+		/* Enable mmap build id synthesizing. */
+		symbol_conf.buildid_mmap2 = true;
+		/* Enable perf_event_attr::build_id bit. */
+		rec->opts.build_id = true;
+		/* Disable build id cache. */
+		rec->no_buildid = true;
+	}
+
 	if (rec->opts.kcore)
 		rec->data.is_dir = true;
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 42dad4a..5915f19 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -117,6 +117,7 @@
 	PERF_OUTPUT_IPC             = 1ULL << 31,
 	PERF_OUTPUT_TOD             = 1ULL << 32,
 	PERF_OUTPUT_DATA_PAGE_SIZE  = 1ULL << 33,
+	PERF_OUTPUT_CODE_PAGE_SIZE  = 1ULL << 34,
 };
 
 struct perf_script {
@@ -182,6 +183,7 @@
 	{.str = "ipc", .field = PERF_OUTPUT_IPC},
 	{.str = "tod", .field = PERF_OUTPUT_TOD},
 	{.str = "data_page_size", .field = PERF_OUTPUT_DATA_PAGE_SIZE},
+	{.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
 };
 
 enum {
@@ -256,7 +258,7 @@
 			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
 			      PERF_OUTPUT_ADDR | PERF_OUTPUT_DATA_SRC |
 			      PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR |
-			      PERF_OUTPUT_DATA_PAGE_SIZE,
+			      PERF_OUTPUT_DATA_PAGE_SIZE | PERF_OUTPUT_CODE_PAGE_SIZE,
 
 		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
 	},
@@ -523,6 +525,10 @@
 	    evsel__check_stype(evsel, PERF_SAMPLE_DATA_PAGE_SIZE, "DATA_PAGE_SIZE", PERF_OUTPUT_DATA_PAGE_SIZE))
 		return -EINVAL;
 
+	if (PRINT_FIELD(CODE_PAGE_SIZE) &&
+	    evsel__check_stype(evsel, PERF_SAMPLE_CODE_PAGE_SIZE, "CODE_PAGE_SIZE", PERF_OUTPUT_CODE_PAGE_SIZE))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -1531,6 +1537,8 @@
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT, "tx abrt"},
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_BEGIN, "tr strt"},
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_END, "tr end"},
+	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMENTRY, "vmentry"},
+	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMEXIT, "vmexit"},
 	{0, NULL}
 };
 
@@ -1760,6 +1768,18 @@
 	return len + perf_sample__fprintf_pt_spacing(len, fp);
 }
 
+static int perf_sample__fprintf_synth_psb(struct perf_sample *sample, FILE *fp)
+{
+	struct perf_synth_intel_psb *data = perf_sample__synth_ptr(sample);
+	int len;
+
+	if (perf_sample__bad_synth_size(sample, *data))
+		return 0;
+
+	len = fprintf(fp, " psb offs: %#" PRIx64, data->offset);
+	return len + perf_sample__fprintf_pt_spacing(len, fp);
+}
+
 static int perf_sample__fprintf_synth(struct perf_sample *sample,
 				      struct evsel *evsel, FILE *fp)
 {
@@ -1776,6 +1796,8 @@
 		return perf_sample__fprintf_synth_pwrx(sample, fp);
 	case PERF_SYNTH_INTEL_CBR:
 		return perf_sample__fprintf_synth_cbr(sample, fp);
+	case PERF_SYNTH_INTEL_PSB:
+		return perf_sample__fprintf_synth_psb(sample, fp);
 	default:
 		break;
 	}
@@ -2036,6 +2058,9 @@
 	if (PRINT_FIELD(DATA_PAGE_SIZE))
 		fprintf(fp, " %s", get_page_size_name(sample->data_page_size, str));
 
+	if (PRINT_FIELD(CODE_PAGE_SIZE))
+		fprintf(fp, " %s", get_page_size_name(sample->code_page_size, str));
+
 	perf_sample__fprintf_ipc(sample, attr, fp);
 
 	fprintf(fp, "\n");
@@ -2786,7 +2811,7 @@
 				break;
 		}
 		if (i == imax && strcmp(tok, "flags") == 0) {
-			print_flags = change == REMOVE ? false : true;
+			print_flags = change != REMOVE;
 			continue;
 		}
 		if (i == imax) {
@@ -3234,7 +3259,7 @@
 
 static bool is_top_script(const char *script_path)
 {
-	return ends_with(script_path, "top") == NULL ? false : true;
+	return ends_with(script_path, "top") != NULL;
 }
 
 static int has_required_arg(char *script_path)
@@ -3535,12 +3560,16 @@
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
 		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
 		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod,"
-		     "data_page_size",
+		     "data_page_size,code_page_size",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
+	OPT_STRING(0, "dsos", &symbol_conf.dso_list_str, "dso[,dso...]",
+		   "only consider symbols in these DSOs"),
 	OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
 		   "only consider these symbols"),
+	OPT_INTEGER(0, "addr-range", &symbol_conf.addr_range,
+		    "Use with -S to list traced records within address range"),
 	OPT_CALLBACK_OPTARG(0, "insn-trace", &itrace_synth_opts, NULL, NULL,
 			"Decode instructions from itrace", parse_insn_trace),
 	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 8cc2496..2e2e4a83 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -67,6 +67,7 @@
 #include "util/top.h"
 #include "util/affinity.h"
 #include "util/pfm.h"
+#include "util/bpf_counter.h"
 #include "asm/bug.h"
 
 #include <linux/time64.h>
@@ -137,6 +138,19 @@
 	NULL,
 };
 
+static const char *topdown_metric_L2_attrs[] = {
+	"slots",
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	"topdown-heavy-ops",
+	"topdown-br-mispredict",
+	"topdown-fetch-lat",
+	"topdown-mem-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -409,12 +423,32 @@
 	return 0;
 }
 
+static int read_bpf_map_counters(void)
+{
+	struct evsel *counter;
+	int err;
+
+	evlist__for_each_entry(evsel_list, counter) {
+		err = bpf_counter__read(counter);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 static void read_counters(struct timespec *rs)
 {
 	struct evsel *counter;
+	int err;
 
-	if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
-		return;
+	if (!stat_config.stop_read_counter) {
+		if (target__has_bpf(&target))
+			err = read_bpf_map_counters();
+		else
+			err = read_affinity_counters(rs);
+		if (err < 0)
+			return;
+	}
 
 	evlist__for_each_entry(evsel_list, counter) {
 		if (counter->err)
@@ -496,11 +530,22 @@
 	return false;
 }
 
-static void enable_counters(void)
+static int enable_counters(void)
 {
+	struct evsel *evsel;
+	int err;
+
+	if (target__has_bpf(&target)) {
+		evlist__for_each_entry(evsel_list, evsel) {
+			err = bpf_counter__enable(evsel);
+			if (err)
+				return err;
+		}
+	}
+
 	if (stat_config.initial_delay < 0) {
 		pr_info(EVLIST_DISABLED_MSG);
-		return;
+		return 0;
 	}
 
 	if (stat_config.initial_delay > 0) {
@@ -518,6 +563,7 @@
 		if (stat_config.initial_delay > 0)
 			pr_info(EVLIST_ENABLED_MSG);
 	}
+	return 0;
 }
 
 static void disable_counters(void)
@@ -578,18 +624,19 @@
 	if (evlist__ctlfd_process(evlist, &cmd) > 0) {
 		switch (cmd) {
 		case EVLIST_CTL_CMD_ENABLE:
-			pr_info(EVLIST_ENABLED_MSG);
 			if (interval)
 				process_interval();
 			break;
 		case EVLIST_CTL_CMD_DISABLE:
 			if (interval)
 				process_interval();
-			pr_info(EVLIST_DISABLED_MSG);
 			break;
 		case EVLIST_CTL_CMD_SNAPSHOT:
 		case EVLIST_CTL_CMD_ACK:
 		case EVLIST_CTL_CMD_UNSUPPORTED:
+		case EVLIST_CTL_CMD_EVLIST:
+		case EVLIST_CTL_CMD_STOP:
+		case EVLIST_CTL_CMD_PING:
 		default:
 			break;
 		}
@@ -720,7 +767,7 @@
 	const bool forks = (argc > 0);
 	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
 	struct affinity affinity;
-	int i, cpu;
+	int i, cpu, err;
 	bool second_pass = false;
 
 	if (forks) {
@@ -737,6 +784,13 @@
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
+	if (target__has_bpf(&target)) {
+		evlist__for_each_entry(evsel_list, counter) {
+			if (bpf_counter__load(counter, &target))
+				return -1;
+		}
+	}
+
 	evlist__for_each_cpu (evsel_list, i, cpu) {
 		affinity__set(&affinity, cpu);
 
@@ -850,7 +904,7 @@
 	}
 
 	if (STAT_RECORD) {
-		int err, fd = perf_data__fd(&perf_stat.data);
+		int fd = perf_data__fd(&perf_stat.data);
 
 		if (is_pipe) {
 			err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
@@ -876,7 +930,9 @@
 
 	if (forks) {
 		evlist__start_workload(evsel_list);
-		enable_counters();
+		err = enable_counters();
+		if (err)
+			return -1;
 
 		if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
 			status = dispatch_events(forks, timeout, interval, &times);
@@ -895,7 +951,9 @@
 		if (WIFSIGNALED(status))
 			psignal(WTERMSIG(status), argv[0]);
 	} else {
-		enable_counters();
+		err = enable_counters();
+		if (err)
+			return -1;
 		status = dispatch_events(forks, timeout, interval, &times);
 	}
 
@@ -1085,6 +1143,10 @@
 		   "stat events on existing process id"),
 	OPT_STRING('t', "tid", &target.tid, "tid",
 		   "stat events on existing thread id"),
+#ifdef HAVE_BPF_SKEL
+	OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
+		   "stat events on existing bpf program id"),
+#endif
 	OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_BOOLEAN('g', "group", &group,
@@ -1153,7 +1215,9 @@
 	OPT_BOOLEAN(0, "metric-no-merge", &stat_config.metric_no_merge,
 		       "don't try to share events between metrics in a group"),
 	OPT_BOOLEAN(0, "topdown", &topdown_run,
-			"measure topdown level 1 statistics"),
+			"measure top-down statistics"),
+	OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
+			"Set the metrics level for the top-down statistics (0: max level)"),
 	OPT_BOOLEAN(0, "smi-cost", &smi_cost,
 			"measure SMI cost"),
 	OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list",
@@ -1706,17 +1770,30 @@
 	}
 
 	if (topdown_run) {
+		const char **metric_attrs = topdown_metric_attrs;
+		unsigned int max_level = 1;
 		char *str = NULL;
 		bool warn = false;
 
 		if (!force_metric_only)
 			stat_config.metric_only = true;
 
-		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+		if (pmu_have_event("cpu", topdown_metric_L2_attrs[5])) {
+			metric_attrs = topdown_metric_L2_attrs;
+			max_level = 2;
+		}
+
+		if (stat_config.topdown_level > max_level) {
+			pr_err("Invalid top-down metrics level. The max level is %u.\n", max_level);
+			return -1;
+		} else if (!stat_config.topdown_level)
+			stat_config.topdown_level = max_level;
+
+		if (topdown_filter_events(metric_attrs, &str, 1) < 0) {
 			pr_err("Out of memory\n");
 			return -1;
 		}
-		if (topdown_metric_attrs[0] && str) {
+		if (metric_attrs[0] && str) {
 			if (!stat_config.interval && !stat_config.metric_only) {
 				fprintf(stat_config.output,
 					"Topdown accuracy may decrease when measuring long periods.\n"
@@ -1779,6 +1856,9 @@
 		}
 		if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
 			return -1;
+
+		if (arch_evlist__add_default_attrs(evsel_list) < 0)
+			return -1;
 	}
 
 	/* Detailed events get appended to the event list: */
@@ -2064,11 +2144,12 @@
 		"perf stat [<options>] [<command>]",
 		NULL
 	};
-	int status = -EINVAL, run_idx;
+	int status = -EINVAL, run_idx, err;
 	const char *mode;
 	FILE *output = stderr;
 	unsigned int interval, timeout;
 	const char * const stat_subcommands[] = { "record", "report" };
+	char errbuf[BUFSIZ];
 
 	setlocale(LC_ALL, "");
 
@@ -2179,6 +2260,12 @@
 	} else if (big_num_opt == 0) /* User passed --no-big-num */
 		stat_config.big_num = false;
 
+	err = target__validate(&target);
+	if (err) {
+		target__strerror(&target, err, errbuf, BUFSIZ);
+		pr_warning("%s\n", errbuf);
+	}
+
 	setup_system_wide(argc);
 
 	/*
@@ -2252,8 +2339,6 @@
 		}
 	}
 
-	target__validate(&target);
-
 	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
 		target.per_thread = true;
 
@@ -2384,9 +2469,10 @@
 		 * tools remain  -acme
 		 */
 		int fd = perf_data__fd(&perf_stat.data);
-		int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
-							     process_synthesized_event,
-							     &perf_stat.session->machines.host);
+
+		err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
+							 process_synthesized_event,
+							 &perf_stat.session->machines.host);
 		if (err) {
 			pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
 				   "older tools may produce warnings about this file\n.");
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 14a2db6..7303e80 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -37,6 +37,7 @@
 int cmd_mem(int argc, const char **argv);
 int cmd_data(int argc, const char **argv);
 int cmd_ftrace(int argc, const char **argv);
+int cmd_daemon(int argc, const char **argv);
 
 int find_scripts(char **scripts_array, char **scripts_path_array, int num,
 		 int pathlen);
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index bc6c585..825a12e 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -31,3 +31,4 @@
 perf-top			mainporcelain common
 perf-trace			mainporcelain audit
 perf-version			mainporcelain common
+perf-daemon			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 27f94b0..20cb91e 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -88,6 +88,7 @@
 	{ "mem",	cmd_mem,	0 },
 	{ "data",	cmd_data,	0 },
 	{ "ftrace",	cmd_ftrace,	0 },
+	{ "daemon",	cmd_daemon,	0 },
 };
 
 struct pager_config {
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json
index 2d15b11..5c69c1e 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json
@@ -9,15 +9,11 @@
         "ArchStdEvent": "BR_INDIRECT_SPEC"
     },
     {
-        "PublicDescription": "Mispredicted or not predicted branch speculatively executed",
-        "EventCode": "0x10",
-        "EventName": "BR_MIS_PRED",
+        "ArchStdEvent": "BR_MIS_PRED",
         "BriefDescription": "Branch mispredicted"
     },
     {
-        "PublicDescription": "Predictable branch speculatively executed",
-        "EventCode": "0x12",
-        "EventName": "BR_PRED",
+        "ArchStdEvent": "BR_PRED",
         "BriefDescription": "Predictable branch"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json
index 5c1a9a9..9bea1ba 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json
@@ -18,9 +18,6 @@
         "ArchStdEvent": "BUS_ACCESS_PERIPH"
     },
     {
-        "PublicDescription": "Bus access",
-        "EventCode": "0x19",
-        "EventName": "BUS_ACCESS",
-        "BriefDescription": "Bus access"
+        "ArchStdEvent": "BUS_ACCESS"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json
index 40010a8..1e25f2a 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json
@@ -39,70 +39,40 @@
         "ArchStdEvent": "L2D_CACHE_INVAL"
     },
     {
-        "PublicDescription": "Level 1 instruction cache refill",
-        "EventCode": "0x01",
-        "EventName": "L1I_CACHE_REFILL",
-        "BriefDescription": "L1I cache refill"
+        "ArchStdEvent": "L1I_CACHE_REFILL"
     },
     {
-        "PublicDescription": "Level 1 instruction TLB refill",
-        "EventCode": "0x02",
-        "EventName": "L1I_TLB_REFILL",
-        "BriefDescription": "L1I TLB refill"
+        "ArchStdEvent": "L1I_TLB_REFILL"
     },
     {
-        "PublicDescription": "Level 1 data cache refill",
-        "EventCode": "0x03",
-        "EventName": "L1D_CACHE_REFILL",
-        "BriefDescription": "L1D cache refill"
+        "ArchStdEvent": "L1D_CACHE_REFILL"
     },
     {
-        "PublicDescription": "Level 1 data cache access",
-        "EventCode": "0x04",
-        "EventName": "L1D_CACHE_ACCESS",
-        "BriefDescription": "L1D cache access"
+        "ArchStdEvent": "L1D_CACHE"
     },
     {
-        "PublicDescription": "Level 1 data TLB refill",
-        "EventCode": "0x05",
-        "EventName": "L1D_TLB_REFILL",
-        "BriefDescription": "L1D TLB refill"
+        "ArchStdEvent": "L1D_TLB_REFILL"
     },
     {
-        "PublicDescription": "Level 1 instruction cache access",
-        "EventCode": "0x14",
-        "EventName": "L1I_CACHE_ACCESS",
-        "BriefDescription": "L1I cache access"
+        "ArchStdEvent": "L1I_CACHE"
     },
     {
-        "PublicDescription": "Level 2 data cache access",
-        "EventCode": "0x16",
-        "EventName": "L2D_CACHE_ACCESS",
-        "BriefDescription": "L2D cache access"
+        "ArchStdEvent": "L2D_CACHE"
     },
     {
-        "PublicDescription": "Level 2 data refill",
-        "EventCode": "0x17",
-        "EventName": "L2D_CACHE_REFILL",
-        "BriefDescription": "L2D cache refill"
+        "ArchStdEvent": "L2D_CACHE_REFILL"
     },
     {
-        "PublicDescription": "Level 2 data cache, Write-Back",
-        "EventCode": "0x18",
-        "EventName": "L2D_CACHE_WB",
-        "BriefDescription": "L2D cache Write-Back"
+        "ArchStdEvent": "L2D_CACHE_WB"
     },
     {
-        "PublicDescription": "Level 1 data TLB access. This event counts any load or store operation which accesses the data L1 TLB",
-        "EventCode": "0x25",
-        "EventName": "L1D_TLB_ACCESS",
+        "PublicDescription": "This event counts any load or store operation which accesses the data L1 TLB",
+        "ArchStdEvent": "L1D_TLB",
         "BriefDescription": "L1D TLB access"
     },
     {
-        "PublicDescription": "Level 1 instruction TLB access. This event counts any instruction fetch which accesses the instruction L1 TLB",
-        "EventCode": "0x26",
-        "EventName": "L1I_TLB_ACCESS",
-        "BriefDescription": "L1I TLB access"
+        "PublicDescription": "This event counts any instruction fetch which accesses the instruction L1 TLB",
+        "ArchStdEvent": "L1I_TLB"
     },
     {
         "PublicDescription": "Level 2 access to data TLB that caused a page table walk. This event counts on any data access which causes L2D_TLB_REFILL to count",
@@ -114,7 +84,7 @@
         "PublicDescription": "Level 2 access to instruciton TLB that caused a page table walk. This event counts on any instruciton access which causes L2I_TLB_REFILL to count",
         "EventCode": "0x35",
         "EventName": "L2I_TLB_ACCESS",
-        "BriefDescription": "L2D TLB access"
+        "BriefDescription": "L2I TLB access"
     },
     {
         "PublicDescription": "Branch target buffer misprediction",
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json
index 51d1dc1..9076ca2 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json
@@ -1,9 +1,7 @@
 [
     {
         "PublicDescription": "The number of core clock cycles",
-        "EventCode": "0x11",
-        "EventName": "CPU_CYCLES",
-        "BriefDescription": "Clock cycles"
+        "ArchStdEvent": "CPU_CYCLES"
     },
     {
         "PublicDescription": "FSU clocking gated off cycle",
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json
index 66e51bc..9761433 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json
@@ -36,15 +36,9 @@
         "ArchStdEvent": "EXC_TRAP_FIQ"
     },
     {
-        "PublicDescription": "Exception taken",
-        "EventCode": "0x09",
-        "EventName": "EXC_TAKEN",
-        "BriefDescription": "Exception taken"
+        "ArchStdEvent": "EXC_TAKEN"
     },
     {
-        "PublicDescription": "Instruction architecturally executed, condition check pass, exception return",
-        "EventCode": "0x0a",
-        "EventName": "EXC_RETURN",
-        "BriefDescription": "Exception return"
+        "ArchStdEvent": "EXC_RETURN"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json
index 0d3e467..482aa3f 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json
@@ -40,45 +40,29 @@
     },
     {
         "PublicDescription": "Instruction architecturally executed, software increment",
-        "EventCode": "0x00",
-        "EventName": "SW_INCR",
+        "ArchStdEvent": "SW_INCR",
         "BriefDescription": "Software increment"
     },
     {
-        "PublicDescription": "Instruction architecturally executed",
-        "EventCode": "0x08",
-        "EventName": "INST_RETIRED",
-        "BriefDescription": "Instruction retired"
+        "ArchStdEvent": "INST_RETIRED"
     },
     {
-        "PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR",
-        "EventCode": "0x0b",
-        "EventName": "CID_WRITE_RETIRED",
+        "ArchStdEvent": "CID_WRITE_RETIRED",
         "BriefDescription": "Write to CONTEXTIDR"
     },
     {
-        "PublicDescription": "Operation speculatively executed",
-        "EventCode": "0x1b",
-        "EventName": "INST_SPEC",
-        "BriefDescription": "Speculatively executed"
+        "ArchStdEvent": "INST_SPEC"
     },
     {
-        "PublicDescription": "Instruction architecturally executed (condition check pass), write to TTBR",
-        "EventCode": "0x1c",
-        "EventName": "TTBR_WRITE_RETIRED",
-        "BriefDescription": "Instruction executed, TTBR write"
+        "ArchStdEvent": "TTBR_WRITE_RETIRED"
     },
     {
-        "PublicDescription": "Instruction architecturally executed, branch. This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches",
-        "EventCode": "0x21",
-        "EventName": "BR_RETIRED",
-        "BriefDescription": "Branch retired"
+        "PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches",
+        "ArchStdEvent": "BR_RETIRED"
     },
     {
-        "PublicDescription": "Instruction architecturally executed, mispredicted branch. This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush",
-        "EventCode": "0x22",
-        "EventName": "BR_MISPRED_RETIRED",
-        "BriefDescription": "Mispredicted branch retired"
+        "PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush",
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED"
     },
     {
         "PublicDescription": "Operation speculatively executed, NOP",
diff --git a/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json b/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json
index c2fe674..2e75556 100644
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json
@@ -15,15 +15,10 @@
         "ArchStdEvent": "UNALIGNED_LDST_SPEC"
     },
     {
-        "PublicDescription": "Data memory access",
-        "EventCode": "0x13",
-        "EventName": "MEM_ACCESS",
-        "BriefDescription": "Memory access"
+        "ArchStdEvent": "MEM_ACCESS"
     },
     {
-        "PublicDescription": "Local memory error. This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
-        "EventCode": "0x1a",
-        "EventName": "MEM_ERROR",
-        "BriefDescription": "Memory error"
+        "PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
+        "ArchStdEvent": "MEMORY_ERROR"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
index b5e5d05..ec0dc92 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
@@ -1,14 +1,10 @@
 [
     {
-        "PublicDescription": "Mispredicted or not predicted branch speculatively executed. This event counts any predictable branch instruction which is mispredicted either due to dynamic misprediction or because the MMU is off and the branches are statically predicted not taken.",
-        "EventCode": "0x10",
-        "EventName": "BR_MIS_PRED",
-        "BriefDescription": "Mispredicted or not predicted branch speculatively executed."
+        "PublicDescription": "This event counts any predictable branch instruction which is mispredicted either due to dynamic misprediction or because the MMU is off and the branches are statically predicted not taken",
+        "ArchStdEvent": "BR_MIS_PRED"
     },
     {
-        "PublicDescription": "Predictable branch speculatively executed. This event counts all predictable branches.",
-        "EventCode": "0x12",
-        "EventName": "BR_PRED",
-        "BriefDescription": "Predictable branch speculatively executed."
+        "PublicDescription": "This event counts all predictable branches.",
+        "ArchStdEvent": "BR_PRED"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
index fce7309..6263929 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
@@ -1,24 +1,21 @@
 [
     {
-        "EventCode": "0x11",
-        "EventName": "CPU_CYCLES",
+        "PublicDescription": "The number of core clock cycles",
+        "ArchStdEvent": "CPU_CYCLES",
         "BriefDescription": "The number of core clock cycles."
     },
     {
-        "PublicDescription": "Bus access. This event counts for every beat of data transferred over the data channels between the core and the SCU. If both read and write data beats are transferred on a given cycle, this event is counted twice on that cycle. This event counts the sum of BUS_ACCESS_RD and BUS_ACCESS_WR.",
-        "EventCode": "0x19",
-        "EventName": "BUS_ACCESS",
-        "BriefDescription": "Bus access."
+        "PublicDescription": "This event counts for every beat of data transferred over the data channels between the core and the SCU. If both read and write data beats are transferred on a given cycle, this event is counted twice on that cycle. This event counts the sum of BUS_ACCESS_RD and BUS_ACCESS_WR.",
+        "ArchStdEvent": "BUS_ACCESS"
     },
     {
-        "EventCode": "0x1D",
-        "EventName": "BUS_CYCLES",
-        "BriefDescription": "Bus cycles. This event duplicates CPU_CYCLES."
+        "PublicDescription": "This event duplicates CPU_CYCLES.",
+        "ArchStdEvent": "BUS_CYCLES"
     },
     {
-        "ArchStdEvent":  "BUS_ACCESS_RD"
+        "ArchStdEvent":  "BUS_ACCESS_RD"
     },
     {
-        "ArchStdEvent":  "BUS_ACCESS_WR"
+        "ArchStdEvent":  "BUS_ACCESS_WR"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
index 2459408..cd67bb9 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
@@ -1,133 +1,95 @@
 [
     {
-        "PublicDescription": "L1 instruction cache refill. This event counts any instruction fetch which misses in the cache.",
-        "EventCode": "0x01",
-        "EventName": "L1I_CACHE_REFILL",
-        "BriefDescription": "L1 instruction cache refill"
+        "PublicDescription": "This event counts any instruction fetch which misses in the cache.",
+        "ArchStdEvent": "L1I_CACHE_REFILL"
     },
     {
-        "PublicDescription": "L1 instruction TLB refill. This event counts any refill of the instruction L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
-        "EventCode": "0x02",
-        "EventName": "L1I_TLB_REFILL",
-        "BriefDescription": "L1 instruction TLB refill"
+        "PublicDescription": "This event counts any refill of the instruction L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
+        "ArchStdEvent": "L1I_TLB_REFILL"
     },
     {
-        "PublicDescription": "L1 data cache refill. This event counts any load or store operation or page table walk access which causes data to be read from outside the L1, including accesses which do not allocate into L1.",
-        "EventCode": "0x03",
-        "EventName": "L1D_CACHE_REFILL",
-        "BriefDescription": "L1 data cache refill"
+        "PublicDescription": "This event counts any load or store operation or page table walk access which causes data to be read from outside the L1, including accesses which do not allocate into L1.",
+        "ArchStdEvent": "L1D_CACHE_REFILL"
     },
     {
-        "PublicDescription": "L1 data cache access. This event counts any load or store operation or page table walk access which looks up in the L1 data cache. In particular, any access which could count the L1D_CACHE_REFILL event causes this event to count.",
-        "EventCode": "0x04",
-        "EventName": "L1D_CACHE",
-        "BriefDescription": "L1 data cache access"
+        "PublicDescription": "This event counts any load or store operation or page table walk access which looks up in the L1 data cache. In particular, any access which could count the L1D_CACHE_REFILL event causes this event to count.",
+        "ArchStdEvent": "L1D_CACHE"
     },
     {
-        "PublicDescription": "L1 data TLB refill. This event counts any refill of the data L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
-        "EventCode": "0x05",
-        "EventName": "L1D_TLB_REFILL",
-        "BriefDescription": "L1 data TLB refill"
+        "PublicDescription": "This event counts any refill of the data L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
+        "ArchStdEvent": "L1D_TLB_REFILL"
     },
-    {
+    {
         "PublicDescription": "Level 1 instruction cache access or Level 0 Macro-op cache access. This event counts any instruction fetch which accesses the L1 instruction cache or L0 Macro-op cache.",
-        "EventCode": "0x14",
-        "EventName": "L1I_CACHE",
-        "BriefDescription": "L1 instruction cache access"
+        "ArchStdEvent": "L1I_CACHE"
     },
     {
-        "PublicDescription": "L1 data cache Write-Back. This event counts any write-back of data from the L1 data cache to L2 or L3. This counts both victim line evictions and snoops, including cache maintenance operations.",
-        "EventCode": "0x15",
-        "EventName": "L1D_CACHE_WB",
-        "BriefDescription": "L1 data cache Write-Back"
+        "PublicDescription": "This event counts any write-back of data from the L1 data cache to L2 or L3. This counts both victim line evictions and snoops, including cache maintenance operations.",
+        "ArchStdEvent": "L1D_CACHE_WB"
     },
     {
-        "PublicDescription": "L2 data cache access. This event counts any transaction from L1 which looks up in the L2 cache, and any write-back from the L1 to the L2. Snoops from outside the core and cache maintenance operations are not counted.",
-        "EventCode": "0x16",
-        "EventName": "L2D_CACHE",
-        "BriefDescription": "L2 data cache access"
+        "PublicDescription": "This event counts any transaction from L1 which looks up in the L2 cache, and any write-back from the L1 to the L2. Snoops from outside the core and cache maintenance operations are not counted.",
+        "ArchStdEvent": "L2D_CACHE"
     },
     {
         "PublicDescription": "L2 data cache refill. This event counts any cacheable transaction from L1 which causes data to be read from outside the core. L2 refills caused by stashes into L2 should not be counted",
-        "EventCode": "0x17",
-        "EventName": "L2D_CACHE_REFILL",
-        "BriefDescription": "L2 data cache refill"
+        "ArchStdEvent": "L2D_CACHE_REFILL"
     },
     {
-        "PublicDescription": "L2 data cache write-back. This event counts any write-back of data from the L2 cache to outside the core. This includes snoops to the L2 which return data, regardless of whether they cause an invalidation. Invalidations from the L2 which do not write data outside of the core and snoops which return data from the L1 are not counted",
-        "EventCode": "0x18",
-        "EventName": "L2D_CACHE_WB",
-        "BriefDescription": "L2 data cache write-back"
+        "PublicDescription": "This event counts any write-back of data from the L2 cache to outside the core. This includes snoops to the L2 which return data, regardless of whether they cause an invalidation. Invalidations from the L2 which do not write data outside of the core and snoops which return data from the L1 are not counted",
+        "ArchStdEvent": "L2D_CACHE_WB"
     },
     {
-        "PublicDescription": "L2 data cache allocation without refill. This event counts any full cache line write into the L2 cache which does not cause a linefill, including write-backs from L1 to L2 and full-line writes which do not allocate into L1.",
-        "EventCode": "0x20",
-        "EventName": "L2D_CACHE_ALLOCATE",
-        "BriefDescription": "L2 data cache allocation without refill"
+        "PublicDescription": "This event counts any full cache line write into the L2 cache which does not cause a linefill, including write-backs from L1 to L2 and full-line writes which do not allocate into L1.",
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE"
     },
     {
-        "PublicDescription": "Level 1 data TLB access. This event counts any load or store operation which accesses the data L1 TLB. If both a load and a store are executed on a cycle, this event counts twice. This event counts regardless of whether the MMU is enabled.",
-        "EventCode": "0x25",
-        "EventName": "L1D_TLB",
+        "PublicDescription": "This event counts any load or store operation which accesses the data L1 TLB. If both a load and a store are executed on a cycle, this event counts twice. This event counts regardless of whether the MMU is enabled.",
+        "ArchStdEvent": "L1D_TLB",
         "BriefDescription": "Level 1 data TLB access."
     },
     {
-        "PublicDescription": "Level 1 instruction TLB access. This event counts any instruction fetch which accesses the instruction L1 TLB.This event counts regardless of whether the MMU is enabled.",
-        "EventCode": "0x26",
-        "EventName": "L1I_TLB",
+        "PublicDescription": "This event counts any instruction fetch which accesses the instruction L1 TLB. This event counts regardless of whether the MMU is enabled.",
+        "ArchStdEvent": "L1I_TLB",
         "BriefDescription": "Level 1 instruction TLB access"
     },
     {
         "PublicDescription": "This event counts any full cache line write into the L3 cache which does not cause a linefill, including write-backs from L2 to L3 and full-line writes which do not allocate into L2",
-        "EventCode": "0x29",
-        "EventName": "L3D_CACHE_ALLOCATE",
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
         "BriefDescription": "Allocation without refill"
     },
     {
-        "PublicDescription": "Attributable Level 3 unified cache refill. This event counts for any cacheable read transaction returning datafrom the SCU for which the data source was outside the cluster. Transactions such as ReadUnique are counted here as 'read' transactions, even though they can be generated by store instructions.",
-        "EventCode": "0x2A",
-        "EventName": "L3D_CACHE_REFILL",
+        "PublicDescription": "This event counts for any cacheable read transaction returning data from the SCU for which the data source was outside the cluster. Transactions such as ReadUnique are counted here as 'read' transactions, even though they can be generated by store instructions.",
+        "ArchStdEvent": "L3D_CACHE_REFILL",
         "BriefDescription": "Attributable Level 3 unified cache refill."
     },
     {
-        "PublicDescription": "Attributable Level 3 unified cache access. This event counts for any cacheable read transaction returning datafrom the SCU, or for any cacheable write to the SCU.",
-        "EventCode": "0x2B",
-        "EventName": "L3D_CACHE",
+        "PublicDescription": "This event counts for any cacheable read transaction returning data from the SCU, or for any cacheable write to the SCU.",
+        "ArchStdEvent": "L3D_CACHE",
         "BriefDescription": "Attributable Level 3 unified cache access."
     },
     {
-        "PublicDescription": "Attributable L2 data or unified TLB refill. This event counts on anyrefill of the L2 TLB, caused by either an instruction or data access.This event does not count if the MMU is disabled.",
-        "EventCode": "0x2D",
-        "EventName": "L2D_TLB_REFILL",
+        "PublicDescription": "This event counts on any refill of the L2 TLB, caused by either an instruction or data access. This event does not count if the MMU is disabled.",
+        "ArchStdEvent": "L2D_TLB_REFILL",
         "BriefDescription": "Attributable L2 data or unified TLB refill"
     },
     {
-        "PublicDescription": "Attributable L2 data or unified TLB access. This event counts on any access to the L2 TLB (caused by a refill of any of the L1 TLBs). This event does not count if the MMU is disabled.",
-        "EventCode": "0x2F",
-        "EventName": "L2D_TLB",
-        "BriefDescription": "Attributable L2 data or unified TLB access"
+        "PublicDescription": "This event counts on any access to the L2 TLB (caused by a refill of any of the L1 TLBs). This event does not count if the MMU is disabled.",
+        "ArchStdEvent": "L2D_TLB"
     },
     {
-        "PublicDescription": "Access to data TLB that caused a page table walk. This event counts on any data access which causes L2D_TLB_REFILL to count.",
-        "EventCode": "0x34",
-        "EventName": "DTLB_WALK",
-        "BriefDescription": "Access to data TLB that caused a page table walk."
+        "PublicDescription": "This event counts on any data access which causes L2D_TLB_REFILL to count.",
+        "ArchStdEvent": "DTLB_WALK"
     },
     {
-        "PublicDescription": "Access to instruction TLB that caused a page table walk. This event counts on any instruction access which causes L2D_TLB_REFILL to count.",
-        "EventCode": "0x35",
-        "EventName": "ITLB_WALK",
-        "BriefDescription": "Access to instruction TLB that caused a page table walk."
+        "PublicDescription": "This event counts on any instruction access which causes L2D_TLB_REFILL to count.",
+        "ArchStdEvent": "ITLB_WALK"
     },
     {
-        "EventCode": "0x36",
-        "EventName": "LL_CACHE_RD",
-        "BriefDescription": "Last level cache access, read"
+        "ArchStdEvent": "LL_CACHE_RD"
     },
     {
-        "EventCode": "0x37",
-        "EventName": "LL_CACHE_MISS_RD",
-        "BriefDescription": "Last level cache miss, read"
+        "ArchStdEvent": "LL_CACHE_MISS_RD"
     },
     {
         "ArchStdEvent": "L1D_CACHE_INVAL"
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
index 98d29c8..ea4631d 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
@@ -1,14 +1,10 @@
 [
     {
-        "EventCode": "0x09",
-        "EventName": "EXC_TAKEN",
-        "BriefDescription": "Exception taken."
+        "ArchStdEvent": "EXC_TAKEN",
     },
     {
-        "PublicDescription": "Local memory error. This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
-        "EventCode": "0x1A",
-        "EventName": "MEMORY_ERROR",
-        "BriefDescription": "Local memory error."
+        "PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
+        "ArchStdEvent": "MEMORY_ERROR",
     },
     {
         "ArchStdEvent": "EXC_DABORT"
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
index c153ac7..8e59566 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
@@ -1,49 +1,32 @@
 [
     {
-        "PublicDescription": "Software increment. Instruction architecturally executed (condition code check pass).",
-        "EventCode": "0x00",
-        "EventName": "SW_INCR",
-        "BriefDescription": "Software increment."
+        "ArchStdEvent": "SW_INCR",
     },
     {
-        "PublicDescription": "Instruction architecturally executed. This event counts all retired instructions, including those that fail their condition check.",
-        "EventCode": "0x08",
-        "EventName": "INST_RETIRED",
-        "BriefDescription": "Instruction architecturally executed."
+        "PublicDescription": "This event counts all retired instructions, including those that fail their condition check.",
+        "ArchStdEvent": "INST_RETIRED",
     },
     {
-        "EventCode": "0x0A",
-        "EventName": "EXC_RETURN",
-        "BriefDescription": "Instruction architecturally executed, condition code check pass, exception return."
+        "ArchStdEvent": "EXC_RETURN",
     },
     {
-        "PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR. This event only counts writes to CONTEXTIDR in AArch32 state, and via the CONTEXTIDR_EL1 mnemonic in AArch64 state.",
-        "EventCode": "0x0B",
-        "EventName": "CID_WRITE_RETIRED",
-        "BriefDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR."
+        "PublicDescription": "This event only counts writes to CONTEXTIDR in AArch32 state, and via the CONTEXTIDR_EL1 mnemonic in AArch64 state.",
+        "ArchStdEvent": "CID_WRITE_RETIRED",
     },
     {
-        "EventCode": "0x1B",
-        "EventName": "INST_SPEC",
-        "BriefDescription": "Operation speculatively executed"
+        "ArchStdEvent": "INST_SPEC",
     },
     {
-        "PublicDescription": "Instruction architecturally executed, condition code check pass, write to TTBR. This event only counts writes to TTBR0/TTBR1 in AArch32 state and TTBR0_EL1/TTBR1_EL1 in AArch64 state.",
-        "EventCode": "0x1C",
-        "EventName": "TTBR_WRITE_RETIRED",
-        "BriefDescription": "Instruction architecturally executed, condition code check pass, write to TTBR"
+        "PublicDescription": "This event only counts writes to TTBR0/TTBR1 in AArch32 state and TTBR0_EL1/TTBR1_EL1 in AArch64 state.",
+        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+    },
+    {
+        "PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches.",
+        "ArchStdEvent": "BR_RETIRED",
     },
     {
-        "PublicDescription": "Instruction architecturally executed, branch. This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches.",
-        "EventCode": "0x21",
-        "EventName": "BR_RETIRED",
-        "BriefDescription": "Instruction architecturally executed, branch."
-    },
-    {
-        "PublicDescription": "Instruction architecturally executed, mispredicted branch. This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush.",
-        "EventCode": "0x22",
-        "EventName": "BR_MIS_PRED_RETIRED",
-        "BriefDescription": "Instruction architecturally executed, mispredicted branch."
+        "PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush.",
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
     },
     {
         "ArchStdEvent": "ASE_SPEC"
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
index b866432..f06f399 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
@@ -1,9 +1,7 @@
 [
     {
-        "PublicDescription": "Data memory access. This event counts memory accesses due to load or store instructions. This event counts the sum of MEM_ACCESS_RD and MEM_ACCESS_WR.",
-        "EventCode": "0x13",
-        "EventName": "MEM_ACCESS",
-        "BriefDescription": "Data memory access"
+        "PublicDescription": "This event counts memory accesses due to load or store instructions. This event counts the sum of MEM_ACCESS_RD and MEM_ACCESS_WR.",
+        "ArchStdEvent": "MEM_ACCESS",
     },
     {
          "ArchStdEvent": "MEM_ACCESS_RD"
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
index 8bde029..c2ccbf6 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
@@ -1,7 +1,5 @@
 [
     {
-        "EventCode": "0x31",
-        "EventName": "REMOTE_ACCESS",
-        "BriefDescription": "Access to another socket in a multi-socket system"
+        "ArchStdEvent": "REMOTE_ACCESS",
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
index 010a647..d79f0aea 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
@@ -1,14 +1,10 @@
 [
     {
-        "PublicDescription": "No operation issued because of the frontend. The counter counts on any cycle when there are no fetched instructions available to dispatch.",
-        "EventCode": "0x23",
-        "EventName": "STALL_FRONTEND",
-        "BriefDescription": "No operation issued because of the frontend."
+        "PublicDescription": "The counter counts on any cycle when there are no fetched instructions available to dispatch.",
+        "ArchStdEvent": "STALL_FRONTEND",
     },
     {
-        "PublicDescription": "No operation issued because of the backend. The counter counts on any cycle fetched instructions are not dispatched due to resource constraints.",
-        "EventCode": "0x24",
-        "EventName": "STALL_BACKEND",
-        "BriefDescription": "No operation issued because of the backend."
+        "PublicDescription": "The counter counts on any cycle fetched instructions are not dispatched due to resource constraints.",
+        "ArchStdEvent": "STALL_BACKEND",
     }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
new file mode 100644
index 0000000..75376c7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
@@ -0,0 +1,248 @@
+[
+    {
+        "PublicDescription": "Instruction architecturally executed, Condition code check pass, software increment",
+        "EventCode": "0x00",
+        "EventName": "SW_INCR",
+        "BriefDescription": "Instruction architecturally executed, Condition code check pass, software increment"
+    },
+    {
+        "PublicDescription": "Level 1 instruction cache refill",
+        "EventCode": "0x01",
+        "EventName": "L1I_CACHE_REFILL",
+        "BriefDescription": "Level 1 instruction cache refill"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 instruction TLB refill",
+        "EventCode": "0x02",
+        "EventName": "L1I_TLB_REFILL",
+        "BriefDescription": "Attributable Level 1 instruction TLB refill"
+    },
+    {
+        "PublicDescription": "Level 1 data cache refill",
+        "EventCode": "0x03",
+        "EventName": "L1D_CACHE_REFILL",
+        "BriefDescription": "Level 1 data cache refill"
+    },
+    {
+        "PublicDescription": "Level 1 data cache access",
+        "EventCode": "0x04",
+        "EventName": "L1D_CACHE",
+        "BriefDescription": "Level 1 data cache access"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 data TLB refill",
+        "EventCode": "0x05",
+        "EventName": "L1D_TLB_REFILL",
+        "BriefDescription": "Attributable Level 1 data TLB refill"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed",
+        "EventCode": "0x08",
+        "EventName": "INST_RETIRED",
+        "BriefDescription": "Instruction architecturally executed"
+    },
+    {
+        "PublicDescription": "Exception taken",
+        "EventCode": "0x09",
+        "EventName": "EXC_TAKEN",
+        "BriefDescription": "Exception taken"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed, condition check pass, exception return",
+        "EventCode": "0x0a",
+        "EventName": "EXC_RETURN",
+        "BriefDescription": "Instruction architecturally executed, condition check pass, exception return"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR",
+        "EventCode": "0x0b",
+        "EventName": "CID_WRITE_RETIRED",
+        "BriefDescription": "Instruction architecturally executed, condition code check pass, write to CONTEXTIDR"
+    },
+    {
+        "PublicDescription": "Mispredicted or not predicted branch speculatively executed",
+        "EventCode": "0x10",
+        "EventName": "BR_MIS_PRED",
+        "BriefDescription": "Mispredicted or not predicted branch speculatively executed"
+    },
+    {
+        "PublicDescription": "Cycle",
+        "EventCode": "0x11",
+        "EventName": "CPU_CYCLES",
+        "BriefDescription": "Cycle"
+    },
+    {
+        "PublicDescription": "Predictable branch speculatively executed",
+        "EventCode": "0x12",
+        "EventName": "BR_PRED",
+        "BriefDescription": "Predictable branch speculatively executed"
+    },
+    {
+        "PublicDescription": "Data memory access",
+        "EventCode": "0x13",
+        "EventName": "MEM_ACCESS",
+        "BriefDescription": "Data memory access"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 instruction cache access",
+        "EventCode": "0x14",
+        "EventName": "L1I_CACHE",
+        "BriefDescription": "Attributable Level 1 instruction cache access"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 data cache write-back",
+        "EventCode": "0x15",
+        "EventName": "L1D_CACHE_WB",
+        "BriefDescription": "Attributable Level 1 data cache write-back"
+    },
+    {
+        "PublicDescription": "Level 2 data cache access",
+        "EventCode": "0x16",
+        "EventName": "L2D_CACHE",
+        "BriefDescription": "Level 2 data cache access"
+    },
+    {
+        "PublicDescription": "Level 2 data refill",
+        "EventCode": "0x17",
+        "EventName": "L2D_CACHE_REFILL",
+        "BriefDescription": "Level 2 data refill"
+    },
+    {
+        "PublicDescription": "Attributable Level 2 data cache write-back",
+        "EventCode": "0x18",
+        "EventName": "L2D_CACHE_WB",
+        "BriefDescription": "Attributable Level 2 data cache write-back"
+    },
+    {
+        "PublicDescription": "Attributable Bus access",
+        "EventCode": "0x19",
+        "EventName": "BUS_ACCESS",
+        "BriefDescription": "Attributable Bus access"
+    },
+    {
+        "PublicDescription": "Local memory error",
+        "EventCode": "0x1a",
+        "EventName": "MEMORY_ERROR",
+        "BriefDescription": "Local memory error"
+    },
+    {
+        "PublicDescription": "Operation speculatively executed",
+        "EventCode": "0x1b",
+        "EventName": "INST_SPEC",
+        "BriefDescription": "Operation speculatively executed"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed, Condition code check pass, write to TTBR",
+        "EventCode": "0x1c",
+        "EventName": "TTBR_WRITE_RETIRED",
+        "BriefDescription": "Instruction architecturally executed, Condition code check pass, write to TTBR"
+    },
+    {
+        "PublicDescription": "Bus cycle",
+        "EventCode": "0x1D",
+        "EventName": "BUS_CYCLES",
+        "BriefDescription": "Bus cycle"
+    },
+    {
+        "PublicDescription": "Attributable Level 2 data cache allocation without refill",
+        "EventCode": "0x20",
+        "EventName": "L2D_CACHE_ALLOCATE",
+        "BriefDescription": "Attributable Level 2 data cache allocation without refill"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed, branch",
+        "EventCode": "0x21",
+        "EventName": "BR_RETIRED",
+        "BriefDescription": "Instruction architecturally executed, branch"
+    },
+    {
+        "PublicDescription": "Instruction architecturally executed, mispredicted branch",
+        "EventCode": "0x22",
+        "EventName": "BR_MIS_PRED_RETIRED",
+        "BriefDescription": "Instruction architecturally executed, mispredicted branch"
+    },
+    {
+        "PublicDescription": "No operation issued because of the frontend",
+        "EventCode": "0x23",
+        "EventName": "STALL_FRONTEND",
+        "BriefDescription": "No operation issued because of the frontend"
+    },
+    {
+        "PublicDescription": "No operation issued due to the backend",
+        "EventCode": "0x24",
+        "EventName": "STALL_BACKEND",
+        "BriefDescription": "No operation issued due to the backend"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 data or unified TLB access",
+        "EventCode": "0x25",
+        "EventName": "L1D_TLB",
+        "BriefDescription": "Attributable Level 1 data or unified TLB access"
+    },
+    {
+        "PublicDescription": "Attributable Level 1 instruction TLB access",
+        "EventCode": "0x26",
+        "EventName": "L1I_TLB",
+        "BriefDescription": "Attributable Level 1 instruction TLB access"
+    },
+    {
+        "PublicDescription": "Attributable Level 3 data cache allocation without refill",
+        "EventCode": "0x29",
+        "EventName": "L3D_CACHE_ALLOCATE",
+        "BriefDescription": "Attributable Level 3 data cache allocation without refill"
+    },
+    {
+        "PublicDescription": "Attributable Level 3 data cache refill",
+        "EventCode": "0x2A",
+        "EventName": "L3D_CACHE_REFILL",
+        "BriefDescription": "Attributable Level 3 data cache refill"
+    },
+    {
+        "PublicDescription": "Attributable Level 3 data cache access",
+        "EventCode": "0x2B",
+        "EventName": "L3D_CACHE",
+        "BriefDescription": "Attributable Level 3 data cache access"
+    },
+    {
+        "PublicDescription": "Attributable Level 2 data TLB refill",
+        "EventCode": "0x2D",
+        "EventName": "L2D_TLB_REFILL",
+        "BriefDescription": "Attributable Level 2 data TLB refill"
+    },
+    {
+        "PublicDescription": "Attributable Level 2 data or unified TLB access",
+        "EventCode": "0x2F",
+        "EventName": "L2D_TLB",
+        "BriefDescription": "Attributable Level 2 data or unified TLB access"
+    },
+    {
+        "PublicDescription": "Access to another socket in a multi-socket system",
+        "EventCode": "0x31",
+        "EventName": "REMOTE_ACCESS",
+        "BriefDescription": "Access to another socket in a multi-socket system"
+    },
+    {
+        "PublicDescription": "Access to data TLB causes a translation table walk",
+        "EventCode": "0x34",
+        "EventName": "DTLB_WALK",
+        "BriefDescription": "Access to data TLB causes a translation table walk"
+    },
+    {
+        "PublicDescription": "Access to instruction TLB that causes a translation table walk",
+        "EventCode": "0x35",
+        "EventName": "ITLB_WALK",
+        "BriefDescription": "Access to instruction TLB that causes a translation table walk"
+    },
+    {
+        "PublicDescription": "Attributable Last level cache memory read",
+        "EventCode": "0x36",
+        "EventName": "LL_CACHE_RD",
+        "BriefDescription": "Attributable Last level cache memory read"
+    },
+    {
+        "PublicDescription": "Last level cache miss, read",
+        "EventCode": "0x37",
+        "EventName": "LL_CACHE_MISS_RD",
+        "BriefDescription": "Last level cache miss, read"
+    }
+]
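For readers following the ArchStdEvent conversion above: a minimal sketch, in Python, of how a per-core entry naming an "ArchStdEvent" might be expanded against the common table added in armv8-common-and-microarch.json. It assumes that fields set in the per-core entry take precedence over the common definition, which is what the retained "PublicDescription" overrides in this diff suggest. The function and variable names are hypothetical, and this is not the jevents implementation.

```python
# Illustrative only: names are hypothetical, not taken from jevents.
# One entry copied from armv8-common-and-microarch.json in this diff.
COMMON_EVENTS = {
    "STALL_BACKEND": {
        "PublicDescription": "No operation issued due to the backend",
        "EventCode": "0x24",
        "EventName": "STALL_BACKEND",
        "BriefDescription": "No operation issued due to the backend",
    },
}


def expand_arch_std_event(entry, common=COMMON_EVENTS):
    """Return a full event record for a per-core JSON entry.

    Assumption: fields present in the per-core entry override the
    common definition; everything else is inherited from it.
    """
    name = entry.get("ArchStdEvent")
    if name is None:
        # Entry is already a full definition; return a copy unchanged.
        return dict(entry)
    merged = dict(common[name])
    merged.update({k: v for k, v in entry.items() if k != "ArchStdEvent"})
    return merged


# Per-core entry as it appears in cortex-a76-n1/pipeline.json after this change.
core_entry = {
    "PublicDescription": "The counter counts on any cycle fetched instructions "
                         "are not dispatched due to resource constraints.",
    "ArchStdEvent": "STALL_BACKEND",
}
expanded = expand_arch_std_event(core_entry)
```

Under that assumption, `expanded` keeps the core-specific "PublicDescription" while picking up "EventCode", "EventName" and "BriefDescription" from the common table.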
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mm/sys/metrics.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mm/sys/metrics.json
index 8e553b6..f416fa0 100644
--- a/tools/perf/pmu-events/arch/arm64/freescale/imx8mm/sys/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mm/sys/metrics.json
@@ -6,7 +6,7 @@
 	    "ScaleUnit": "9.765625e-4KB",
 	    "Unit": "imx8_ddr",
 	    "Compat": "i.MX8MM"
-    },
+   },
    {
 	    "BriefDescription": "bytes all masters write to ddr based on write-cycles event",
 	    "MetricName": "imx8mm_ddr_write.all",
@@ -14,5 +14,5 @@
 	    "ScaleUnit": "9.765625e-4KB",
 	    "Unit": "imx8_ddr",
 	    "Compat": "i.MX8MM"
-    }
+   }
 ]
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/ddrc.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/ddrc.json
new file mode 100644
index 0000000..8352e73
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/ddrc.json
@@ -0,0 +1,37 @@
+[
+   {
+           "BriefDescription": "ddr cycles event",
+           "EventCode": "0x00",
+           "EventName": "imx8mn_ddr.cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MN"
+   },
+   {
+           "BriefDescription": "ddr read-cycles event",
+           "EventCode": "0x2a",
+           "EventName": "imx8mn_ddr.read_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MN"
+   },
+   {
+           "BriefDescription": "ddr write-cycles event",
+           "EventCode": "0x2b",
+           "EventName": "imx8mn_ddr.write_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MN"
+   },
+   {
+           "BriefDescription": "ddr read event",
+           "EventCode": "0x35",
+           "EventName": "imx8mn_ddr.read",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MN"
+   },
+   {
+           "BriefDescription": "ddr write event",
+           "EventCode": "0x38",
+           "EventName": "imx8mn_ddr.write",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MN"
+   }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/metrics.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/metrics.json
new file mode 100644
index 0000000..2bbba4d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mn/sys/metrics.json
@@ -0,0 +1,18 @@
+[
+   {
+	    "BriefDescription": "bytes all masters read from ddr based on read-cycles event",
+	    "MetricName": "imx8mn_ddr_read.all",
+	    "MetricExpr": "imx8mn_ddr.read_cycles * 4 * 2",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MN"
+   },
+   {
+	    "BriefDescription": "bytes all masters write to ddr based on write-cycles event",
+	    "MetricName": "imx8mn_ddr_write.all",
+	    "MetricExpr": "imx8mn_ddr.write_cycles * 4 * 2",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MN"
+   }
+]
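A quick check of the arithmetic in the two metrics above: 9.765625e-4 is exactly 1/1024, so the "ScaleUnit" turns a byte count into KB. The sketch assumes the numeric part of "ScaleUnit" is a multiplier applied to the "MetricExpr" result and the suffix is the displayed unit. The helper name and the sample cycle count are made up for illustration.

```python
def ddr_read_kb(read_cycles):
    """Apply the imx8mn_ddr_read.all expression, then the ScaleUnit factor."""
    bytes_read = read_cycles * 4 * 2      # MetricExpr: imx8mn_ddr.read_cycles * 4 * 2
    return bytes_read * 9.765625e-4       # ScaleUnit "9.765625e-4KB", i.e. bytes / 1024


# Hypothetical sample: 1024 read cycles.
print(ddr_read_kb(1024))  # 8.0
```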
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/ddrc.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/ddrc.json
new file mode 100644
index 0000000..f9a89ef
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/ddrc.json
@@ -0,0 +1,37 @@
+[
+   {
+           "BriefDescription": "ddr cycles event",
+           "EventCode": "0x00",
+           "EventName": "imx8mp_ddr.cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MP"
+   },
+   {
+           "BriefDescription": "ddr read-cycles event",
+           "EventCode": "0x2a",
+           "EventName": "imx8mp_ddr.read_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MP"
+   },
+   {
+           "BriefDescription": "ddr write-cycles event",
+           "EventCode": "0x2b",
+           "EventName": "imx8mp_ddr.write_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MP"
+   },
+   {
+           "BriefDescription": "ddr read event",
+           "EventCode": "0x35",
+           "EventName": "imx8mp_ddr.read",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MP"
+   },
+   {
+           "BriefDescription": "ddr write event",
+           "EventCode": "0x38",
+           "EventName": "imx8mp_ddr.write",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MP"
+   }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/metrics.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/metrics.json
new file mode 100644
index 0000000..8b95444
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mp/sys/metrics.json
@@ -0,0 +1,466 @@
+[
+   {
+	    "BriefDescription": "bytes of all masters read from ddr",
+	    "MetricName": "imx8mp_ddr_read.all",
+	    "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0xffff\\,axi_id\\=0x0000@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of all masters write to ddr",
+	    "MetricName": "imx8mp_ddr_write.all",
+	    "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0xffff\\,axi_id\\=0x0000@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of a53 core read from ddr",
+	    "MetricName": "imx8mp_ddr_read.a53",
+	    "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0000@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of a53 core write to ddr",
+	    "MetricName": "imx8mp_ddr_write.a53",
+	    "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0000@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of supermix(m7) core read from ddr",
+	    "MetricName": "imx8mp_ddr_read.supermix",
+	    "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x000f\\,axi_id\\=0x0020@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of supermix(m7) write to ddr",
+	    "MetricName": "imx8mp_ddr_write.supermix",
+	    "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x000f\\,axi_id\\=0x0020@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of gpu 3d read from ddr",
+	    "MetricName": "imx8mp_ddr_read.3d",
+	    "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0070@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of gpu 3d write to ddr",
+	    "MetricName": "imx8mp_ddr_write.3d",
+	    "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0070@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of gpu 2d read from ddr",
+	    "MetricName": "imx8mp_ddr_read.2d",
+	    "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0071@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of gpu 2d write to ddr",
+	   "MetricName": "imx8mp_ddr_write.2d",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0071@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display lcdif1 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.lcdif1",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0068@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display lcdif1 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.lcdif1",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0068@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display lcdif2 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.lcdif2",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0069@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display lcdif2 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.lcdif2",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0069@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isi1 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.isi1",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006a@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	    "BriefDescription": "bytes of display isi1 write to ddr",
+	    "MetricName": "imx8mp_ddr_write.isi1",
+	    "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006a@",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isi2 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.isi2",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006b@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isi2 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.isi2",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006b@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isi3 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.isi3",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006c@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isi3 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.isi3",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006c@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isp1 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.isp1",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006d@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isp1 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.isp1",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006d@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isp2 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.isp2",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006e@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display isp2 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.isp2",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006e@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display dewarp read from ddr",
+	   "MetricName": "imx8mp_ddr_read.dewarp",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x006f@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of display dewarp write to ddr",
+	   "MetricName": "imx8mp_ddr_write.dewarp",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x006f@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu1 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.vpu1",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x007c@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu1 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.vpu1",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x007c@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu2 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.vpu2",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x007d@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu2 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.vpu2",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x007d@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu3 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.vpu3",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x007e@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of vpu3 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.vpu3",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x007e@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of npu read from ddr",
+	   "MetricName": "imx8mp_ddr_read.npu",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0073@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of npu write to ddr",
+	   "MetricName": "imx8mp_ddr_write.npu",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0073@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio usb1 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.usb1",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0078@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio usb1 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.usb1",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0078@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio usb2 read from ddr",
+	   "MetricName": "imx8mp_ddr_read.usb2",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0079@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio usb2 write to ddr",
+	   "MetricName": "imx8mp_ddr_write.usb2",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0079@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio pci read from ddr",
+	   "MetricName": "imx8mp_ddr_read.pci",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x007a@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hsio pci write to ddr",
+	   "MetricName": "imx8mp_ddr_write.pci",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x007a@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx hrv_mwr read from ddr",
+	   "MetricName": "imx8mp_ddr_read.hdmi_hrv_mwr",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0074@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx hrv_mwr write to ddr",
+	   "MetricName": "imx8mp_ddr_write.hdmi_hrv_mwr",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0074@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx lcdif read from ddr",
+	   "MetricName": "imx8mp_ddr_read.hdmi_lcdif",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0075@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx lcdif write to ddr",
+	   "MetricName": "imx8mp_ddr_write.hdmi_lcdif",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0075@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx tx_hdcp read from ddr",
+	   "MetricName": "imx8mp_ddr_read.hdmi_hdcp",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0076@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of hdmi_tx tx_hdcp write to ddr",
+	   "MetricName": "imx8mp_ddr_write.hdmi_hdcp",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0076@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio dsp read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_dsp",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0041@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio dsp write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_dsp",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0041@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma2_per read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_sdma2_per",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0062@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma2_per write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_sdma2_per",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0062@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma2_burst read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_sdma2_burst",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0063@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma2_burst write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_sdma2_burst",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0063@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma3_per read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_sdma3_per",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0064@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma3_per write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_sdma3_per",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0064@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma3_burst read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_sdma3_burst",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0065@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma3_burst write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_sdma3_burst",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0065@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma_pif read from ddr",
+	   "MetricName": "imx8mp_ddr_read.audio_sdma_pif",
+	   "MetricExpr": "imx8_ddr0@axid\\-read\\,axi_mask\\=0x0000\\,axi_id\\=0x0066@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   },
+   {
+	   "BriefDescription": "bytes of audio sdma_pif write to ddr",
+	   "MetricName": "imx8mp_ddr_write.audio_sdma_pif",
+	   "MetricExpr": "imx8_ddr0@axid\\-write\\,axi_mask\\=0x0000\\,axi_id\\=0x0066@",
+	   "ScaleUnit": "9.765625e-4KB",
+	   "Unit": "imx8_ddr",
+	   "Compat": "i.MX8MP"
+   }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/ddrc.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/ddrc.json
new file mode 100644
index 0000000..c868272
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/ddrc.json
@@ -0,0 +1,37 @@
+[
+   {
+           "BriefDescription": "ddr cycles event",
+           "EventCode": "0x00",
+           "EventName": "imx8mq_ddr.cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MQ"
+   },
+   {
+           "BriefDescription": "ddr read-cycles event",
+           "EventCode": "0x2a",
+           "EventName": "imx8mq_ddr.read_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MQ"
+   },
+   {
+           "BriefDescription": "ddr write-cycles event",
+           "EventCode": "0x2b",
+           "EventName": "imx8mq_ddr.write_cycles",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MQ"
+   },
+   {
+           "BriefDescription": "ddr read event",
+           "EventCode": "0x35",
+           "EventName": "imx8mq_ddr.read",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MQ"
+   },
+   {
+           "BriefDescription": "ddr write event",
+           "EventCode": "0x38",
+           "EventName": "imx8mq_ddr.write",
+           "Unit": "imx8_ddr",
+           "Compat": "i.MX8MQ"
+   }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/metrics.json b/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/metrics.json
new file mode 100644
index 0000000..862c981
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/freescale/imx8mq/sys/metrics.json
@@ -0,0 +1,18 @@
+[
+   {
+	    "BriefDescription": "bytes all masters read from ddr based on read-cycles event",
+	    "MetricName": "imx8mq_ddr_read.all",
+	    "MetricExpr": "imx8mq_ddr.read_cycles * 4 * 4",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MQ"
+   },
+   {
+	    "BriefDescription": "bytes all masters write to ddr based on write-cycles event",
+	    "MetricName": "imx8mq_ddr_write.all",
+	    "MetricExpr": "imx8mq_ddr.write_cycles * 4 * 4",
+	    "ScaleUnit": "9.765625e-4KB",
+	    "Unit": "imx8_ddr",
+	    "Compat": "i.MX8MQ"
+    }
+]
diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index aa4dc4f..650aec19 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -58,6 +58,7 @@
 perf-y += genelf.o
 perf-y += api-io.o
 perf-y += demangle-java-test.o
+perf-y += demangle-ocaml-test.o
 perf-y += pfm.o
 perf-y += parse-metric.o
 perf-y += pe-file-parsing.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 7273823..c4b888f 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -339,6 +339,10 @@
 		.func = test__demangle_java,
 	},
 	{
+		.desc = "Demangle OCaml",
+		.func = test__demangle_ocaml,
+	},
+	{
 		.desc = "Parse and process metrics",
 		.func = test__parse_metric,
 	},
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 7c098d4..280f034 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -26,6 +26,7 @@
 #include "event.h"
 #include "record.h"
 #include "util/mmap.h"
+#include "util/string2.h"
 #include "util/synthetic-events.h"
 #include "thread.h"
 
@@ -41,15 +42,6 @@
 	size_t done_cnt;
 };
 
-static unsigned int hex(char c)
-{
-	if (c >= '0' && c <= '9')
-		return c - '0';
-	if (c >= 'a' && c <= 'f')
-		return c - 'a' + 10;
-	return c - 'A' + 10;
-}
-
 static size_t read_objdump_chunk(const char **line, unsigned char **buf,
 				 size_t *buf_len)
 {
diff --git a/tools/perf/tests/demangle-ocaml-test.c b/tools/perf/tests/demangle-ocaml-test.c
new file mode 100644
index 0000000..a273ed5
--- /dev/null
+++ b/tools/perf/tests/demangle-ocaml-test.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include "tests.h"
+#include "session.h"
+#include "debug.h"
+#include "demangle-ocaml.h"
+
+int test__demangle_ocaml(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+	int ret = TEST_OK;
+	char *buf = NULL;
+	size_t i;
+
+	struct {
+		const char *mangled, *demangled;
+	} test_cases[] = {
+		{ "main",
+		  NULL },
+		{ "camlStdlib__array__map_154",
+		  "Stdlib.array.map" },
+		{ "camlStdlib__anon_fn$5bstdlib$2eml$3a334$2c0$2d$2d54$5d_1453",
+		  "Stdlib.anon_fn[stdlib.ml:334,0--54]" },
+		{ "camlStdlib__bytes__$2b$2b_2205",
+		  "Stdlib.bytes.++" },
+	};
+
+	for (i = 0; i < sizeof(test_cases) / sizeof(test_cases[0]); i++) {
+		buf = ocaml_demangle_sym(test_cases[i].mangled);
+		if ((buf == NULL && test_cases[i].demangled != NULL)
+				|| (buf != NULL && test_cases[i].demangled == NULL)
+				|| (buf != NULL && strcmp(buf, test_cases[i].demangled))) {
+			pr_debug("FAILED: %s: %s != %s\n", test_cases[i].mangled,
+				 buf == NULL ? "(null)" : buf,
+				 test_cases[i].demangled == NULL ? "(null)" : test_cases[i].demangled);
+			ret = TEST_FAIL;
+		}
+		free(buf);
+	}
+
+	return ret;
+}
diff --git a/tools/perf/tests/openat-syscall-all-cpus.c b/tools/perf/tests/openat-syscall-all-cpus.c
index 71f85e2..f7dd6c46 100644
--- a/tools/perf/tests/openat-syscall-all-cpus.c
+++ b/tools/perf/tests/openat-syscall-all-cpus.c
@@ -15,7 +15,6 @@
 #include "tests.h"
 #include "thread_map.h"
 #include <perf/cpumap.h>
-#include <internal/cpumap.h>
 #include "debug.h"
 #include "stat.h"
 #include "util/counts.h"
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index ce7be37..6dc1db1 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -70,6 +70,10 @@
 	.metric_name	= "M3",
 },
 {
+	.metric_expr	= "64 * l1d.replacement / 1000000000 / duration_time",
+	.metric_name	= "L1D_Cache_Fill_BW",
+},
+{
 	.name	= NULL,
 }
 };
@@ -107,6 +111,8 @@
 	evlist__for_each_entry(evlist, evsel) {
 		count = find_value(evsel->name, vals);
 		perf_stat__update_shadow_stats(evsel, count, 0, st);
+		if (!strcmp(evsel->name, "duration_time"))
+			update_stats(&walltime_nsecs_stats, count);
 	}
 }
 
@@ -321,6 +327,23 @@
 	return 0;
 }
 
+static int test_memory_bandwidth(void)
+{
+	double ratio;
+	struct value vals[] = {
+		{ .event = "l1d.replacement", .val = 4000000 },
+		{ .event = "duration_time",  .val = 200000000 },
+		{ .event = NULL, },
+	};
+
+	TEST_ASSERT_VAL("failed to compute metric",
+			compute_metric("L1D_Cache_Fill_BW", vals, &ratio) == 0);
+	TEST_ASSERT_VAL("L1D_Cache_Fill_BW, wrong ratio",
+			1.28 == ratio);
+
+	return 0;
+}
+
 static int test_metric_group(void)
 {
 	double ratio1, ratio2;
@@ -353,5 +376,6 @@
 	TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
 	TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
 	TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+	TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
 	return 0;
 }
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2393916..0dbe3aa 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -129,6 +129,9 @@
 	if (type & PERF_SAMPLE_WEIGHT)
 		COMP(weight);
 
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+		COMP(ins_lat);
+
 	if (type & PERF_SAMPLE_DATA_SRC)
 		COMP(data_src);
 
@@ -157,6 +160,9 @@
 	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		COMP(data_page_size);
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		COMP(code_page_size);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -196,7 +202,7 @@
 		.data = {1, -1ULL, 211, 212, 213},
 	};
 	u64 regs[64];
-	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
+	const u32 raw_data[] = {0x12345678, 0x0a0b0c0d, 0x11020304, 0x05060708, 0 };
 	const u64 data[] = {0x2211443366558877ULL, 0, 0xaabbccddeeff4321ULL};
 	const u64 aux_data[] = {0xa55a, 0, 0xeeddee, 0x0282028202820282};
 	struct perf_sample sample = {
@@ -238,6 +244,8 @@
 		.phys_addr	= 113,
 		.cgroup		= 114,
 		.data_page_size = 115,
+		.code_page_size = 116,
+		.ins_lat        = 117,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -344,7 +352,7 @@
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CODE_PAGE_SIZE << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_WEIGHT_STRUCT << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
@@ -374,8 +382,12 @@
 			return err;
 	}
 
-	/* Test all sample format bits together */
-	sample_type = PERF_SAMPLE_MAX - 1;
+	/*
+	 * Test all sample format bits together
+	 * Note: PERF_SAMPLE_WEIGHT and PERF_SAMPLE_WEIGHT_STRUCT cannot
+	 *       be set simultaneously.
+	 */
+	sample_type = (PERF_SAMPLE_MAX - 1) & ~PERF_SAMPLE_WEIGHT;
 	sample_regs = 0x3fff; /* shared yb intr and user regs */
 	for (i = 0; i < ARRAY_SIZE(rf); i++) {
 		err = do_test(sample_type, sample_regs, rf[i]);
diff --git a/tools/perf/tests/shell/buildid.sh b/tools/perf/tests/shell/buildid.sh
index 4861a20..416af61 100755
--- a/tools/perf/tests/shell/buildid.sh
+++ b/tools/perf/tests/shell/buildid.sh
@@ -50,6 +50,12 @@
 		exit 1
 	fi
 
+	${perf} buildid-cache -l | grep $id
+	if [ $? -ne 0 ]; then
+		echo "failed: ${id} is not reported by \"perf buildid-cache -l\""
+		exit 1
+	fi
+
 	echo "OK for ${1}"
 }
 
diff --git a/tools/perf/tests/shell/daemon.sh b/tools/perf/tests/shell/daemon.sh
new file mode 100755
index 0000000..e5b824d
--- /dev/null
+++ b/tools/perf/tests/shell/daemon.sh
@@ -0,0 +1,475 @@
+#!/bin/sh
+# daemon operations
+# SPDX-License-Identifier: GPL-2.0
+
+check_line_first()
+{
+	local line=$1
+	local name=$2
+	local base=$3
+	local output=$4
+	local lock=$5
+	local up=$6
+
+	local line_name=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $2 }'`
+	local line_base=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $3 }'`
+	local line_output=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $4 }'`
+	local line_lock=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $5 }'`
+	local line_up=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $6 }'`
+
+	if [ "${name}" != "${line_name}" ]; then
+		echo "FAILED: wrong name"
+		error=1
+	fi
+
+	if [ "${base}" != "${line_base}" ]; then
+		echo "FAILED: wrong base"
+		error=1
+	fi
+
+	if [ "${output}" != "${line_output}" ]; then
+		echo "FAILED: wrong output"
+		error=1
+	fi
+
+	if [ "${lock}" != "${line_lock}" ]; then
+		echo "FAILED: wrong lock"
+		error=1
+	fi
+
+	if [ "${up}" != "${line_up}" ]; then
+		echo "FAILED: wrong up"
+		error=1
+	fi
+}
+
+check_line_other()
+{
+	local line=$1
+	local name=$2
+	local run=$3
+	local base=$4
+	local output=$5
+	local control=$6
+	local ack=$7
+	local up=$8
+
+	local line_name=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $2 }'`
+	local line_run=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $3 }'`
+	local line_base=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $4 }'`
+	local line_output=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $5 }'`
+	local line_control=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $6 }'`
+	local line_ack=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $7 }'`
+	local line_up=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $8 }'`
+
+	if [ "${name}" != "${line_name}" ]; then
+		echo "FAILED: wrong name"
+		error=1
+	fi
+
+	if [ "${run}" != "${line_run}" ]; then
+		echo "FAILED: wrong run"
+		error=1
+	fi
+
+	if [ "${base}" != "${line_base}" ]; then
+		echo "FAILED: wrong base"
+		error=1
+	fi
+
+	if [ "${output}" != "${line_output}" ]; then
+		echo "FAILED: wrong output"
+		error=1
+	fi
+
+	if [ "${control}" != "${line_control}" ]; then
+		echo "FAILED: wrong control"
+		error=1
+	fi
+
+	if [ "${ack}" != "${line_ack}" ]; then
+		echo "FAILED: wrong ack"
+		error=1
+	fi
+
+	if [ "${up}" != "${line_up}" ]; then
+		echo "FAILED: wrong up"
+		error=1
+	fi
+}
+
+daemon_start()
+{
+	local config=$1
+	local session=$2
+
+	perf daemon start --config ${config}
+
+	# wait for the session to ping
+	local state="FAIL"
+	while [ "${state}" != "OK" ]; do
+		state=`perf daemon ping --config ${config} --session ${session} | awk '{ print $1 }'`
+		sleep 0.05
+	done
+}
+
+daemon_exit()
+{
+	local base=$1
+	local config=$2
+
+	local line=`perf daemon --config ${config} -x: | head -1`
+	local pid=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $1 }'`
+
+	# stop daemon
+	perf daemon stop --config ${config}
+
+	# ... and wait for the pid to go away
+	tail --pid=${pid} -f /dev/null
+}
+
+test_list()
+{
+	echo "test daemon list"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+
+[session-time]
+run = -e task-clock
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} size
+
+	# check first line
+	# pid:daemon:base:base/output:base/lock
+	local line=`perf daemon --config ${config} -x: | head -1`
+	check_line_first "${line}" daemon ${base} ${base}/output ${base}/lock "0"
+
+	# check 1st session
+	# pid:size:-e cpu-clock:base/size:base/size/output:base/size/control:base/size/ack:0
+	local line=`perf daemon --config ${config} -x: | head -2 | tail -1`
+	check_line_other "${line}" size "-e cpu-clock" ${base}/session-size \
+			 ${base}/session-size/output ${base}/session-size/control \
+			 ${base}/session-size/ack "0"
+
+	# check 2nd session
+	# pid:time:-e task-clock:base/time:base/time/output:base/time/control:base/time/ack:0
+	local line=`perf daemon --config ${config} -x: | head -3 | tail -1`
+	check_line_other "${line}" time "-e task-clock" ${base}/session-time \
+			 ${base}/session-time/output ${base}/session-time/control \
+			 ${base}/session-time/ack "0"
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	rm -rf ${base}
+	rm -f ${config}
+}
+
+test_reconfig()
+{
+	echo "test daemon reconfig"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	# prepare config
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+
+[session-time]
+run = -e task-clock
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} size
+
+	# check 2nd session
+	# pid:time:-e task-clock:base/time:base/time/output:base/time/control:base/time/ack:0
+	local line=`perf daemon --config ${config} -x: | head -3 | tail -1`
+	check_line_other "${line}" time "-e task-clock" ${base}/session-time \
+			 ${base}/session-time/output ${base}/session-time/control ${base}/session-time/ack "0"
+	local pid=`echo "${line}" | awk 'BEGIN { FS = ":" } ; { print $1 }'`
+
+	# prepare new config
+	local config_new=${config}.new
+	cat <<EOF > ${config_new}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+
+[session-time]
+run = -e cpu-clock
+EOF
+
+	# TEST 1 - change config
+
+	sed -i -e "s|BASE|${base}|" ${config_new}
+	cp ${config_new} ${config}
+
+	# wait for old session to finish
+	tail --pid=${pid} -f /dev/null
+
+	# wait for new one to start
+	local state="FAIL"
+	while [ "${state}" != "OK" ]; do
+		state=`perf daemon ping --config ${config} --session time | awk '{ print $1 }'`
+	done
+
+	# check reconfigured 2nd session
+	# pid:time:-e cpu-clock:base/time:base/time/output:base/time/control:base/time/ack:0
+	local line=`perf daemon --config ${config} -x: | head -3 | tail -1`
+	check_line_other "${line}" time "-e cpu-clock" ${base}/session-time \
+			 ${base}/session-time/output ${base}/session-time/control ${base}/session-time/ack "0"
+
+	# TEST 2 - empty config
+
+	local config_empty=${config}.empty
+	cat <<EOF > ${config_empty}
+[daemon]
+base=BASE
+EOF
+
+	# change config
+	sed -i -e "s|BASE|${base}|" ${config_empty}
+	cp ${config_empty} ${config}
+
+	# wait for sessions to finish
+	local state="OK"
+	while [ "${state}" != "FAIL" ]; do
+		state=`perf daemon ping --config ${config} --session time | awk '{ print $1 }'`
+	done
+
+	local state="OK"
+	while [ "${state}" != "FAIL" ]; do
+		state=`perf daemon ping --config ${config} --session size | awk '{ print $1 }'`
+	done
+
+	local one=`perf daemon --config ${config} -x: | wc -l`
+
+	if [ ${one} -ne "1" ]; then
+		echo "FAILED: wrong list output"
+		error=1
+	fi
+
+	# TEST 3 - config again
+
+	cp ${config_new} ${config}
+
+	# wait for size to start
+	local state="FAIL"
+	while [ "${state}" != "OK" ]; do
+		state=`perf daemon ping --config ${config} --session size | awk '{ print $1 }'`
+	done
+
+	# wait for time to start
+	local state="FAIL"
+	while [ "${state}" != "OK" ]; do
+		state=`perf daemon ping --config ${config} --session time | awk '{ print $1 }'`
+	done
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	rm -rf ${base}
+	rm -f ${config}
+	rm -f ${config_new}
+	rm -f ${config_empty}
+}
+
+test_stop()
+{
+	echo "test daemon stop"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	# prepare config
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+
+[session-time]
+run = -e task-clock
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} size
+
+	local pid_size=`perf daemon --config ${config} -x: | head -2 | tail -1 | awk 'BEGIN { FS = ":" } ; { print $1 }'`
+	local pid_time=`perf daemon --config ${config} -x: | head -3 | tail -1 | awk 'BEGIN { FS = ":" } ; { print $1 }'`
+
+	# check that sessions are running
+	if [ ! -d "/proc/${pid_size}" ]; then
+		echo "FAILED: session size not up"
+	fi
+
+	if [ ! -d "/proc/${pid_time}" ]; then
+		echo "FAILED: session time not up"
+	fi
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	# check that sessions are gone
+	if [ -d "/proc/${pid_size}" ]; then
+		echo "FAILED: session size still up"
+	fi
+
+	if [ -d "/proc/${pid_time}" ]; then
+		echo "FAILED: session time still up"
+	fi
+
+	rm -rf ${base}
+	rm -f ${config}
+}
+
+test_signal()
+{
+	echo "test daemon signal"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	# prepare config
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-test]
+run = -e cpu-clock --switch-output
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} test
+
+	# send 2 signals
+	perf daemon signal --config ${config} --session test
+	perf daemon signal --config ${config}
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	# count is 2 perf.data for signals and 1 for perf record finished
+	count=`ls ${base}/session-test/ | grep perf.data | wc -l`
+	if [ ${count} -ne 3 ]; then
+		error=1
+		echo "FAILED: perf data not generated"
+	fi
+
+	rm -rf ${base}
+	rm -f ${config}
+}
+
+test_ping()
+{
+	echo "test daemon ping"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	# prepare config
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+
+[session-time]
+run = -e task-clock
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} size
+
+	size=`perf daemon ping --config ${config} --session size | awk '{ print $1 }'`
+	type=`perf daemon ping --config ${config} --session time | awk '{ print $1 }'`
+
+	if [ "${size}" != "OK" -o "${type}" != "OK" ]; then
+		error=1
+		echo "FAILED: daemon ping failed"
+	fi
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	rm -rf ${base}
+	rm -f ${config}
+}
+
+test_lock()
+{
+	echo "test daemon lock"
+
+	local config=$(mktemp /tmp/perf.daemon.config.XXX)
+	local base=$(mktemp -d /tmp/perf.daemon.base.XXX)
+
+	# prepare config
+	cat <<EOF > ${config}
+[daemon]
+base=BASE
+
+[session-size]
+run = -e cpu-clock
+EOF
+
+	sed -i -e "s|BASE|${base}|" ${config}
+
+	# start daemon
+	daemon_start ${config} size
+
+	# start second daemon over the same config/base
+	failed=`perf daemon start --config ${config} 2>&1 | awk '{ print $1 }'`
+
+	# check that we failed properly
+	if [ "${failed}" != "failed:" ]; then
+		error=1
+		echo "FAILED: daemon lock failed"
+	fi
+
+	# stop daemon
+	daemon_exit ${base} ${config}
+
+	rm -rf ${base}
+	rm -f ${config}
+}
+
+error=0
+
+test_list
+test_reconfig
+test_stop
+test_signal
+test_ping
+test_lock
+
+exit ${error}
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 18fde2f..c9eef0b 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -11,6 +11,7 @@
 
 perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
 file=$(mktemp /tmp/temporary_file.XXXXX)
+glb_err=0
 
 skip_if_no_cs_etm_event() {
 	perf list | grep -q 'cs_etm//' && return 0
@@ -33,7 +34,7 @@
 	echo "Recording trace (only user mode) with path: CPU$2 => $1"
 	rm -f $file
 	perf record -o ${perfdata} -e cs_etm/@$1/u --per-thread \
-		-- taskset -c $2 touch $file
+		-- taskset -c $2 touch $file > /dev/null 2>&1
 }
 
 perf_script_branch_samples() {
@@ -43,8 +44,8 @@
 	#   touch  6512          1         branches:u:      ffffb220824c strcmp+0xc (/lib/aarch64-linux-gnu/ld-2.27.so)
 	#   touch  6512          1         branches:u:      ffffb22082e0 strcmp+0xa0 (/lib/aarch64-linux-gnu/ld-2.27.so)
 	#   touch  6512          1         branches:u:      ffffb2208320 strcmp+0xe0 (/lib/aarch64-linux-gnu/ld-2.27.so)
-	perf script -F,-time -i ${perfdata} | \
-		egrep " +$1 +[0-9]+ .* +branches:(.*:)? +"
+	perf script -F,-time -i ${perfdata} 2>&1 | \
+		egrep " +$1 +[0-9]+ .* +branches:(.*:)? +" > /dev/null 2>&1
 }
 
 perf_report_branch_samples() {
@@ -54,8 +55,8 @@
 	#   73.04%    73.04%  touch    libc-2.27.so      [.] _dl_addr
 	#    7.71%     7.71%  touch    libc-2.27.so      [.] getenv
 	#    2.59%     2.59%  touch    ld-2.27.so        [.] strcmp
-	perf report --stdio -i ${perfdata} | \
-		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 "
+	perf report --stdio -i ${perfdata} 2>&1 | \
+		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
 }
 
 perf_report_instruction_samples() {
@@ -65,8 +66,17 @@
 	#   68.12%  touch    libc-2.27.so   [.] _dl_addr
 	#    5.80%  touch    libc-2.27.so   [.] getenv
 	#    4.35%  touch    ld-2.27.so     [.] _dl_fixup
-	perf report --itrace=i1000i --stdio -i ${perfdata} | \
-		egrep " +[0-9]+\.[0-9]+% +$1"
+	perf report --itrace=i1000i --stdio -i ${perfdata} 2>&1 | \
+		egrep " +[0-9]+\.[0-9]+% +$1" > /dev/null 2>&1
+}
+
+arm_cs_report() {
+	if [ $2 != 0 ]; then
+		echo "$1: FAIL"
+		glb_err=$2
+	else
+		echo "$1: PASS"
+	fi
 }
 
 is_device_sink() {
@@ -113,9 +123,7 @@
 			perf_report_instruction_samples touch
 
 			err=$?
-
-			# Exit when find failure
-			[ $err != 0 ] && exit $err
+			arm_cs_report "CoreSight path testing (CPU$2 -> $device_name)" $err
 		fi
 
 		arm_cs_iterate_devices $dev $2
@@ -129,9 +137,6 @@
 		# Find the ETM device belonging to which CPU
 		cpu=`cat $dev/cpu`
 
-		echo $dev
-		echo $cpu
-
 		# Use depth-first search (DFS) to iterate outputs
 		arm_cs_iterate_devices $dev $cpu
 	done
@@ -139,22 +144,20 @@
 
 arm_cs_etm_system_wide_test() {
 	echo "Recording trace with system wide mode"
-	perf record -o ${perfdata} -e cs_etm// -a -- ls
+	perf record -o ${perfdata} -e cs_etm// -a -- ls > /dev/null 2>&1
 
 	perf_script_branch_samples perf &&
 	perf_report_branch_samples perf &&
 	perf_report_instruction_samples perf
 
 	err=$?
-
-	# Exit when find failure
-	[ $err != 0 ] && exit $err
+	arm_cs_report "CoreSight system wide testing" $err
 }
 
 arm_cs_etm_snapshot_test() {
 	echo "Recording trace with snapshot mode"
 	perf record -o ${perfdata} -e cs_etm// -S \
-		-- dd if=/dev/zero of=/dev/null &
+		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
 	PERFPID=$!
 
 	# Wait for perf program
@@ -172,12 +175,10 @@
 	perf_report_instruction_samples dd
 
 	err=$?
-
-	# Exit when find failure
-	[ $err != 0 ] && exit $err
+	arm_cs_report "CoreSight snapshot testing" $err
 }
 
 arm_cs_etm_traverse_path_test
 arm_cs_etm_system_wide_test
 arm_cs_etm_snapshot_test
-exit 0
+exit $glb_err
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 8e24a61..b85f005 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -119,6 +119,7 @@
 int test__jit_write_elf(struct test *test, int subtest);
 int test__api_io(struct test *test, int subtest);
 int test__demangle_java(struct test *test, int subtest);
+int test__demangle_ocaml(struct test *test, int subtest);
 int test__pfm(struct test *test, int subtest);
 const char *test__pfm_subtest_get_desc(int subtest);
 int test__pfm_subtest_get_nr(void);
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index bd77825..35b82ca 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -759,7 +759,7 @@
 			continue;
 		case 'k':
 			notes->options->show_linenr = !notes->options->show_linenr;
-			break;
+			continue;
 		case 'H':
 			nd = browser->curr_hot;
 			break;
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e2563d0..e3e12f9 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -135,6 +135,7 @@
 
 perf-$(CONFIG_LIBBPF) += bpf-loader.o
 perf-$(CONFIG_LIBBPF) += bpf_map.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
 perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
 perf-$(CONFIG_LIBELF) += symbol-elf.o
 perf-$(CONFIG_LIBELF) += probe-file.o
@@ -172,6 +173,7 @@
 
 perf-$(CONFIG_LIBCAP) += cap.o
 
+perf-y += demangle-ocaml.o
 perf-y += demangle-java.o
 perf-y += demangle-rust.o
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ce8c07b..e60841b8 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -321,12 +321,18 @@
 /*
  * Prevents from matching commas in the comment section, e.g.:
  * ffff200008446e70:       b.cs    ffff2000084470f4 <generic_exec_single+0x314>  // b.hs, b.nlast
+ *
+ * and skip comma as part of function arguments, e.g.:
+ * 1d8b4ac <linemap_lookup(line_maps const*, unsigned int)+0xcc>
  */
 static inline const char *validate_comma(const char *c, struct ins_operands *ops)
 {
 	if (ops->raw_comment && c > ops->raw_comment)
 		return NULL;
 
+	if (ops->raw_func_start && c > ops->raw_func_start)
+		return NULL;
+
 	return c;
 }
 
@@ -341,6 +347,8 @@
 	u64 start, end;
 
 	ops->raw_comment = strchr(ops->raw, arch->objdump.comment_char);
+	ops->raw_func_start = strchr(ops->raw, '<');
+
 	c = validate_comma(c, ops);
 
 	/*
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 0a0cd4f..096cdaf 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -32,6 +32,7 @@
 struct ins_operands {
 	char	*raw;
 	char	*raw_comment;
+	char	*raw_func_start;
 	struct {
 		char	*raw;
 		char	*name;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 90d575c..32fe4183 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -172,12 +172,22 @@
 				decoder->record.from_ip = ip;
 			else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH)
 				decoder->record.to_ip = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT)
+				decoder->record.virt_addr = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS)
+				decoder->record.phys_addr = ip;
 			break;
 		case ARM_SPE_COUNTER:
 			break;
 		case ARM_SPE_CONTEXT:
 			break;
 		case ARM_SPE_OP_TYPE:
+			if (idx == SPE_OP_PKT_HDR_CLASS_LD_ST_ATOMIC) {
+				if (payload & 0x1)
+					decoder->record.op = ARM_SPE_ST;
+				else
+					decoder->record.op = ARM_SPE_LD;
+			}
 			break;
 		case ARM_SPE_EVENTS:
 			if (payload & BIT(EV_L1D_REFILL))
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 24727b8..59bdb73 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -24,12 +24,20 @@
 	ARM_SPE_REMOTE_ACCESS	= 1 << 7,
 };
 
+enum arm_spe_op_type {
+	ARM_SPE_LD		= 1 << 0,
+	ARM_SPE_ST		= 1 << 1,
+};
+
 struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
+	u32 op;
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
+	u64 virt_addr;
+	u64 phys_addr;
 };
 
 struct arm_spe_insn;
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8901a165..2539d4b 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -53,6 +53,7 @@
 	u8				sample_tlb;
 	u8				sample_branch;
 	u8				sample_remote_access;
+	u8				sample_memory;
 
 	u64				l1d_miss_id;
 	u64				l1d_access_id;
@@ -62,6 +63,7 @@
 	u64				tlb_access_id;
 	u64				branch_miss_id;
 	u64				remote_access_id;
+	u64				memory_id;
 
 	u64				kernel_start;
 
@@ -235,7 +237,6 @@
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
 	sample->pid = speq->pid;
 	sample->tid = speq->tid;
-	sample->addr = record->to_ip;
 	sample->period = 1;
 	sample->cpu = speq->cpu;
 
@@ -259,11 +260,11 @@
 	return ret;
 }
 
-static int
-arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
-				u64 spe_events_id)
+static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
+				     u64 spe_events_id, u64 data_src)
 {
 	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
 	union perf_event *event = speq->event_buf;
 	struct perf_sample sample = { .ip = 0, };
 
@@ -271,27 +272,102 @@
 
 	sample.id = spe_events_id;
 	sample.stream_id = spe_events_id;
+	sample.addr = record->virt_addr;
+	sample.phys_addr = record->phys_addr;
+	sample.data_src = data_src;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
+static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
+					u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+	sample.addr = record->to_ip;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
+			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
+			 ARM_SPE_REMOTE_ACCESS)
+
+static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
+{
+	if (type & SPE_MEM_TYPE)
+		return true;
+
+	return false;
+}
+
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
+{
+	union perf_mem_data_src	data_src = { 0 };
+
+	if (record->op == ARM_SPE_LD)
+		data_src.mem_op = PERF_MEM_OP_LOAD;
+	else
+		data_src.mem_op = PERF_MEM_OP_STORE;
+
+	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
+		data_src.mem_lvl = PERF_MEM_LVL_L3;
+
+		if (record->type & ARM_SPE_LLC_MISS)
+			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+		else
+			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+	} else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
+		data_src.mem_lvl = PERF_MEM_LVL_L1;
+
+		if (record->type & ARM_SPE_L1D_MISS)
+			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+		else
+			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+	}
+
+	if (record->type & ARM_SPE_REMOTE_ACCESS)
+		data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+
+	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
+		data_src.mem_dtlb = PERF_MEM_TLB_WK;
+
+		if (record->type & ARM_SPE_TLB_MISS)
+			data_src.mem_dtlb |= PERF_MEM_TLB_MISS;
+		else
+			data_src.mem_dtlb |= PERF_MEM_TLB_HIT;
+	}
+
+	return data_src.val;
+}
+
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
 	const struct arm_spe_record *record = &speq->decoder->record;
 	struct arm_spe *spe = speq->spe;
+	u64 data_src;
 	int err;
 
+	data_src = arm_spe__synth_data_source(record);
+
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_L1D_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -299,15 +375,15 @@
 
 	if (spe->sample_llc) {
 		if (record->type & ARM_SPE_LLC_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_LLC_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -315,31 +391,36 @@
 
 	if (spe->sample_tlb) {
 		if (record->type & ARM_SPE_TLB_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_TLB_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
 	}
 
 	if (spe->sample_branch && (record->type & ARM_SPE_BRANCH_MISS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->branch_miss_id);
+		err = arm_spe__synth_branch_sample(speq, spe->branch_miss_id);
 		if (err)
 			return err;
 	}
 
 	if (spe->sample_remote_access &&
 	    (record->type & ARM_SPE_REMOTE_ACCESS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->remote_access_id);
+		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id,
+						data_src);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
+		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
 		if (err)
 			return err;
 	}
@@ -803,7 +884,7 @@
 	attr.type = PERF_TYPE_HARDWARE;
 	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
-		PERF_SAMPLE_PERIOD;
+			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
 	if (spe->timeless_decoding)
 		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
 	else
@@ -907,6 +988,16 @@
 		id += 1;
 	}
 
+	if (spe->synth_opts.mem) {
+		spe->sample_memory = true;
+
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->memory_id = id;
+		arm_spe_set_event_name(evlist, id, "memory");
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a608784..953f4af 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -788,6 +788,21 @@
 	return auxtrace_validate_aux_sample_size(evlist, opts);
 }
 
+void auxtrace_regroup_aux_output(struct evlist *evlist)
+{
+	struct evsel *evsel, *aux_evsel = NULL;
+	struct evsel_config_term *term;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_aux_event(evsel))
+			aux_evsel = evsel;
+		term = evsel__get_config_term(evsel, AUX_OUTPUT);
+		/* If possible, group with the AUX event */
+		if (term && aux_evsel)
+			evlist__regroup(evlist, aux_evsel, evsel);
+	}
+}
+
 struct auxtrace_record *__weak
 auxtrace_record__init(struct evlist *evlist __maybe_unused, int *err)
 {
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7e5c9e1..a4fbb33 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -559,6 +559,7 @@
 int auxtrace_parse_sample_options(struct auxtrace_record *itr,
 				  struct evlist *evlist,
 				  struct record_opts *opts, const char *str);
+void auxtrace_regroup_aux_output(struct evlist *evlist);
 int auxtrace_record__options(struct auxtrace_record *itr,
 			     struct evlist *evlist,
 			     struct record_opts *opts);
@@ -741,6 +742,11 @@
 }
 
 static inline
+void auxtrace_regroup_aux_output(struct evlist *evlist __maybe_unused)
+{
+}
+
+static inline
 int auxtrace__process_event(struct perf_session *session __maybe_unused,
 			    union perf_event *event __maybe_unused,
 			    struct perf_sample *sample __maybe_unused,
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
new file mode 100644
index 0000000..04f8912
--- /dev/null
+++ b/tools/perf/util/bpf_counter.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (c) 2019 Facebook */
+
+#include <assert.h>
+#include <limits.h>
+#include <unistd.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <linux/err.h>
+#include <linux/zalloc.h>
+#include <bpf/bpf.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_counter.h"
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "target.h"
+
+#include "bpf_skel/bpf_prog_profiler.skel.h"
+
+static inline void *u64_to_ptr(__u64 ptr)
+{
+	return (void *)(unsigned long)ptr;
+}
+
+static void set_max_rlimit(void)
+{
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
+static struct bpf_counter *bpf_counter_alloc(void)
+{
+	struct bpf_counter *counter;
+
+	counter = zalloc(sizeof(*counter));
+	if (counter)
+		INIT_LIST_HEAD(&counter->list);
+	return counter;
+}
+
+static int bpf_program_profiler__destroy(struct evsel *evsel)
+{
+	struct bpf_counter *counter, *tmp;
+
+	list_for_each_entry_safe(counter, tmp,
+				 &evsel->bpf_counter_list, list) {
+		list_del_init(&counter->list);
+		bpf_prog_profiler_bpf__destroy(counter->skel);
+		free(counter);
+	}
+	assert(list_empty(&evsel->bpf_counter_list));
+
+	return 0;
+}
+
+static char *bpf_target_prog_name(int tgt_fd)
+{
+	struct bpf_prog_info_linear *info_linear;
+	struct bpf_func_info *func_info;
+	const struct btf_type *t;
+	char *name = NULL;
+	struct btf *btf;
+
+	info_linear = bpf_program__get_prog_info_linear(
+		tgt_fd, 1UL << BPF_PROG_INFO_FUNC_INFO);
+	if (IS_ERR_OR_NULL(info_linear)) {
+		pr_debug("failed to get info_linear for prog FD %d\n", tgt_fd);
+		return NULL;
+	}
+
+	if (info_linear->info.btf_id == 0 ||
+	    btf__get_from_id(info_linear->info.btf_id, &btf)) {
+		pr_debug("prog FD %d doesn't have valid btf\n", tgt_fd);
+		goto out;
+	}
+
+	func_info = u64_to_ptr(info_linear->info.func_info);
+	t = btf__type_by_id(btf, func_info[0].type_id);
+	if (!t) {
+		pr_debug("btf %d doesn't have type %d\n",
+			 info_linear->info.btf_id, func_info[0].type_id);
+		goto out;
+	}
+	name = strdup(btf__name_by_offset(btf, t->name_off));
+out:
+	free(info_linear);
+	return name;
+}
+
+static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
+{
+	struct bpf_prog_profiler_bpf *skel;
+	struct bpf_counter *counter;
+	struct bpf_program *prog;
+	char *prog_name;
+	int prog_fd;
+	int err;
+
+	prog_fd = bpf_prog_get_fd_by_id(prog_id);
+	if (prog_fd < 0) {
+		pr_err("Failed to open fd for bpf prog %u\n", prog_id);
+		return -1;
+	}
+	counter = bpf_counter_alloc();
+	if (!counter) {
+		close(prog_fd);
+		return -1;
+	}
+
+	skel = bpf_prog_profiler_bpf__open();
+	if (!skel) {
+		pr_err("Failed to open bpf skeleton\n");
+		goto err_out;
+	}
+
+	skel->rodata->num_cpu = evsel__nr_cpus(evsel);
+
+	bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
+	bpf_map__resize(skel->maps.fentry_readings, 1);
+	bpf_map__resize(skel->maps.accum_readings, 1);
+
+	prog_name = bpf_target_prog_name(prog_fd);
+	if (!prog_name) {
+		pr_err("Failed to get program name for bpf prog %u. Does it have BTF?\n", prog_id);
+		goto err_out;
+	}
+
+	bpf_object__for_each_program(prog, skel->obj) {
+		err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
+		if (err) {
+			pr_err("bpf_program__set_attach_target failed.\n"
+			       "Does bpf prog %u have BTF?\n", prog_id);
+			goto err_out;
+		}
+	}
+	set_max_rlimit();
+	err = bpf_prog_profiler_bpf__load(skel);
+	if (err) {
+		pr_err("bpf_prog_profiler_bpf__load failed\n");
+		goto err_out;
+	}
+
+	assert(skel != NULL);
+	counter->skel = skel;
+	list_add(&counter->list, &evsel->bpf_counter_list);
+	close(prog_fd);
+	return 0;
+err_out:
+	bpf_prog_profiler_bpf__destroy(skel);
+	free(counter);
+	close(prog_fd);
+	return -1;
+}
+
+static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
+{
+	char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
+	u32 prog_id;
+	int ret;
+
+	bpf_str_ = bpf_str = strdup(target->bpf_str);
+	if (!bpf_str)
+		return -1;
+
+	while ((tok = strtok_r(bpf_str, ",", &saveptr)) != NULL) {
+		prog_id = strtoul(tok, &p, 10);
+		if (prog_id == 0 || prog_id == UINT_MAX ||
+		    (*p != '\0' && *p != ',')) {
+			pr_err("Failed to parse bpf prog ids %s\n",
+			       target->bpf_str);
+			return -1;
+		}
+
+		ret = bpf_program_profiler_load_one(evsel, prog_id);
+		if (ret) {
+			bpf_program_profiler__destroy(evsel);
+			free(bpf_str_);
+			return -1;
+		}
+		bpf_str = NULL;
+	}
+	free(bpf_str_);
+	return 0;
+}
+
+static int bpf_program_profiler__enable(struct evsel *evsel)
+{
+	struct bpf_counter *counter;
+	int ret;
+
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		assert(counter->skel != NULL);
+		ret = bpf_prog_profiler_bpf__attach(counter->skel);
+		if (ret) {
+			bpf_program_profiler__destroy(evsel);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int bpf_program_profiler__read(struct evsel *evsel)
+{
+	// perf_cpu_map uses /sys/devices/system/cpu/online
+	int num_cpu = evsel__nr_cpus(evsel);
+	// BPF_MAP_TYPE_PERCPU_ARRAY uses /sys/devices/system/cpu/possible
+	// Sometimes possible > online, like on a Ryzen 3900X that has 24
+	// threads but its possible showed 0-31 -acme
+	int num_cpu_bpf = libbpf_num_possible_cpus();
+	struct bpf_perf_event_value values[num_cpu_bpf];
+	struct bpf_counter *counter;
+	int reading_map_fd;
+	__u32 key = 0;
+	int err, cpu;
+
+	if (list_empty(&evsel->bpf_counter_list))
+		return -EAGAIN;
+
+	for (cpu = 0; cpu < num_cpu; cpu++) {
+		perf_counts(evsel->counts, cpu, 0)->val = 0;
+		perf_counts(evsel->counts, cpu, 0)->ena = 0;
+		perf_counts(evsel->counts, cpu, 0)->run = 0;
+	}
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		struct bpf_prog_profiler_bpf *skel = counter->skel;
+
+		assert(skel != NULL);
+		reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
+
+		err = bpf_map_lookup_elem(reading_map_fd, &key, values);
+		if (err) {
+			pr_err("failed to read value\n");
+			return err;
+		}
+
+		for (cpu = 0; cpu < num_cpu; cpu++) {
+			perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
+			perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
+			perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
+		}
+	}
+	return 0;
+}
+
+static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu,
+					    int fd)
+{
+	struct bpf_prog_profiler_bpf *skel;
+	struct bpf_counter *counter;
+	int ret;
+
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		skel = counter->skel;
+		assert(skel != NULL);
+
+		ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events),
+					  &cpu, &fd, BPF_ANY);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+struct bpf_counter_ops bpf_program_profiler_ops = {
+	.load       = bpf_program_profiler__load,
+	.enable	    = bpf_program_profiler__enable,
+	.read       = bpf_program_profiler__read,
+	.destroy    = bpf_program_profiler__destroy,
+	.install_pe = bpf_program_profiler__install_pe,
+};
+
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return 0;
+	return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd);
+}
+
+int bpf_counter__load(struct evsel *evsel, struct target *target)
+{
+	if (target__has_bpf(target))
+		evsel->bpf_counter_ops = &bpf_program_profiler_ops;
+
+	if (evsel->bpf_counter_ops)
+		return evsel->bpf_counter_ops->load(evsel, target);
+	return 0;
+}
+
+int bpf_counter__enable(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return 0;
+	return evsel->bpf_counter_ops->enable(evsel);
+}
+
+int bpf_counter__read(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return -EAGAIN;
+	return evsel->bpf_counter_ops->read(evsel);
+}
+
+void bpf_counter__destroy(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return;
+	evsel->bpf_counter_ops->destroy(evsel);
+	evsel->bpf_counter_ops = NULL;
+}
diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h
new file mode 100644
index 0000000..2eca210
--- /dev/null
+++ b/tools/perf/util/bpf_counter.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_COUNTER_H
+#define __PERF_BPF_COUNTER_H 1
+
+#include <linux/list.h>
+
+struct evsel;
+struct target;
+struct bpf_counter;
+
+typedef int (*bpf_counter_evsel_op)(struct evsel *evsel);
+typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel,
+					   struct target *target);
+typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel,
+					       int cpu,
+					       int fd);
+
+struct bpf_counter_ops {
+	bpf_counter_evsel_target_op load;
+	bpf_counter_evsel_op enable;
+	bpf_counter_evsel_op read;
+	bpf_counter_evsel_op destroy;
+	bpf_counter_evsel_install_pe_op install_pe;
+};
+
+struct bpf_counter {
+	void *skel;
+	struct list_head list;
+};
+
+#ifdef HAVE_BPF_SKEL
+
+int bpf_counter__load(struct evsel *evsel, struct target *target);
+int bpf_counter__enable(struct evsel *evsel);
+int bpf_counter__read(struct evsel *evsel);
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+#else /* HAVE_BPF_SKEL */
+
+#include<linux/err.h>
+
+static inline int bpf_counter__load(struct evsel *evsel __maybe_unused,
+				    struct target *target __maybe_unused)
+{
+	return 0;
+}
+
+static inline int bpf_counter__enable(struct evsel *evsel __maybe_unused)
+{
+	return 0;
+}
+
+static inline int bpf_counter__read(struct evsel *evsel __maybe_unused)
+{
+	return -EAGAIN;
+}
+
+static inline void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+static inline int bpf_counter__install_pe(struct evsel *evsel __maybe_unused,
+					  int cpu __maybe_unused,
+					  int fd __maybe_unused)
+{
+	return 0;
+}
+
+#endif /* HAVE_BPF_SKEL */
+
+#endif /* __PERF_BPF_COUNTER_H */
diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
new file mode 100644
index 0000000..5263e9e
--- /dev/null
+++ b/tools/perf/util/bpf_skel/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+.tmp
+*.skel.h
\ No newline at end of file
diff --git a/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
new file mode 100644
index 0000000..c7cec92
--- /dev/null
+++ b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (c) 2020 Facebook
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+/* map of perf event fds, num_cpu * num_metric entries */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(int));
+} events SEC(".maps");
+
+/* readings at fentry */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_perf_event_value));
+	__uint(max_entries, 1);
+} fentry_readings SEC(".maps");
+
+/* accumulated readings */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_perf_event_value));
+	__uint(max_entries, 1);
+} accum_readings SEC(".maps");
+
+const volatile __u32 num_cpu = 1;
+
+SEC("fentry/XXX")
+int BPF_PROG(fentry_XXX)
+{
+	__u32 key = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value *ptr;
+	__u32 zero = 0;
+	long err;
+
+	/* look up before reading, to reduce error */
+	ptr = bpf_map_lookup_elem(&fentry_readings, &zero);
+	if (!ptr)
+		return 0;
+
+	err = bpf_perf_event_read_value(&events, key, ptr, sizeof(*ptr));
+	if (err)
+		return 0;
+
+	return 0;
+}
+
+static inline void
+fexit_update_maps(struct bpf_perf_event_value *after)
+{
+	struct bpf_perf_event_value *before, diff, *accum;
+	__u32 zero = 0;
+
+	before = bpf_map_lookup_elem(&fentry_readings, &zero);
+	/* only account samples with a valid fentry_reading */
+	if (before && before->counter) {
+		struct bpf_perf_event_value *accum;
+
+		diff.counter = after->counter - before->counter;
+		diff.enabled = after->enabled - before->enabled;
+		diff.running = after->running - before->running;
+
+		accum = bpf_map_lookup_elem(&accum_readings, &zero);
+		if (accum) {
+			accum->counter += diff.counter;
+			accum->enabled += diff.enabled;
+			accum->running += diff.running;
+		}
+	}
+}
+
+SEC("fexit/XXX")
+int BPF_PROG(fexit_XXX)
+{
+	struct bpf_perf_event_value reading;
+	__u32 cpu = bpf_get_smp_processor_id();
+	__u32 one = 1, zero = 0;
+	int err;
+
+	/* read all events before updating the maps, to reduce error */
+	err = bpf_perf_event_read_value(&events, cpu, &reading, sizeof(reading));
+	if (err)
+		return 0;
+
+	fexit_update_maps(&reading);
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 02df36b..e32e8f2 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -448,7 +448,8 @@
 	int i = 0;
 	while (isxdigit(d->d_name[i]) && i < SBUILD_ID_SIZE - 3)
 		i++;
-	return (i == SBUILD_ID_SIZE - 3) && (d->d_name[i] == '\0');
+	return (i >= SBUILD_ID_MIN_SIZE - 3) && (i <= SBUILD_ID_SIZE - 3) &&
+		(d->d_name[i] == '\0');
 }
 
 struct strlist *build_id_cache__list_all(bool validonly)
@@ -490,7 +491,7 @@
 		}
 		strlist__for_each_entry(nd2, linklist) {
 			if (snprintf(sbuild_id, SBUILD_ID_SIZE, "%s%s",
-				     nd->s, nd2->s) != SBUILD_ID_SIZE - 1)
+				     nd->s, nd2->s) > SBUILD_ID_SIZE - 1)
 				goto err_out;
 			if (validonly && !build_id_cache__valid_id(sbuild_id))
 				continue;
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 02613f4..c196171 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -2,8 +2,10 @@
 #ifndef PERF_BUILD_ID_H_
 #define PERF_BUILD_ID_H_ 1
 
-#define BUILD_ID_SIZE	20
+#define BUILD_ID_SIZE	20 /* SHA-1 length in bytes */
+#define BUILD_ID_MIN_SIZE	16 /* MD5/UUID/GUID length in bytes */
 #define SBUILD_ID_SIZE	(BUILD_ID_SIZE * 2 + 1)
+#define SBUILD_ID_MIN_SIZE	(BUILD_ID_MIN_SIZE * 2 + 1)
 
 #include "machine.h"
 #include "tool.h"
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 5dff7e4..f24ab45 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -161,7 +161,7 @@
 
 /* helper function for ftw() in match_cgroups and list_cgroups */
 static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unused,
-			   int typeflag)
+			   int typeflag, struct FTW *ftwbuf __maybe_unused)
 {
 	struct cgroup_name *cn;
 
@@ -209,12 +209,12 @@
 			if (!s)
 				return -1;
 			/* pretend if it's added by ftw() */
-			ret = add_cgroup_name(s, NULL, FTW_D);
+			ret = add_cgroup_name(s, NULL, FTW_D, NULL);
 			free(s);
 			if (ret)
 				return -1;
 		} else {
-			if (add_cgroup_name("", NULL, FTW_D) < 0)
+			if (add_cgroup_name("", NULL, FTW_D, NULL) < 0)
 				return -1;
 		}
 
@@ -247,7 +247,7 @@
 	prefix_len = strlen(mnt);
 
 	/* collect all cgroups in the cgroup_list */
-	if (ftw(mnt, add_cgroup_name, 20) < 0)
+	if (nftw(mnt, add_cgroup_name, 20, 0) < 0)
 		return -1;
 
 	for (;;) {
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 6969f82..6984c77 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -489,7 +489,7 @@
 	return 0;
 }
 
-int perf_config_from_file(config_fn_t fn, const char *filename, void *data)
+static int perf_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
 	int ret;
 	FILE *f = fopen(filename, "r");
@@ -521,16 +521,66 @@
 	return v ? perf_config_bool(k, v) : def;
 }
 
-static int perf_config_system(void)
+int perf_config_system(void)
 {
 	return !perf_env_bool("PERF_CONFIG_NOSYSTEM", 0);
 }
 
-static int perf_config_global(void)
+int perf_config_global(void)
 {
 	return !perf_env_bool("PERF_CONFIG_NOGLOBAL", 0);
 }
 
+static char *home_perfconfig(void)
+{
+	const char *home = NULL;
+	char *config;
+	struct stat st;
+
+	home = getenv("HOME");
+
+	/*
+	 * Skip reading user config if:
+	 *   - there is no place to read it from (HOME)
+	 *   - we are asked not to (PERF_CONFIG_NOGLOBAL=1)
+	 */
+	if (!home || !*home || !perf_config_global())
+		return NULL;
+
+	config = strdup(mkpath("%s/.perfconfig", home));
+	if (config == NULL) {
+		pr_warning("Not enough memory to process %s/.perfconfig, ignoring it.", home);
+		return NULL;
+	}
+
+	if (stat(config, &st) < 0)
+		goto out_free;
+
+	if (st.st_uid && (st.st_uid != geteuid())) {
+		pr_warning("File %s not owned by current user or root, ignoring it.", config);
+		goto out_free;
+	}
+
+	if (st.st_size)
+		return config;
+
+out_free:
+	free(config);
+	return NULL;
+}
+
+const char *perf_home_perfconfig(void)
+{
+	static const char *config;
+	static bool failed;
+
+	config = failed ? NULL : home_perfconfig();
+	if (!config)
+		failed = true;
+
+	return config;
+}
+
 static struct perf_config_section *find_section(struct list_head *sections,
 						const char *section_name)
 {
@@ -676,9 +726,6 @@
 static int perf_config_set__init(struct perf_config_set *set)
 {
 	int ret = -1;
-	const char *home = NULL;
-	char *user_config;
-	struct stat st;
 
 	/* Setting $PERF_CONFIG makes perf read _only_ the given config file. */
 	if (config_exclusive_filename)
@@ -687,41 +734,11 @@
 		if (perf_config_from_file(collect_config, perf_etc_perfconfig(), set) < 0)
 			goto out;
 	}
-
-	home = getenv("HOME");
-
-	/*
-	 * Skip reading user config if:
-	 *   - there is no place to read it from (HOME)
-	 *   - we are asked not to (PERF_CONFIG_NOGLOBAL=1)
-	 */
-	if (!home || !*home || !perf_config_global())
-		return 0;
-
-	user_config = strdup(mkpath("%s/.perfconfig", home));
-	if (user_config == NULL) {
-		pr_warning("Not enough memory to process %s/.perfconfig, ignoring it.", home);
-		goto out;
+	if (perf_config_global() && perf_home_perfconfig()) {
+		if (perf_config_from_file(collect_config, perf_home_perfconfig(), set) < 0)
+			goto out;
 	}
 
-	if (stat(user_config, &st) < 0) {
-		if (errno == ENOENT)
-			ret = 0;
-		goto out_free;
-	}
-
-	ret = 0;
-
-	if (st.st_uid && (st.st_uid != geteuid())) {
-		pr_warning("File %s not owned by current user or root, ignoring it.", user_config);
-		goto out_free;
-	}
-
-	if (st.st_size)
-		ret = perf_config_from_file(collect_config, user_config, set);
-
-out_free:
-	free(user_config);
 out:
 	return ret;
 }
@@ -738,6 +755,18 @@
 	return set;
 }
 
+struct perf_config_set *perf_config_set__load_file(const char *file)
+{
+	struct perf_config_set *set = zalloc(sizeof(*set));
+
+	if (set) {
+		INIT_LIST_HEAD(&set->sections);
+		perf_config_from_file(collect_config, file, set);
+	}
+
+	return set;
+}
+
 static int perf_config__init(void)
 {
 	if (config_set == NULL)
@@ -746,17 +775,15 @@
 	return config_set == NULL;
 }
 
-int perf_config(config_fn_t fn, void *data)
+int perf_config_set(struct perf_config_set *set,
+		    config_fn_t fn, void *data)
 {
 	int ret = 0;
 	char key[BUFSIZ];
 	struct perf_config_section *section;
 	struct perf_config_item *item;
 
-	if (config_set == NULL && perf_config__init())
-		return -1;
-
-	perf_config_set__for_each_entry(config_set, section, item) {
+	perf_config_set__for_each_entry(set, section, item) {
 		char *value = item->value;
 
 		if (value) {
@@ -778,6 +805,14 @@
 	return ret;
 }
 
+int perf_config(config_fn_t fn, void *data)
+{
+	if (config_set == NULL && perf_config__init())
+		return -1;
+
+	return perf_config_set(config_set, fn, data);
+}
+
 void perf_config__exit(void)
 {
 	perf_config_set__delete(config_set);
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 8c881e3..2fd77aa 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -27,17 +27,22 @@
 
 typedef int (*config_fn_t)(const char *, const char *, void *);
 
-int perf_config_from_file(config_fn_t fn, const char *filename, void *data);
 int perf_default_config(const char *, const char *, void *);
 int perf_config(config_fn_t fn, void *);
+int perf_config_set(struct perf_config_set *set,
+		    config_fn_t fn, void *data);
 int perf_config_int(int *dest, const char *, const char *);
 int perf_config_u8(u8 *dest, const char *name, const char *value);
 int perf_config_u64(u64 *dest, const char *, const char *);
 int perf_config_bool(const char *, const char *);
 int config_error_nonbool(const char *);
 const char *perf_etc_perfconfig(void);
+const char *perf_home_perfconfig(void);
+int perf_config_system(void);
+int perf_config_global(void);
 
 struct perf_config_set *perf_config_set__new(void);
+struct perf_config_set *perf_config_set__load_file(const char *file);
 void perf_config_set__delete(struct perf_config_set *set);
 int perf_config_set__collect(struct perf_config_set *set, const char *file_name,
 			     const char *var, const char *value);
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd007cc..3f4bc40 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -419,19 +419,10 @@
 	packet->last_instr_subtype = elem->last_i_subtype;
 	packet->last_instr_cond = elem->last_instr_cond;
 
-	switch (elem->last_i_type) {
-	case OCSD_INSTR_BR:
-	case OCSD_INSTR_BR_INDIRECT:
+	if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
 		packet->last_instr_taken_branch = elem->last_instr_exec;
-		break;
-	case OCSD_INSTR_ISB:
-	case OCSD_INSTR_DSB_DMB:
-	case OCSD_INSTR_WFI_WFE:
-	case OCSD_INSTR_OTHER:
-	default:
+	else
 		packet->last_instr_taken_branch = false;
-		break;
-	}
 
 	packet->last_instr_size = elem->last_instr_sz;
 
@@ -572,6 +563,8 @@
 	case OCSD_GEN_TRC_ELEM_EVENT:
 	case OCSD_GEN_TRC_ELEM_SWTRACE:
 	case OCSD_GEN_TRC_ELEM_CUSTOM:
+	case OCSD_GEN_TRC_ELEM_SYNC_MARKER:
+	case OCSD_GEN_TRC_ELEM_MEMTRANS:
 	default:
 		break;
 	}
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 27c5fef..8b67bd9 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -948,7 +948,7 @@
 		goto out;
 	/*
 	 * Add '_' prefix to potential keywork.  According to
-	 * Mathieu Desnoyers (https://lkml.org/lkml/2015/1/23/652),
+	 * Mathieu Desnoyers (https://lore.kernel.org/lkml/1074266107.40857.1422045946295.JavaMail.zimbra@efficios.com),
 	 * futher CTF spec updating may require us to use '$'.
 	 */
 	if (dup < 0)
diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c
index db74471..5cd1891 100644
--- a/tools/perf/util/db-export.c
+++ b/tools/perf/util/db-export.c
@@ -438,6 +438,8 @@
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT, "transaction abort"},
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_BEGIN, "trace begin"},
 	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_END, "trace end"},
+	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMENTRY, "vm entry"},
+	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMEXIT, "vm exit"},
 	{0, NULL}
 };
 
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 50fd6a4..2c06abf 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -10,6 +10,7 @@
 #include <api/debug.h>
 #include <linux/kernel.h>
 #include <linux/time64.h>
+#include <sys/time.h>
 #ifdef HAVE_BACKTRACE_SUPPORT
 #include <execinfo.h>
 #endif
@@ -31,21 +32,48 @@
 static int redirect_to_stderr;
 int debug_data_convert;
 static FILE *debug_file;
+bool debug_display_time;
 
 void debug_set_file(FILE *file)
 {
 	debug_file = file;
 }
 
+void debug_set_display_time(bool set)
+{
+	debug_display_time = set;
+}
+
+static int fprintf_time(FILE *file)
+{
+	struct timeval tod;
+	struct tm ltime;
+	char date[64];
+
+	if (!debug_display_time)
+		return 0;
+
+	if (gettimeofday(&tod, NULL) != 0)
+		return 0;
+
+	if (localtime_r(&tod.tv_sec, &ltime) == NULL)
+		return 0;
+
+	strftime(date, sizeof(date), "%F %H:%M:%S", &ltime);
+	return fprintf(file, "[%s.%06ld] ", date, (long)tod.tv_usec);
+}
+
 int veprintf(int level, int var, const char *fmt, va_list args)
 {
 	int ret = 0;
 
 	if (var >= level) {
-		if (use_browser >= 1 && !redirect_to_stderr)
+		if (use_browser >= 1 && !redirect_to_stderr) {
 			ui_helpline__vshow(fmt, args);
-		else
-			ret = vfprintf(debug_file, fmt, args);
+		} else {
+			ret = fprintf_time(debug_file);
+			ret += vfprintf(debug_file, fmt, args);
+		}
 	}
 
 	return ret;
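Reviewer note (illustrative, not part of the patch): the debug.c hunk above prefixes each debug line with a wall-clock timestamp when debug_display_time is set. The sketch below shows roughly what that prefix looks like. The helper name format_time_prefix() is hypothetical, and it uses gmtime_r() instead of the patch's localtime_r() so that the output does not depend on the local timezone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not part of the patch: format a timeval as
 * "[YYYY-MM-DD HH:MM:SS.uuuuuu] " into buf.
 *
 * Assumption: gmtime_r() is used here for reproducible output; the
 * patch itself calls localtime_r().
 *
 * Returns the number of characters written, or 0 on failure or
 * truncation.
 */
static int format_time_prefix(char *buf, size_t size, const struct timeval *tv)
{
	struct tm tm;
	char date[64];
	int n;

	if (gmtime_r(&tv->tv_sec, &tm) == NULL)
		return 0;

	/* "%Y-%m-%d" is the expanded form of the "%F" used in the patch. */
	if (strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", &tm) == 0)
		return 0;

	/* tv_usec is signed, so cast to long and print with %ld. */
	n = snprintf(buf, size, "[%s.%06ld] ", date, (long)tv->tv_usec);
	if (n < 0 || (size_t)n >= size)
		return 0;

	return n;
}
```

Writing into a buffer rather than straight to a FILE is a choice made only to keep the sketch easy to check; the patch writes the prefix directly to debug_file.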
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index 43f7122..48f6319 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -64,6 +64,7 @@
 
 int perf_debug_option(const char *str);
 void debug_set_file(FILE *file);
+void debug_set_display_time(bool set);
 void perf_debug_setup(void);
 int perf_quiet_option(void);
 
diff --git a/tools/perf/util/demangle-ocaml.c b/tools/perf/util/demangle-ocaml.c
new file mode 100644
index 0000000..3df14e6
--- /dev/null
+++ b/tools/perf/util/demangle-ocaml.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <string.h>
+#include <stdlib.h>
+#include "util/string2.h"
+
+#include "demangle-ocaml.h"
+
+#include <linux/ctype.h>
+
+static const char *caml_prefix = "caml";
+static const size_t caml_prefix_len = 4;
+
+/* mangled OCaml symbols start with "caml" followed by an upper-case letter */
+static bool
+ocaml_is_mangled(const char *sym)
+{
+	return 0 == strncmp(sym, caml_prefix, caml_prefix_len)
+		&& isupper(sym[caml_prefix_len]);
+}
+
+/*
+ * input:
+ *     sym: a symbol which may have been mangled by the OCaml compiler
+ * return:
+ *     if the input doesn't look like a mangled OCaml symbol, NULL is returned
+ *     otherwise, a newly allocated string containing the demangled symbol is returned
+ */
+char *
+ocaml_demangle_sym(const char *sym)
+{
+	char *result;
+	int j = 0;
+	int i;
+	int len;
+
+	if (!ocaml_is_mangled(sym)) {
+		return NULL;
+	}
+
+	len = strlen(sym);
+
+	/* the demangled symbol is always smaller than the mangled symbol */
+	result = malloc(len + 1);
+	if (!result)
+		return NULL;
+
+	/* skip "caml" prefix */
+	i = caml_prefix_len;
+
+	while (i < len) {
+		if (sym[i] == '_' && sym[i + 1] == '_') {
+			/* "__" -> "." */
+			result[j++] = '.';
+			i += 2;
+		}
+		else if (sym[i] == '$' && isxdigit(sym[i + 1]) && isxdigit(sym[i + 2])) {
+			/* "$xx" is a hex-encoded character */
+			result[j++] = (hex(sym[i + 1]) << 4) | hex(sym[i + 2]);
+			i += 3;
+		}
+		else {
+			result[j++] = sym[i++];
+		}
+	}
+	result[j] = '\0';
+
+	/* scan backwards to remove an "_" followed by decimal digits */
+	if (j != 0 && isdigit(result[j - 1])) {
+		while (--j) {
+			if (!isdigit(result[j])) {
+				break;
+			}
+		}
+		if (result[j] == '_') {
+			result[j] = '\0';
+		}
+	}
+
+	return result;
+}
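Reviewer note (illustrative, not part of the patch): the new demangle-ocaml.c above strips the "caml" prefix, turns "__" into ".", decodes "$xx" hex escapes, and drops a trailing "_<digits>" suffix. The standalone sketch below follows the same steps using only libc so it can be tried outside the perf tree. The names demangle_ocaml_sketch() and hex_val() are hypothetical; hex_val() stands in for the hex() helper that perf gets from its own headers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's hex() helper: value of one hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Illustrative re-implementation of the steps in ocaml_demangle_sym().
 * Returns a malloc()ed string the caller must free(), or NULL if sym
 * does not look like a mangled OCaml symbol (or allocation fails).
 */
static char *demangle_ocaml_sketch(const char *sym)
{
	size_t len, i = 4, j = 0;
	char *out;

	/* Mangled symbols start with "caml" plus an upper-case letter. */
	if (strncmp(sym, "caml", 4) != 0 || !isupper((unsigned char)sym[4]))
		return NULL;

	len = strlen(sym);
	out = malloc(len + 1);	/* output is never longer than input */
	if (!out)
		return NULL;

	while (i < len) {
		if (sym[i] == '_' && sym[i + 1] == '_') {
			out[j++] = '.';		/* "__" -> "." */
			i += 2;
		} else if (sym[i] == '$' &&
			   isxdigit((unsigned char)sym[i + 1]) &&
			   isxdigit((unsigned char)sym[i + 2])) {
			/* "$xx" is a hex-encoded character */
			out[j++] = (char)((hex_val(sym[i + 1]) << 4) |
					  hex_val(sym[i + 2]));
			i += 3;
		} else {
			out[j++] = sym[i++];
		}
	}
	out[j] = '\0';

	/* Scan backwards to drop a trailing "_<digits>" suffix. */
	if (j != 0 && isdigit((unsigned char)out[j - 1])) {
		while (--j) {
			if (!isdigit((unsigned char)out[j]))
				break;
		}
		if (out[j] == '_')
			out[j] = '\0';
	}

	return out;
}
```

For example, under these assumptions "camlStdlib__List__map_1234" demangles to "Stdlib.List.map", while a symbol such as "main" yields NULL.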
diff --git a/tools/perf/util/demangle-ocaml.h b/tools/perf/util/demangle-ocaml.h
new file mode 100644
index 0000000..843cc4f
--- /dev/null
+++ b/tools/perf/util/demangle-ocaml.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_DEMANGLE_OCAML
+#define __PERF_DEMANGLE_OCAML 1
+
+char * ocaml_demangle_sym(const char *str);
+
+#endif /* __PERF_DEMANGLE_OCAML */
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 05616d413..ac70630 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -288,17 +288,36 @@
 
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
 {
-	return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64
-			   " %02x:%02x %"PRI_lu64" %"PRI_lu64"]: %c%c%c%c %s\n",
-		       event->mmap2.pid, event->mmap2.tid, event->mmap2.start,
-		       event->mmap2.len, event->mmap2.pgoff, event->mmap2.maj,
-		       event->mmap2.min, event->mmap2.ino,
-		       event->mmap2.ino_generation,
-		       (event->mmap2.prot & PROT_READ) ? 'r' : '-',
-		       (event->mmap2.prot & PROT_WRITE) ? 'w' : '-',
-		       (event->mmap2.prot & PROT_EXEC) ? 'x' : '-',
-		       (event->mmap2.flags & MAP_SHARED) ? 's' : 'p',
-		       event->mmap2.filename);
+	if (event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID) {
+		char sbuild_id[SBUILD_ID_SIZE];
+		struct build_id bid;
+
+		build_id__init(&bid, event->mmap2.build_id,
+			       event->mmap2.build_id_size);
+		build_id__sprintf(&bid, sbuild_id);
+
+		return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64
+				   " <%s>]: %c%c%c%c %s\n",
+			       event->mmap2.pid, event->mmap2.tid, event->mmap2.start,
+			       event->mmap2.len, event->mmap2.pgoff, sbuild_id,
+			       (event->mmap2.prot & PROT_READ) ? 'r' : '-',
+			       (event->mmap2.prot & PROT_WRITE) ? 'w' : '-',
+			       (event->mmap2.prot & PROT_EXEC) ? 'x' : '-',
+			       (event->mmap2.flags & MAP_SHARED) ? 's' : 'p',
+			       event->mmap2.filename);
+	} else {
+		return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64
+				   " %02x:%02x %"PRI_lu64" %"PRI_lu64"]: %c%c%c%c %s\n",
+			       event->mmap2.pid, event->mmap2.tid, event->mmap2.start,
+			       event->mmap2.len, event->mmap2.pgoff, event->mmap2.maj,
+			       event->mmap2.min, event->mmap2.ino,
+			       event->mmap2.ino_generation,
+			       (event->mmap2.prot & PROT_READ) ? 'r' : '-',
+			       (event->mmap2.prot & PROT_WRITE) ? 'w' : '-',
+			       (event->mmap2.prot & PROT_EXEC) ? 'x' : '-',
+			       (event->mmap2.flags & MAP_SHARED) ? 's' : 'p',
+			       event->mmap2.filename);
+	}
 }
 
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
@@ -626,6 +645,19 @@
 	return al->sym;
 }
 
+static bool check_address_range(struct intlist *addr_list, int addr_range,
+				unsigned long addr)
+{
+	struct int_node *pos;
+
+	intlist__for_each_entry(pos, addr_list) {
+		if (addr >= pos->i && addr < pos->i + addr_range)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Callers need to drop the reference to al->thread, obtained in
  * machine__findnew_thread()
@@ -673,6 +705,8 @@
 		}
 
 		al->sym = map__find_symbol(al->map, al->addr);
+	} else if (symbol_conf.dso_list) {
+		al->filtered |= (1 << HIST_FILTER__DSO);
 	}
 
 	if (symbol_conf.sym_list) {
@@ -690,6 +724,17 @@
 			ret = strlist__has_entry(symbol_conf.sym_list,
 						al_addr_str);
 		}
+		if (!ret && symbol_conf.addr_list && al->map) {
+			unsigned long addr = al->map->unmap_ip(al->map, al->addr);
+
+			ret = intlist__has_entry(symbol_conf.addr_list, addr);
+			if (!ret && symbol_conf.addr_range) {
+				ret = check_address_range(symbol_conf.addr_list,
+							  symbol_conf.addr_range,
+							  addr);
+			}
+		}
+
 		if (!ret)
 			al->filtered |= (1 << HIST_FILTER__SYMBOL);
 	}
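Reviewer note (illustrative, not part of the patch): the event.c hunks above add address-based symbol filtering. An unmapped address passes if it matches an entry in symbol_conf.addr_list exactly or, when symbol_conf.addr_range is set, if it falls within that many bytes after some entry. The sketch below models only the range test, over a plain array instead of perf's intlist. The name addr_in_any_range() is hypothetical, and the comparison is written as a subtraction to avoid overflow in base + range, which is a deviation from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for check_address_range(): returns true if
 * addr lies in [base, base + range) for any base in the array.
 *
 * Assumption: a flat array replaces perf's struct intlist, and
 * "addr - base < range" is used instead of "addr < base + range"
 * so that the sum cannot wrap around.
 */
static bool addr_in_any_range(const unsigned long *bases, size_t nr,
			      unsigned long range, unsigned long addr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (addr >= bases[i] && addr - bases[i] < range)
			return true;
	}

	return false;
}
```

With a range of zero this helper never matches, which is consistent with the patch only calling the range check when addr_range is non-zero and relying on the exact-match lookup otherwise.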
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index ff403ea..f603edb 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -96,6 +96,8 @@
 	PERF_IP_FLAG_TRACE_BEGIN	= 1ULL << 8,
 	PERF_IP_FLAG_TRACE_END		= 1ULL << 9,
 	PERF_IP_FLAG_IN_TX		= 1ULL << 10,
+	PERF_IP_FLAG_VMENTRY		= 1ULL << 11,
+	PERF_IP_FLAG_VMEXIT		= 1ULL << 12,
 };
 
 #define PERF_IP_FLAG_CHARS "bcrosyiABEx"
@@ -110,7 +112,9 @@
 	PERF_IP_FLAG_INTERRUPT		|\
 	PERF_IP_FLAG_TX_ABORT		|\
 	PERF_IP_FLAG_TRACE_BEGIN	|\
-	PERF_IP_FLAG_TRACE_END)
+	PERF_IP_FLAG_TRACE_END		|\
+	PERF_IP_FLAG_VMENTRY		|\
+	PERF_IP_FLAG_VMEXIT)
 
 #define MAX_INSN 16
 
@@ -136,11 +140,13 @@
 	u64 data_src;
 	u64 phys_addr;
 	u64 data_page_size;
+	u64 code_page_size;
 	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
 	u16 misc;
+	u16 ins_lat;
 	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
 	char insn[MAX_INSN];
 	void *raw_data;
@@ -171,6 +177,7 @@
 	PERF_SYNTH_INTEL_EXSTOP,
 	PERF_SYNTH_INTEL_PWRX,
 	PERF_SYNTH_INTEL_CBR,
+	PERF_SYNTH_INTEL_PSB,
 };
 
 /*
@@ -263,6 +270,12 @@
 	u32 reserved3;
 };
 
+struct perf_synth_intel_psb {
+	u32 padding;
+	u32 reserved;
+	u64 offset;
+};
+
 /*
  * raw_data is always 4 bytes from an 8-byte boundary, so subtract 4 to get
  * 8-byte alignment.
@@ -412,4 +425,7 @@
 #define PAGE_SIZE_NAME_LEN	32
 char *get_page_size_name(u64 size, char *str);
 
+void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
+void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
+
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 05363a72..5121b4d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -24,6 +24,7 @@
 #include "bpf-event.h"
 #include "util/string2.h"
 #include "util/perf_api_probe.h"
+#include "util/evsel_fprintf.h"
 #include <signal.h>
 #include <unistd.h>
 #include <sched.h>
@@ -303,6 +304,11 @@
 	return evlist__add_attrs(evlist, attrs, nr_attrs);
 }
 
+__weak int arch_evlist__add_default_attrs(struct evlist *evlist __maybe_unused)
+{
+	return 0;
+}
+
 struct evsel *evlist__find_tracepoint_by_id(struct evlist *evlist, int id)
 {
 	struct evsel *evsel;
@@ -572,6 +578,14 @@
 	return perf_evlist__filter_pollfd(&evlist->core, revents_and_mask);
 }
 
+#ifdef HAVE_EVENTFD_SUPPORT
+int evlist__add_wakeup_eventfd(struct evlist *evlist, int fd)
+{
+	return perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN,
+				       fdarray_flag__nonfilterable);
+}
+#endif
+
 int evlist__poll(struct evlist *evlist, int timeout)
 {
 	return perf_evlist__poll(&evlist->core, timeout);
@@ -1936,6 +1950,15 @@
 				    (sizeof(EVLIST_CTL_CMD_SNAPSHOT_TAG)-1))) {
 			*cmd = EVLIST_CTL_CMD_SNAPSHOT;
 			pr_debug("is snapshot\n");
+		} else if (!strncmp(cmd_data, EVLIST_CTL_CMD_EVLIST_TAG,
+				    (sizeof(EVLIST_CTL_CMD_EVLIST_TAG)-1))) {
+			*cmd = EVLIST_CTL_CMD_EVLIST;
+		} else if (!strncmp(cmd_data, EVLIST_CTL_CMD_STOP_TAG,
+				    (sizeof(EVLIST_CTL_CMD_STOP_TAG)-1))) {
+			*cmd = EVLIST_CTL_CMD_STOP;
+		} else if (!strncmp(cmd_data, EVLIST_CTL_CMD_PING_TAG,
+				    (sizeof(EVLIST_CTL_CMD_PING_TAG)-1))) {
+			*cmd = EVLIST_CTL_CMD_PING;
 		}
 	}
 
@@ -1957,6 +1980,98 @@
 	return err;
 }
 
+static int get_cmd_arg(char *cmd_data, size_t cmd_size, char **arg)
+{
+	char *data = cmd_data + cmd_size;
+
+	/* no argument */
+	if (!*data)
+		return 0;
+
+	/* there's an argument */
+	if (*data == ' ') {
+		*arg = data + 1;
+		return 1;
+	}
+
+	/* malformed */
+	return -1;
+}
+
+static int evlist__ctlfd_enable(struct evlist *evlist, char *cmd_data, bool enable)
+{
+	struct evsel *evsel;
+	char *name;
+	int err;
+
+	err = get_cmd_arg(cmd_data,
+			  enable ? sizeof(EVLIST_CTL_CMD_ENABLE_TAG) - 1 :
+				   sizeof(EVLIST_CTL_CMD_DISABLE_TAG) - 1,
+			  &name);
+	if (err < 0) {
+		pr_info("failed: wrong command\n");
+		return -1;
+	}
+
+	if (err) {
+		evsel = evlist__find_evsel_by_str(evlist, name);
+		if (evsel) {
+			if (enable)
+				evlist__enable_evsel(evlist, name);
+			else
+				evlist__disable_evsel(evlist, name);
+			pr_info("Event %s %s\n", evsel->name,
+				enable ? "enabled" : "disabled");
+		} else {
+			pr_info("failed: can't find '%s' event\n", name);
+		}
+	} else {
+		if (enable) {
+			evlist__enable(evlist);
+			pr_info(EVLIST_ENABLED_MSG);
+		} else {
+			evlist__disable(evlist);
+			pr_info(EVLIST_DISABLED_MSG);
+		}
+	}
+
+	return 0;
+}
+
+static int evlist__ctlfd_list(struct evlist *evlist, char *cmd_data)
+{
+	struct perf_attr_details details = { .verbose = false, };
+	struct evsel *evsel;
+	char *arg;
+	int err;
+
+	err = get_cmd_arg(cmd_data,
+			  sizeof(EVLIST_CTL_CMD_EVLIST_TAG) - 1,
+			  &arg);
+	if (err < 0) {
+		pr_info("failed: wrong command\n");
+		return -1;
+	}
+
+	if (err) {
+		if (!strcmp(arg, "-v")) {
+			details.verbose = true;
+		} else if (!strcmp(arg, "-g")) {
+			details.event_group = true;
+		} else if (!strcmp(arg, "-F")) {
+			details.freq = true;
+		} else {
+			pr_info("failed: wrong command\n");
+			return -1;
+		}
+	}
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__fprintf(evsel, &details, stderr);
+
+	return 0;
+}
+
 int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
 {
 	int err = 0;
@@ -1973,12 +2088,16 @@
 		if (err > 0) {
 			switch (*cmd) {
 			case EVLIST_CTL_CMD_ENABLE:
-				evlist__enable(evlist);
-				break;
 			case EVLIST_CTL_CMD_DISABLE:
-				evlist__disable(evlist);
+				err = evlist__ctlfd_enable(evlist, cmd_data,
+							   *cmd == EVLIST_CTL_CMD_ENABLE);
+				break;
+			case EVLIST_CTL_CMD_EVLIST:
+				err = evlist__ctlfd_list(evlist, cmd_data);
 				break;
 			case EVLIST_CTL_CMD_SNAPSHOT:
+			case EVLIST_CTL_CMD_STOP:
+			case EVLIST_CTL_CMD_PING:
 				break;
 			case EVLIST_CTL_CMD_ACK:
 			case EVLIST_CTL_CMD_UNSUPPORTED:
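Reviewer note (illustrative, not part of the patch): the evlist.c hunks above let control commands carry an optional argument, for example an event name after "enable" or a flag after "evlist". get_cmd_arg() distinguishes three cases: no argument, a space followed by an argument, and anything else (malformed). The sketch below reproduces that three-way result in isolation. The names cmd_arg() and CMD_ENABLE_TAG are hypothetical, and it assumes the caller has already matched the command tag at the start of the buffer, as the strncmp() chain in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag, mirroring EVLIST_CTL_CMD_ENABLE_TAG in evlist.h. */
#define CMD_ENABLE_TAG "enable"

/*
 * Hypothetical stand-in for get_cmd_arg().
 *
 * Assumption: cmd_data is NUL-terminated and already known to start
 * with a tag of tag_len characters.
 *
 * Returns 1 and sets *arg when an argument follows the tag,
 * 0 when the command has no argument, -1 when it is malformed.
 */
static int cmd_arg(const char *cmd_data, size_t tag_len, const char **arg)
{
	const char *data = cmd_data + tag_len;

	/* no argument */
	if (!*data)
		return 0;

	/* a single space separates the tag from its argument */
	if (*data == ' ') {
		*arg = data + 1;
		return 1;
	}

	/* e.g. "enabled": extra characters glued to the tag */
	return -1;
}
```

Under these assumptions "enable" selects the whole-evlist path, "enable cycles" yields the argument "cycles", and "enabled" is rejected as a wrong command.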
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1aae758..b695ffa 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -110,6 +110,8 @@
 #define evlist__add_default_attrs(evlist, array) \
 	__evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
 
+int arch_evlist__add_default_attrs(struct evlist *evlist);
+
 int evlist__add_dummy(struct evlist *evlist);
 
 int evlist__add_sb_event(struct evlist *evlist, struct perf_event_attr *attr,
@@ -142,6 +144,10 @@
 int evlist__add_pollfd(struct evlist *evlist, int fd);
 int evlist__filter_pollfd(struct evlist *evlist, short revents_and_mask);
 
+#ifdef HAVE_EVENTFD_SUPPORT
+int evlist__add_wakeup_eventfd(struct evlist *evlist, int fd);
+#endif
+
 int evlist__poll(struct evlist *evlist, int timeout);
 
 struct evsel *evlist__id2evsel(struct evlist *evlist, u64 id);
@@ -330,6 +336,9 @@
 #define EVLIST_CTL_CMD_DISABLE_TAG "disable"
 #define EVLIST_CTL_CMD_ACK_TAG     "ack\n"
 #define EVLIST_CTL_CMD_SNAPSHOT_TAG "snapshot"
+#define EVLIST_CTL_CMD_EVLIST_TAG "evlist"
+#define EVLIST_CTL_CMD_STOP_TAG "stop"
+#define EVLIST_CTL_CMD_PING_TAG "ping"
 
 #define EVLIST_CTL_CMD_MAX_LEN 64
 
@@ -339,6 +348,9 @@
 	EVLIST_CTL_CMD_DISABLE,
 	EVLIST_CTL_CMD_ACK,
 	EVLIST_CTL_CMD_SNAPSHOT,
+	EVLIST_CTL_CMD_EVLIST,
+	EVLIST_CTL_CMD_STOP,
+	EVLIST_CTL_CMD_PING,
 };
 
 int evlist__parse_control(const char *str, int *ctl_fd, int *ctl_fd_ack, bool *ctl_fd_close);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c26ea822..1bf7686 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -25,6 +25,7 @@
 #include <stdlib.h>
 #include <perf/evsel.h>
 #include "asm/bug.h"
+#include "bpf_counter.h"
 #include "callchain.h"
 #include "cgroup.h"
 #include "counts.h"
@@ -247,6 +248,7 @@
 	evsel->bpf_obj	   = NULL;
 	evsel->bpf_fd	   = -1;
 	INIT_LIST_HEAD(&evsel->config_terms);
+	INIT_LIST_HEAD(&evsel->bpf_counter_list);
 	perf_evsel__object.init(evsel);
 	evsel->sample_size = __evsel__sample_size(attr->sample_type);
 	evsel__calc_id_pos(evsel);
@@ -1012,6 +1014,11 @@
 	return found_term;
 }
 
+void __weak arch_evsel__set_sample_weight(struct evsel *evsel)
+{
+	evsel__set_sample_bit(evsel, WEIGHT);
+}
+
 /*
  * The enable_on_exec/disabled value strategy:
  *
@@ -1166,12 +1173,14 @@
 	}
 
 	if (opts->sample_weight)
-		evsel__set_sample_bit(evsel, WEIGHT);
+		arch_evsel__set_sample_weight(evsel);
 
-	attr->task  = track;
-	attr->mmap  = track;
-	attr->mmap2 = track && !perf_missing_features.mmap2;
-	attr->comm  = track;
+	attr->task     = track;
+	attr->mmap     = track;
+	attr->mmap2    = track && !perf_missing_features.mmap2;
+	attr->comm     = track;
+	attr->build_id = track && opts->build_id;
+
 	/*
 	 * ksymbol is tracked separately with text poke because it needs to be
 	 * system wide and enabled immediately.
@@ -1191,6 +1200,9 @@
 	if (opts->sample_data_page_size)
 		evsel__set_sample_bit(evsel, DATA_PAGE_SIZE);
 
+	if (opts->sample_code_page_size)
+		evsel__set_sample_bit(evsel, CODE_PAGE_SIZE);
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -1366,6 +1378,7 @@
 {
 	assert(list_empty(&evsel->core.node));
 	assert(evsel->evlist == NULL);
+	bpf_counter__destroy(evsel);
 	evsel__free_counts(evsel);
 	perf_evsel__free_fd(&evsel->core);
 	perf_evsel__free_id(&evsel->core);
@@ -1735,6 +1748,10 @@
 	}
 
 fallback_missing_features:
+	if (perf_missing_features.weight_struct) {
+		evsel__set_sample_bit(evsel, WEIGHT);
+		evsel__reset_sample_bit(evsel, WEIGHT_STRUCT);
+	}
 	if (perf_missing_features.clockid_wrong)
 		evsel->core.attr.clockid = CLOCK_MONOTONIC; /* should always work */
 	if (perf_missing_features.clockid) {
@@ -1781,6 +1798,8 @@
 
 			FD(evsel, cpu, thread) = fd;
 
+			bpf_counter__install_pe(evsel, cpu, fd);
+
 			if (unlikely(test_attr__enabled)) {
 				test_attr__open(&evsel->core.attr, pid, cpus->map[cpu],
 						fd, group_fd, flags);
@@ -1873,7 +1892,17 @@
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-        if (!perf_missing_features.data_page_size &&
+	if (!perf_missing_features.weight_struct &&
+	    (evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT)) {
+		perf_missing_features.weight_struct = true;
+		pr_debug2("switching off weight struct support\n");
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.code_page_size &&
+	    (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)) {
+		perf_missing_features.code_page_size = true;
+		pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support, bailing out\n");
+		goto out_close;
+	} else if (!perf_missing_features.data_page_size &&
 	    (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)) {
 		perf_missing_features.data_page_size = true;
 		pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support, bailing out\n");
@@ -2076,6 +2105,13 @@
 	return 0;
 }
 
+void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
+					  const __u64 *array,
+					  u64 type __maybe_unused)
+{
+	data->weight = *array;
+}
+
 int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			struct perf_sample *data)
 {
@@ -2316,9 +2352,9 @@
 		}
 	}
 
-	if (type & PERF_SAMPLE_WEIGHT) {
+	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
 		OVERFLOW_CHECK_u64(array);
-		data->weight = *array;
+		arch_perf_parse_sample_weight(data, array, type);
 		array++;
 	}
 
@@ -2369,6 +2405,12 @@
 		array++;
 	}
 
+	data->code_page_size = 0;
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE) {
+		data->code_page_size = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
@@ -2678,6 +2720,8 @@
 	"We found oprofile daemon running, please stop it and try again.");
 		break;
 	case EINVAL:
+		if (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE && perf_missing_features.code_page_size)
+			return scnprintf(msg, size, "Asking for the code page size isn't supported by this kernel.");
 		if (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE && perf_missing_features.data_page_size)
 			return scnprintf(msg, size, "Asking for the data page size isn't supported by this kernel.");
 		if (evsel->core.attr.write_backward && perf_missing_features.write_backward)
@@ -2689,6 +2733,9 @@
 		if (perf_missing_features.aux_output)
 			return scnprintf(msg, size, "The 'aux_output' feature is not supported, update the kernel.");
 		break;
+	case ENODATA:
+		return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. "
+				 "Please add an auxiliary event in front of the load latency event.");
 	default:
 		break;
 	}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cd1d8dd..4e8e49f 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -17,6 +17,8 @@
 struct perf_counts;
 struct perf_stat_evsel;
 union perf_event;
+struct bpf_counter_ops;
+struct target;
 
 typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);
 
@@ -127,6 +129,8 @@
 	 * See also evsel__has_callchain().
 	 */
 	__u64			synth_sample_type;
+	struct list_head	bpf_counter_list;
+	struct bpf_counter_ops	*bpf_counter_ops;
 };
 
 struct perf_missing_features {
@@ -145,6 +149,8 @@
 	bool branch_hw_idx;
 	bool cgroup;
 	bool data_page_size;
+	bool code_page_size;
+	bool weight_struct;
 };
 
 extern struct perf_missing_features perf_missing_features;
@@ -239,6 +245,8 @@
 
 void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier);
 
+void arch_evsel__set_sample_weight(struct evsel *evsel);
+
 int evsel__set_filter(struct evsel *evsel, const char *filter);
 int evsel__append_tp_filter(struct evsel *evsel, const char *filter);
 int evsel__append_addr_filter(struct evsel *evsel, const char *filter);
@@ -424,4 +432,5 @@
 struct perf_env *evsel__env(struct evsel *evsel);
 
 int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
+
 #endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index fb498a7..bfedd7b 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -100,6 +100,7 @@
 	return ++printed;
 }
 
+#ifndef PYTHON_PERF
 int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			      unsigned int print_opts, struct callchain_cursor *cursor,
 			      struct strlist *bt_stop_list, FILE *fp)
@@ -239,3 +240,4 @@
 
 	return printed;
 }
+#endif /* PYTHON_PERF */
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c4ed3dc..4fe9e2a 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3806,7 +3806,7 @@
 	 * check for the pipe header regardless of source.
 	 */
 	err = perf_header__read_pipe(session);
-	if (!err || (err && perf_data__is_pipe(data))) {
+	if (!err || perf_data__is_pipe(data)) {
 		data->is_pipe = true;
 		return err;
 	}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a08fb9e..c82f5fc 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -208,10 +208,14 @@
 	hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
 	hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
 	hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+	hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
+	hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13);
+	hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13);
 	if (symbol_conf.nanosecs)
 		hists__new_col_len(hists, HISTC_TIME, 16);
 	else
 		hists__new_col_len(hists, HISTC_TIME, 12);
+	hists__new_col_len(hists, HISTC_CODE_PAGE_SIZE, 6);
 
 	if (h->srcline) {
 		len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
@@ -285,12 +289,13 @@
 }
 
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
-				u64 weight)
+				u64 weight, u64 ins_lat)
 {
 
 	he_stat->period		+= period;
 	he_stat->weight		+= weight;
 	he_stat->nr_events	+= 1;
+	he_stat->ins_lat	+= ins_lat;
 }
 
 static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
@@ -302,6 +307,7 @@
 	dest->period_guest_us	+= src->period_guest_us;
 	dest->nr_events		+= src->nr_events;
 	dest->weight		+= src->weight;
+	dest->ins_lat		+= src->ins_lat;
 }
 
 static void he_stat__decay(struct he_stat *he_stat)
@@ -590,6 +596,7 @@
 	int64_t cmp;
 	u64 period = entry->stat.period;
 	u64 weight = entry->stat.weight;
+	u64 ins_lat = entry->stat.ins_lat;
 	bool leftmost = true;
 
 	p = &hists->entries_in->rb_root.rb_node;
@@ -608,11 +615,11 @@
 
 		if (!cmp) {
 			if (sample_self) {
-				he_stat__add_period(&he->stat, period, weight);
+				he_stat__add_period(&he->stat, period, weight, ins_lat);
 				hist_entry__add_callchain_period(he, period);
 			}
 			if (symbol_conf.cumulate_callchain)
-				he_stat__add_period(he->stat_acc, period, weight);
+				he_stat__add_period(he->stat_acc, period, weight, ins_lat);
 
 			/*
 			 * This mem info was allocated from sample__resolve_mem
@@ -718,10 +725,12 @@
 		.cpumode = al->cpumode,
 		.ip	 = al->addr,
 		.level	 = al->level,
+		.code_page_size = sample->code_page_size,
 		.stat = {
 			.nr_events = 1,
 			.period	= sample->period,
 			.weight = sample->weight,
+			.ins_lat = sample->ins_lat,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent) | al->filtered,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 14f6633..3c53723 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -53,6 +53,7 @@
 	HISTC_DSO_TO,
 	HISTC_LOCAL_WEIGHT,
 	HISTC_GLOBAL_WEIGHT,
+	HISTC_CODE_PAGE_SIZE,
 	HISTC_MEM_DADDR_SYMBOL,
 	HISTC_MEM_DADDR_DSO,
 	HISTC_MEM_PHYS_DADDR,
@@ -71,6 +72,9 @@
 	HISTC_SYM_SIZE,
 	HISTC_DSO_SIZE,
 	HISTC_SYMBOL_IPC,
+	HISTC_MEM_BLOCKED,
+	HISTC_LOCAL_INS_LAT,
+	HISTC_GLOBAL_INS_LAT,
 	HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 697513f..8c59677 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -24,6 +24,13 @@
 #include "intel-pt-decoder.h"
 #include "intel-pt-log.h"
 
+#define BITULL(x) (1ULL << (x))
+
+/* IA32_RTIT_CTL MSR bits */
+#define INTEL_PT_CYC_ENABLE		BITULL(1)
+#define INTEL_PT_CYC_THRESHOLD		(BITULL(22) | BITULL(21) | BITULL(20) | BITULL(19))
+#define INTEL_PT_CYC_THRESHOLD_SHIFT	19
+
 #define INTEL_PT_BLK_SIZE 1024
 
 #define BIT63 (((uint64_t)1 << 63))
@@ -55,6 +62,7 @@
 	INTEL_PT_STATE_TIP_PGD,
 	INTEL_PT_STATE_FUP,
 	INTEL_PT_STATE_FUP_NO_TIP,
+	INTEL_PT_STATE_FUP_IN_PSB,
 	INTEL_PT_STATE_RESAMPLE,
 };
 
@@ -73,6 +81,7 @@
 	case INTEL_PT_STATE_TIP_PGD:
 	case INTEL_PT_STATE_FUP:
 	case INTEL_PT_STATE_FUP_NO_TIP:
+	case INTEL_PT_STATE_FUP_IN_PSB:
 		return false;
 	default:
 		return true;
@@ -112,13 +121,14 @@
 	bool have_last_ip;
 	bool in_psb;
 	bool hop;
-	bool hop_psb_fup;
 	bool leap;
+	bool nr;
+	bool next_nr;
 	enum intel_pt_param_flags flags;
 	uint64_t pos;
 	uint64_t last_ip;
 	uint64_t ip;
-	uint64_t cr3;
+	uint64_t pip_payload;
 	uint64_t timestamp;
 	uint64_t tsc_timestamp;
 	uint64_t ref_timestamp;
@@ -167,6 +177,8 @@
 	uint64_t sample_tot_cyc_cnt;
 	uint64_t base_cyc_cnt;
 	uint64_t cyc_cnt_timestamp;
+	uint64_t ctl;
+	uint64_t cyc_threshold;
 	double tsc_to_cyc;
 	bool continuous_period;
 	bool overflow;
@@ -189,6 +201,7 @@
 	int no_progress;
 	int stuck_ip_prd;
 	int stuck_ip_cnt;
+	uint64_t psb_ip;
 	const unsigned char *next_buf;
 	size_t next_len;
 	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
@@ -204,6 +217,14 @@
 	return x << i;
 }
 
+static uint64_t intel_pt_cyc_threshold(uint64_t ctl)
+{
+	if (!(ctl & INTEL_PT_CYC_ENABLE))
+		return 0;
+
+	return (ctl & INTEL_PT_CYC_THRESHOLD) >> INTEL_PT_CYC_THRESHOLD_SHIFT;
+}
+
 static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
 {
 	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
@@ -245,12 +266,15 @@
 
 	decoder->flags              = params->flags;
 
+	decoder->ctl                = params->ctl;
 	decoder->period             = params->period;
 	decoder->period_type        = params->period_type;
 
 	decoder->max_non_turbo_ratio    = params->max_non_turbo_ratio;
 	decoder->max_non_turbo_ratio_fp = params->max_non_turbo_ratio;
 
+	decoder->cyc_threshold = intel_pt_cyc_threshold(decoder->ctl);
+
 	intel_pt_setup_period(decoder);
 
 	decoder->mtc_shift = params->mtc_period;
@@ -481,6 +505,28 @@
 	decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX;
 }
 
+static inline void intel_pt_update_pip(struct intel_pt_decoder *decoder)
+{
+	decoder->pip_payload = decoder->packet.payload;
+}
+
+static inline void intel_pt_update_nr(struct intel_pt_decoder *decoder)
+{
+	decoder->next_nr = decoder->pip_payload & 1;
+}
+
+static inline void intel_pt_set_nr(struct intel_pt_decoder *decoder)
+{
+	decoder->nr = decoder->pip_payload & 1;
+	decoder->next_nr = decoder->nr;
+}
+
+static inline void intel_pt_set_pip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_update_pip(decoder);
+	intel_pt_set_nr(decoder);
+}
+
 static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
 {
 	intel_pt_clear_tx_flags(decoder);
@@ -1218,6 +1264,7 @@
 		decoder->continuous_period = false;
 		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
 		decoder->state.type |= INTEL_PT_TRACE_END;
+		intel_pt_update_nr(decoder);
 		return 0;
 	}
 	if (err == INTEL_PT_RETURN)
@@ -1225,6 +1272,8 @@
 	if (err)
 		return err;
 
+	intel_pt_update_nr(decoder);
+
 	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
 		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
 			decoder->pge = false;
@@ -1337,6 +1386,7 @@
 			decoder->state.from_ip = decoder->ip;
 			decoder->state.to_ip = decoder->last_ip;
 			decoder->ip = decoder->last_ip;
+			intel_pt_update_nr(decoder);
 			return 0;
 		}
 
@@ -1461,6 +1511,7 @@
 {
 	intel_pt_log("ERROR: Buffer overflow\n");
 	intel_pt_clear_tx_flags(decoder);
+	intel_pt_set_nr(decoder);
 	decoder->timestamp_insn_cnt = 0;
 	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
 	decoder->overflow = true;
@@ -1735,18 +1786,14 @@
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			intel_pt_set_pip(decoder);
 			break;
 
 		case INTEL_PT_FUP:
 			decoder->pge = true;
 			if (decoder->packet.count) {
 				intel_pt_set_last_ip(decoder);
-				if (decoder->hop) {
-					/* Act on FUP at PSBEND */
-					decoder->ip = decoder->last_ip;
-					decoder->hop_psb_fup = true;
-				}
+				decoder->psb_ip = decoder->last_ip;
 			}
 			break;
 
@@ -1761,6 +1808,9 @@
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
+			break;
+
 		case INTEL_PT_VMCS:
 		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
@@ -1835,6 +1885,7 @@
 			decoder->pge = false;
 			decoder->continuous_period = false;
 			decoder->state.type |= INTEL_PT_TRACE_END;
+			intel_pt_update_nr(decoder);
 			return 0;
 
 		case INTEL_PT_TIP_PGE:
@@ -1850,6 +1901,7 @@
 			}
 			decoder->state.type |= INTEL_PT_TRACE_BEGIN;
 			intel_pt_mtc_cyc_cnt_pge(decoder);
+			intel_pt_set_nr(decoder);
 			return 0;
 
 		case INTEL_PT_TIP:
@@ -1860,10 +1912,11 @@
 				intel_pt_set_ip(decoder);
 				decoder->state.to_ip = decoder->ip;
 			}
+			intel_pt_update_nr(decoder);
 			return 0;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			intel_pt_update_pip(decoder);
 			break;
 
 		case INTEL_PT_MTC:
@@ -1922,21 +1975,27 @@
 		return HOP_IGNORE;
 
 	case INTEL_PT_TIP_PGD:
-		if (!decoder->packet.count)
+		if (!decoder->packet.count) {
+			intel_pt_set_nr(decoder);
 			return HOP_IGNORE;
+		}
 		intel_pt_set_ip(decoder);
 		decoder->state.type |= INTEL_PT_TRACE_END;
 		decoder->state.from_ip = 0;
 		decoder->state.to_ip = decoder->ip;
+		intel_pt_update_nr(decoder);
 		return HOP_RETURN;
 
 	case INTEL_PT_TIP:
-		if (!decoder->packet.count)
+		if (!decoder->packet.count) {
+			intel_pt_set_nr(decoder);
 			return HOP_IGNORE;
+		}
 		intel_pt_set_ip(decoder);
 		decoder->state.type = INTEL_PT_INSTRUCTION;
 		decoder->state.from_ip = decoder->ip;
 		decoder->state.to_ip = 0;
+		intel_pt_update_nr(decoder);
 		return HOP_RETURN;
 
 	case INTEL_PT_FUP:
@@ -1959,26 +2018,23 @@
 		return HOP_RETURN;
 
 	case INTEL_PT_PSB:
+		decoder->state.psb_offset = decoder->pos;
+		decoder->psb_ip = 0;
 		decoder->last_ip = 0;
 		decoder->have_last_ip = true;
-		decoder->hop_psb_fup = false;
 		*err = intel_pt_walk_psbend(decoder);
 		if (*err == -EAGAIN)
 			return HOP_AGAIN;
 		if (*err)
 			return HOP_RETURN;
-		if (decoder->hop_psb_fup) {
-			decoder->hop_psb_fup = false;
-			decoder->state.type = INTEL_PT_INSTRUCTION;
-			decoder->state.from_ip = decoder->ip;
-			decoder->state.to_ip = 0;
-			return HOP_RETURN;
+		decoder->state.type = INTEL_PT_PSB_EVT;
+		if (decoder->psb_ip) {
+			decoder->state.type |= INTEL_PT_INSTRUCTION;
+			decoder->ip = decoder->psb_ip;
 		}
-		if (decoder->cbr != decoder->cbr_seen) {
-			decoder->state.type = 0;
-			return HOP_RETURN;
-		}
-		return HOP_IGNORE;
+		decoder->state.from_ip = decoder->psb_ip;
+		decoder->state.to_ip = 0;
+		return HOP_RETURN;
 
 	case INTEL_PT_BAD:
 	case INTEL_PT_PAD:
@@ -2012,8 +2068,151 @@
 	}
 }
 
+struct intel_pt_psb_info {
+	struct intel_pt_pkt fup_packet;
+	bool fup;
+	int after_psbend;
+};
+
+/* Look ahead and get the FUP packet from PSB+ */
+static int intel_pt_psb_lookahead_cb(struct intel_pt_pkt_info *pkt_info)
+{
+	struct intel_pt_psb_info *data = pkt_info->data;
+
+	switch (pkt_info->packet.type) {
+	case INTEL_PT_PAD:
+	case INTEL_PT_MNT:
+	case INTEL_PT_TSC:
+	case INTEL_PT_TMA:
+	case INTEL_PT_MODE_EXEC:
+	case INTEL_PT_MODE_TSX:
+	case INTEL_PT_MTC:
+	case INTEL_PT_CYC:
+	case INTEL_PT_VMCS:
+	case INTEL_PT_CBR:
+	case INTEL_PT_PIP:
+		if (data->after_psbend) {
+			data->after_psbend -= 1;
+			if (!data->after_psbend)
+				return 1;
+		}
+		break;
+
+	case INTEL_PT_FUP:
+		if (data->after_psbend)
+			return 1;
+		if (data->fup || pkt_info->packet.count == 0)
+			return 1;
+		data->fup_packet = pkt_info->packet;
+		data->fup = true;
+		break;
+
+	case INTEL_PT_PSBEND:
+		if (!data->fup)
+			return 1;
+		/* Keep going to check for a TIP.PGE */
+		data->after_psbend = 6;
+		break;
+
+	case INTEL_PT_TIP_PGE:
+		/* Ignore FUP in PSB+ if followed by TIP.PGE */
+		if (data->after_psbend)
+			data->fup = false;
+		return 1;
+
+	case INTEL_PT_PTWRITE:
+	case INTEL_PT_PTWRITE_IP:
+	case INTEL_PT_EXSTOP:
+	case INTEL_PT_EXSTOP_IP:
+	case INTEL_PT_MWAIT:
+	case INTEL_PT_PWRE:
+	case INTEL_PT_PWRX:
+	case INTEL_PT_BBP:
+	case INTEL_PT_BIP:
+	case INTEL_PT_BEP:
+	case INTEL_PT_BEP_IP:
+		if (data->after_psbend) {
+			data->after_psbend -= 1;
+			if (!data->after_psbend)
+				return 1;
+			break;
+		}
+		return 1;
+
+	case INTEL_PT_OVF:
+	case INTEL_PT_BAD:
+	case INTEL_PT_TNT:
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP:
+	case INTEL_PT_PSB:
+	case INTEL_PT_TRACESTOP:
+	default:
+		return 1;
+	}
+
+	return 0;
+}
+
+static int intel_pt_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->last_ip = 0;
+	decoder->psb_ip = 0;
+	decoder->have_last_ip = true;
+	intel_pt_clear_stack(&decoder->stack);
+	err = intel_pt_walk_psbend(decoder);
+	if (err)
+		return err;
+	decoder->state.type = INTEL_PT_PSB_EVT;
+	decoder->state.from_ip = decoder->psb_ip;
+	decoder->state.to_ip = 0;
+	return 0;
+}
+
+static int intel_pt_fup_in_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	if (decoder->ip != decoder->last_ip) {
+		err = intel_pt_walk_fup(decoder);
+		if (!err || err != -EAGAIN)
+			return err;
+	}
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	err = intel_pt_psb(decoder);
+	if (err) {
+		decoder->pkt_state = INTEL_PT_STATE_ERR3;
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
+static bool intel_pt_psb_with_fup(struct intel_pt_decoder *decoder, int *err)
+{
+	struct intel_pt_psb_info data = { .fup = false };
+
+	if (!decoder->branch_enable || !decoder->pge)
+		return false;
+
+	intel_pt_pkt_lookahead(decoder, intel_pt_psb_lookahead_cb, &data);
+	if (!data.fup)
+		return false;
+
+	decoder->packet = data.fup_packet;
+	intel_pt_set_last_ip(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_FUP_IN_PSB;
+
+	*err = intel_pt_fup_in_psb(decoder);
+
+	return true;
+}
+
 static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
 {
+	int last_packet_type = INTEL_PT_PAD;
 	bool no_tip = false;
 	int err;
 
@@ -2022,6 +2221,12 @@
 		if (err)
 			return err;
 next:
+		if (decoder->cyc_threshold) {
+			if (decoder->sample_cyc && last_packet_type != INTEL_PT_CYC)
+				decoder->sample_cyc = false;
+			last_packet_type = decoder->packet.type;
+		}
+
 		if (decoder->hop) {
 			switch (intel_pt_hop_trace(decoder, &no_tip, &err)) {
 			case HOP_IGNORE:
@@ -2055,6 +2260,7 @@
 		case INTEL_PT_TIP_PGE: {
 			decoder->pge = true;
 			intel_pt_mtc_cyc_cnt_pge(decoder);
+			intel_pt_set_nr(decoder);
 			if (decoder->packet.count == 0) {
 				intel_pt_log_at("Skipping zero TIP.PGE",
 						decoder->pos);
@@ -2120,27 +2326,17 @@
 			break;
 
 		case INTEL_PT_PSB:
-			decoder->last_ip = 0;
-			decoder->have_last_ip = true;
-			intel_pt_clear_stack(&decoder->stack);
-			err = intel_pt_walk_psbend(decoder);
+			decoder->state.psb_offset = decoder->pos;
+			decoder->psb_ip = 0;
+			if (intel_pt_psb_with_fup(decoder, &err))
+				return err;
+			err = intel_pt_psb(decoder);
 			if (err == -EAGAIN)
 				goto next;
-			if (err)
-				return err;
-			/*
-			 * PSB+ CBR will not have changed but cater for the
-			 * possibility of another CBR change that gets caught up
-			 * in the PSB+.
-			 */
-			if (decoder->cbr != decoder->cbr_seen) {
-				decoder->state.type = 0;
-				return 0;
-			}
-			break;
+			return err;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			intel_pt_update_pip(decoder);
 			break;
 
 		case INTEL_PT_MTC:
@@ -2351,6 +2547,7 @@
 				uint64_t current_ip = decoder->ip;
 
 				intel_pt_set_ip(decoder);
+				decoder->psb_ip = decoder->ip;
 				if (current_ip)
 					intel_pt_log_to("Setting IP",
 							decoder->ip);
@@ -2378,7 +2575,7 @@
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			intel_pt_set_pip(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -2497,7 +2694,7 @@
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			intel_pt_set_pip(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -2522,18 +2719,18 @@
 			break;
 
 		case INTEL_PT_PSB:
+			decoder->state.psb_offset = decoder->pos;
+			decoder->psb_ip = 0;
 			decoder->last_ip = 0;
 			decoder->have_last_ip = true;
 			intel_pt_clear_stack(&decoder->stack);
 			err = intel_pt_walk_psb(decoder);
 			if (err)
 				return err;
-			if (decoder->ip) {
-				/* Do not have a sample */
-				decoder->state.type = 0;
-				return 0;
-			}
-			break;
+			decoder->state.type = INTEL_PT_PSB_EVT;
+			decoder->state.from_ip = decoder->psb_ip;
+			decoder->state.to_ip = 0;
+			return 0;
 
 		case INTEL_PT_TNT:
 		case INTEL_PT_PSBEND:
@@ -2577,7 +2774,7 @@
 
 	intel_pt_log("Scanning for full IP\n");
 	err = intel_pt_walk_to_ip(decoder);
-	if (err)
+	if (err || ((decoder->state.type & INTEL_PT_PSB_EVT) && !decoder->ip))
 		return err;
 
 	/* In hop mode, resample to get the to_ip as an "instruction" sample */
@@ -2689,10 +2886,10 @@
 	decoder->continuous_period = false;
 	decoder->have_last_ip = false;
 	decoder->last_ip = 0;
+	decoder->psb_ip = 0;
 	decoder->ip = 0;
 	intel_pt_clear_stack(&decoder->stack);
 
-leap:
 	err = intel_pt_scan_for_psb(decoder);
 	if (err)
 		return err;
@@ -2704,8 +2901,11 @@
 	if (err)
 		return err;
 
+	decoder->state.type = INTEL_PT_PSB_EVT; /* Only PSB sample */
+	decoder->state.from_ip = decoder->psb_ip;
+	decoder->state.to_ip = 0;
+
 	if (decoder->ip) {
-		decoder->state.type = 0; /* Do not have a sample */
 		/*
 		 * In hop mode, resample to get the PSB FUP ip as an
 		 * "instruction" sample.
@@ -2714,14 +2914,6 @@
 			decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
 		else
 			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
-	} else if (decoder->leap) {
-		/*
-		 * In leap mode, only PSB+ is decoded, so keeping leaping to the
-		 * next PSB until there is an ip.
-		 */
-		goto leap;
-	} else {
-		return intel_pt_sync_ip(decoder);
 	}
 
 	return 0;
@@ -2783,6 +2975,9 @@
 			if (err == -EAGAIN)
 				err = intel_pt_walk_trace(decoder);
 			break;
+		case INTEL_PT_STATE_FUP_IN_PSB:
+			err = intel_pt_fup_in_psb(decoder);
+			break;
 		case INTEL_PT_STATE_RESAMPLE:
 			err = intel_pt_resample(decoder);
 			break;
@@ -2797,6 +2992,7 @@
 		decoder->state.from_ip = decoder->ip;
 		intel_pt_update_sample_time(decoder);
 		decoder->sample_tot_cyc_cnt = decoder->tot_cyc_cnt;
+		intel_pt_set_nr(decoder);
 	} else {
 		decoder->state.err = 0;
 		if (decoder->cbr != decoder->cbr_seen) {
@@ -2811,14 +3007,30 @@
 		}
 		if (intel_pt_sample_time(decoder->pkt_state)) {
 			intel_pt_update_sample_time(decoder);
-			if (decoder->sample_cyc)
+			if (decoder->sample_cyc) {
 				decoder->sample_tot_cyc_cnt = decoder->tot_cyc_cnt;
+				decoder->state.flags |= INTEL_PT_SAMPLE_IPC;
+				decoder->sample_cyc = false;
+			}
 		}
+		/*
+		 * When using only TSC/MTC to compute cycles, IPC can be
+		 * sampled as soon as the cycle count changes.
+		 */
+		if (!decoder->have_cyc)
+			decoder->state.flags |= INTEL_PT_SAMPLE_IPC;
 	}
 
+	/* Let PSB event always have TSC timestamp */
+	if ((decoder->state.type & INTEL_PT_PSB_EVT) && decoder->tsc_timestamp)
+		decoder->sample_timestamp = decoder->tsc_timestamp;
+
+	decoder->state.from_nr = decoder->nr;
+	decoder->state.to_nr = decoder->next_nr;
+	decoder->nr = decoder->next_nr;
+
 	decoder->state.timestamp = decoder->sample_timestamp;
 	decoder->state.est_timestamp = intel_pt_est_timestamp(decoder);
-	decoder->state.cr3 = decoder->cr3;
 	decoder->state.tot_insn_cnt = decoder->tot_insn_cnt;
 	decoder->state.tot_cyc_cnt = decoder->sample_tot_cyc_cnt;
 
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 8645fc2..d9e62a7 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -17,6 +17,7 @@
 #define INTEL_PT_ABORT_TX	(1 << 1)
 #define INTEL_PT_ASYNC		(1 << 2)
 #define INTEL_PT_FUP_IP		(1 << 3)
+#define INTEL_PT_SAMPLE_IPC	(1 << 4)
 
 enum intel_pt_sample_type {
 	INTEL_PT_BRANCH		= 1 << 0,
@@ -31,6 +32,7 @@
 	INTEL_PT_TRACE_BEGIN	= 1 << 9,
 	INTEL_PT_TRACE_END	= 1 << 10,
 	INTEL_PT_BLK_ITEMS	= 1 << 11,
+	INTEL_PT_PSB_EVT	= 1 << 12,
 };
 
 enum intel_pt_period_type {
@@ -199,10 +201,11 @@
 
 struct intel_pt_state {
 	enum intel_pt_sample_type type;
+	bool from_nr;
+	bool to_nr;
 	int err;
 	uint64_t from_ip;
 	uint64_t to_ip;
-	uint64_t cr3;
 	uint64_t tot_insn_cnt;
 	uint64_t tot_cyc_cnt;
 	uint64_t timestamp;
@@ -213,6 +216,7 @@
 	uint64_t pwre_payload;
 	uint64_t pwrx_payload;
 	uint64_t cbr_payload;
+	uint64_t psb_offset;
 	uint32_t cbr;
 	uint32_t flags;
 	enum intel_pt_insn_op insn_op;
@@ -243,6 +247,7 @@
 	void *data;
 	bool return_compression;
 	bool branch_enable;
+	uint64_t ctl;
 	uint64_t period;
 	enum intel_pt_period_type period_type;
 	unsigned max_non_turbo_ratio;
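The from_nr/to_nr fields added to struct intel_pt_state above are filled in by the decoder from three pieces of state: the last PIP payload, the current NR value, and the NR value that takes effect at the next branch target. The following is a minimal sketch of that bookkeeping, assuming a cut-down stand-in structure. The names nr_tracker, nr_sample, nr_update_pip, nr_update_nr and nr_emit are hypothetical and only mirror intel_pt_update_pip(), intel_pt_update_nr() and the end of the decode step in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the NR fields of the decoder */
struct nr_tracker {
	uint64_t pip_payload;	/* last raw PIP payload seen */
	bool nr;		/* NR on the "from" side of the next sample */
	bool next_nr;		/* NR on the "to" side of the next sample */
};

/* Hypothetical stand-in for the from_nr/to_nr pair in the state */
struct nr_sample {
	bool from_nr;
	bool to_nr;
};

/* A PIP packet arrived: remember it, but do not apply it yet */
static inline void nr_update_pip(struct nr_tracker *t, uint64_t payload)
{
	t->pip_payload = payload;
}

/* A branch was decoded: the pending PIP takes effect at its target */
static inline void nr_update_nr(struct nr_tracker *t)
{
	t->next_nr = t->pip_payload & 1;
}

/* Emit a sample, then carry the "to" state over to the next sample */
static inline struct nr_sample nr_emit(struct nr_tracker *t)
{
	struct nr_sample s = { .from_nr = t->nr, .to_nr = t->next_nr };

	t->nr = t->next_nr;
	return s;
}
```

In this sketch a PIP with NR set followed by a branch yields one sample that crosses from host to guest (from_nr clear, to_nr set), and the following sample is entirely in the guest.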
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
index fb8a355..2f6cc7e 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -43,6 +43,17 @@
 	switch (insn->opcode.bytes[0]) {
 	case 0xf:
 		switch (insn->opcode.bytes[1]) {
+		case 0x01:
+			switch (insn->modrm.bytes[0]) {
+			case 0xc2: /* vmlaunch */
+			case 0xc3: /* vmresume */
+				op = INTEL_PT_OP_VMENTRY;
+				branch = INTEL_PT_BR_INDIRECT;
+				break;
+			default:
+				break;
+			}
+			break;
 		case 0x05: /* syscall */
 		case 0x34: /* sysenter */
 			op = INTEL_PT_OP_SYSCALL;
@@ -213,6 +224,7 @@
 	[INTEL_PT_OP_INT]	= "Int",
 	[INTEL_PT_OP_SYSCALL]	= "Syscall",
 	[INTEL_PT_OP_SYSRET]	= "Sysret",
+	[INTEL_PT_OP_VMENTRY]	= "VMentry",
 };
 
 const char *intel_pt_insn_name(enum intel_pt_insn_op op)
@@ -267,6 +279,9 @@
 	case INTEL_PT_OP_SYSRET:
 		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
 		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_VMENTRY:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_VMENTRY;
 	default:
 		return 0;
 	}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
index 95a1eb0..c2861cf 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -24,6 +24,7 @@
 	INTEL_PT_OP_INT,
 	INTEL_PT_OP_SYSCALL,
 	INTEL_PT_OP_SYSRET,
+	INTEL_PT_OP_VMENTRY,
 };
 
 enum intel_pt_insn_branch {
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
index 4ce1099..02a3395 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -16,8 +16,6 @@
 
 #define BIT63		((uint64_t)1 << 63)
 
-#define NR_FLAG		BIT63
-
 #if __BYTE_ORDER == __BIG_ENDIAN
 #define le16_to_cpu bswap_16
 #define le32_to_cpu bswap_32
@@ -106,9 +104,7 @@
 
 	packet->type = INTEL_PT_PIP;
 	memcpy_le64(&payload, buf + 2, 6);
-	packet->payload = payload >> 1;
-	if (payload & 1)
-		packet->payload |= NR_FLAG;
+	packet->payload = payload;
 
 	return 8;
 }
@@ -719,10 +715,10 @@
 				name, (unsigned)(payload >> 1) & 1,
 				(unsigned)payload & 1);
 	case INTEL_PT_PIP:
-		nr = packet->payload & NR_FLAG ? 1 : 0;
-		payload &= ~NR_FLAG;
+		nr = packet->payload & INTEL_PT_VMX_NR_FLAG ? 1 : 0;
+		payload &= ~INTEL_PT_VMX_NR_FLAG;
 		ret = snprintf(buf, buf_len, "%s 0x%llx (NR=%d)",
-			       name, payload, nr);
+			       name, payload >> 1, nr);
 		return ret;
 	case INTEL_PT_PTWRITE:
 		return snprintf(buf, buf_len, "%s 0x%llx IP:0", name, payload);
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
index 17ca9b5..996090c 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -21,6 +21,8 @@
 
 #define INTEL_PT_PKT_MAX_SZ		16
 
+#define INTEL_PT_VMX_NR_FLAG		1
+
 enum intel_pt_pkt_type {
 	INTEL_PT_BAD,
 	INTEL_PT_PAD,
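The packet decoder change above stops moving the NR bit to bit 63 and keeps the PIP payload as read from the trace, so NR is bit 0 and the value printed in the packet dump is the payload shifted right by one. The following is a minimal sketch of the two layouts side by side. PIP_NR_FLAG mirrors INTEL_PT_VMX_NR_FLAG, and the helper names pip_nr, pip_value and pip_old_encoding are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIP_NR_FLAG	1ULL			/* mirrors INTEL_PT_VMX_NR_FLAG */
#define PIP_BIT63	((uint64_t)1 << 63)	/* mirrors BIT63 */

/* New layout: NR is bit 0 of the raw payload */
static inline bool pip_nr(uint64_t payload)
{
	return payload & PIP_NR_FLAG;
}

/* New layout: the value shown in the dump is the payload without NR */
static inline uint64_t pip_value(uint64_t payload)
{
	return (payload & ~PIP_NR_FLAG) >> 1;
}

/* Old layout, for comparison: value in the low bits, NR moved to bit 63 */
static inline uint64_t pip_old_encoding(uint64_t raw)
{
	uint64_t payload = raw >> 1;

	if (raw & 1)
		payload |= PIP_BIT63;
	return payload;
}
```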
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 60214de..f6e28ac 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -108,6 +108,7 @@
 	u64 exstop_id;
 	u64 pwrx_id;
 	u64 cbr_id;
+	u64 psb_id;
 
 	bool sample_pebs;
 	struct evsel *pebs_evsel;
@@ -162,6 +163,9 @@
 	int switch_state;
 	pid_t next_tid;
 	struct thread *thread;
+	struct machine *guest_machine;
+	struct thread *unknown_guest_thread;
+	pid_t guest_machine_pid;
 	bool exclude_kernel;
 	bool have_sample;
 	u64 time;
@@ -549,13 +553,59 @@
 	auxtrace_cache__remove(dso->auxtrace_cache, offset);
 }
 
-static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
+static inline bool intel_pt_guest_kernel_ip(uint64_t ip)
 {
-	return ip >= pt->kernel_start ?
+	/* Assumes 64-bit kernel */
+	return ip & (1ULL << 63);
+}
+
+static inline u8 intel_pt_nr_cpumode(struct intel_pt_queue *ptq, uint64_t ip, bool nr)
+{
+	if (nr) {
+		return intel_pt_guest_kernel_ip(ip) ?
+		       PERF_RECORD_MISC_GUEST_KERNEL :
+		       PERF_RECORD_MISC_GUEST_USER;
+	}
+
+	return ip >= ptq->pt->kernel_start ?
 	       PERF_RECORD_MISC_KERNEL :
 	       PERF_RECORD_MISC_USER;
 }
 
+static inline u8 intel_pt_cpumode(struct intel_pt_queue *ptq, uint64_t from_ip, uint64_t to_ip)
+{
+	/* No support for non-zero CS base */
+	if (from_ip)
+		return intel_pt_nr_cpumode(ptq, from_ip, ptq->state->from_nr);
+	return intel_pt_nr_cpumode(ptq, to_ip, ptq->state->to_nr);
+}
+
+static int intel_pt_get_guest(struct intel_pt_queue *ptq)
+{
+	struct machines *machines = &ptq->pt->session->machines;
+	struct machine *machine;
+	pid_t pid = ptq->pid <= 0 ? DEFAULT_GUEST_KERNEL_ID : ptq->pid;
+
+	if (ptq->guest_machine && pid == ptq->guest_machine_pid)
+		return 0;
+
+	ptq->guest_machine = NULL;
+	thread__zput(ptq->unknown_guest_thread);
+
+	machine = machines__find_guest(machines, pid);
+	if (!machine)
+		return -1;
+
+	ptq->unknown_guest_thread = machine__idle_thread(machine);
+	if (!ptq->unknown_guest_thread)
+		return -1;
+
+	ptq->guest_machine = machine;
+	ptq->guest_machine_pid = pid;
+
+	return 0;
+}
+
 static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 				   uint64_t *insn_cnt_ptr, uint64_t *ip,
 				   uint64_t to_ip, uint64_t max_insn_cnt,
@@ -572,19 +622,29 @@
 	u64 offset, start_offset, start_ip;
 	u64 insn_cnt = 0;
 	bool one_map = true;
+	bool nr;
 
 	intel_pt_insn->length = 0;
 
 	if (to_ip && *ip == to_ip)
 		goto out_no_cache;
 
-	cpumode = intel_pt_cpumode(ptq->pt, *ip);
+	nr = ptq->state->to_nr;
+	cpumode = intel_pt_nr_cpumode(ptq, *ip, nr);
 
-	thread = ptq->thread;
-	if (!thread) {
-		if (cpumode != PERF_RECORD_MISC_KERNEL)
+	if (nr) {
+		if (cpumode != PERF_RECORD_MISC_GUEST_KERNEL ||
+		    intel_pt_get_guest(ptq))
 			return -EINVAL;
-		thread = ptq->pt->unknown_thread;
+		machine = ptq->guest_machine;
+		thread = ptq->unknown_guest_thread;
+	} else {
+		thread = ptq->thread;
+		if (!thread) {
+			if (cpumode != PERF_RECORD_MISC_KERNEL)
+				return -EINVAL;
+			thread = ptq->pt->unknown_thread;
+		}
 	}
 
 	while (1) {
@@ -732,8 +792,14 @@
 	u8 cpumode;
 	u64 offset;
 
-	if (ip >= ptq->pt->kernel_start)
+	if (ptq->state->to_nr) {
+		if (intel_pt_guest_kernel_ip(ip))
+			return intel_pt_match_pgd_ip(ptq->pt, ip, ip, NULL);
+		/* No support for decoding guest user space */
+		return -EINVAL;
+	} else if (ip >= ptq->pt->kernel_start) {
 		return intel_pt_match_pgd_ip(ptq->pt, ip, ip, NULL);
+	}
 
 	cpumode = PERF_RECORD_MISC_USER;
 
@@ -893,6 +959,18 @@
 	return false;
 }
 
+static u64 intel_pt_ctl(struct intel_pt *pt)
+{
+	struct evsel *evsel;
+	u64 config;
+
+	evlist__for_each_entry(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->core.attr, &config))
+			return config;
+	}
+	return 0;
+}
+
 static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
 {
 	u64 quot, rem;
@@ -1026,6 +1104,7 @@
 	params.data = ptq;
 	params.return_compression = intel_pt_return_compression(pt);
 	params.branch_enable = intel_pt_branch_enable(pt);
+	params.ctl = intel_pt_ctl(pt);
 	params.max_non_turbo_ratio = pt->max_non_turbo_ratio;
 	params.mtc_period = intel_pt_mtc_period(pt);
 	params.tsc_ctc_ratio_n = pt->tsc_ctc_ratio_n;
@@ -1087,6 +1166,7 @@
 	if (!ptq)
 		return;
 	thread__zput(ptq->thread);
+	thread__zput(ptq->unknown_guest_thread);
 	intel_pt_decoder_free(ptq->decoder);
 	zfree(&ptq->event_buf);
 	zfree(&ptq->last_branch);
@@ -1121,13 +1201,16 @@
 	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
 		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
 	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
-		if (ptq->state->to_ip)
+		if (!ptq->state->to_ip)
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		else if (ptq->state->from_nr && !ptq->state->to_nr)
+			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+				     PERF_IP_FLAG_VMEXIT;
+		else
 			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
 				     PERF_IP_FLAG_ASYNC |
 				     PERF_IP_FLAG_INTERRUPT;
-		else
-			ptq->flags = PERF_IP_FLAG_BRANCH |
-				     PERF_IP_FLAG_TRACE_END;
 		ptq->insn_len = 0;
 	} else {
 		if (ptq->state->from_ip)
@@ -1301,8 +1384,8 @@
 		sample->time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
 
 	sample->ip = ptq->state->from_ip;
-	sample->cpumode = intel_pt_cpumode(pt, sample->ip);
 	sample->addr = ptq->state->to_ip;
+	sample->cpumode = intel_pt_cpumode(ptq, sample->ip, sample->addr);
 	sample->period = 1;
 	sample->flags = ptq->flags;
 
@@ -1381,7 +1464,8 @@
 		sample.branch_stack = (struct branch_stack *)&dummy_bs;
 	}
 
-	sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_br_cyc_cnt;
+	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_br_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_br_insn_cnt;
 		ptq->last_br_insn_cnt = ptq->ipc_insn_cnt;
@@ -1431,7 +1515,8 @@
 	else
 		sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
 
-	sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
+	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_in_insn_cnt;
 		ptq->last_in_insn_cnt = ptq->ipc_insn_cnt;
@@ -1533,6 +1618,32 @@
 					    pt->pwr_events_sample_type);
 }
 
+static int intel_pt_synth_psb_sample(struct intel_pt_queue *ptq)
+{
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+	struct perf_synth_intel_psb raw;
+
+	if (intel_pt_skip_event(pt))
+		return 0;
+
+	intel_pt_prep_p_sample(pt, ptq, event, &sample);
+
+	sample.id = ptq->pt->psb_id;
+	sample.stream_id = ptq->pt->psb_id;
+	sample.flags = 0;
+
+	raw.reserved = 0;
+	raw.offset = ptq->state->psb_offset;
+
+	sample.raw_size = perf_synth__raw_size(raw);
+	sample.raw_data = perf_synth__raw_data(&raw);
+
+	return intel_pt_deliver_synth_event(pt, event, &sample,
+					    pt->pwr_events_sample_type);
+}
+
 static int intel_pt_synth_mwait_sample(struct intel_pt_queue *ptq)
 {
 	struct intel_pt *pt = ptq->pt;
@@ -1791,10 +1902,7 @@
 	else
 		sample.ip = ptq->state->from_ip;
 
-	/* No support for guest mode at this time */
-	cpumode = sample.ip < ptq->pt->kernel_start ?
-		  PERF_RECORD_MISC_USER :
-		  PERF_RECORD_MISC_KERNEL;
+	cpumode = intel_pt_cpumode(ptq, sample.ip, 0);
 
 	event->sample.header.misc = cpumode | PERF_RECORD_MISC_EXACT_IP;
 
@@ -1853,13 +1961,30 @@
 	if (sample_type & PERF_SAMPLE_ADDR && items->has_mem_access_address)
 		sample.addr = items->mem_access_address;
 
-	if (sample_type & PERF_SAMPLE_WEIGHT) {
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
 		/*
 		 * Refer kernel's setup_pebs_adaptive_sample_data() and
 		 * intel_hsw_weight().
 		 */
-		if (items->has_mem_access_latency)
-			sample.weight = items->mem_access_latency;
+		if (items->has_mem_access_latency) {
+			u64 weight = items->mem_access_latency >> 32;
+
+			/*
+			 * Starting from SPR, the mem access latency field
+			 * contains both cache latency [47:32] and instruction
+			 * latency [15:0]. The cache latency is the same as the
+			 * mem access latency on previous platforms.
+			 *
+			 * In practice, no memory access could last longer than
+			 * 4G cycles. Use latency >> 32 to distinguish the
+			 * different format of the mem access latency field.
+			 */
+			if (weight > 0) {
+				sample.weight = weight & 0xffff;
+				sample.ins_lat = items->mem_access_latency & 0xffff;
+			} else
+				sample.weight = items->mem_access_latency;
+		}
 		if (!sample.weight && items->has_tsx_aux_info) {
 			/* Cycles last block */
 			sample.weight = (u32)items->tsx_aux_info;
@@ -1966,14 +2091,8 @@
 
 	ptq->have_sample = false;
 
-	if (ptq->state->tot_cyc_cnt > ptq->ipc_cyc_cnt) {
-		/*
-		 * Cycle count and instruction count only go together to create
-		 * a valid IPC ratio when the cycle count changes.
-		 */
-		ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
-		ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
-	}
+	ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
+	ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
 
 	/*
 	 * Do PEBS first to allow for the possibility that the PEBS timestamp
@@ -1986,6 +2105,11 @@
 	}
 
 	if (pt->sample_pwr_events) {
+		if (state->type & INTEL_PT_PSB_EVT) {
+			err = intel_pt_synth_psb_sample(ptq);
+			if (err)
+				return err;
+		}
 		if (ptq->state->cbr != ptq->cbr_seen) {
 			err = intel_pt_synth_cbr_sample(ptq);
 			if (err)
@@ -2047,7 +2171,27 @@
 	}
 
 	if (pt->sample_branches) {
-		err = intel_pt_synth_branch_sample(ptq);
+		if (state->from_nr != state->to_nr &&
+		    state->from_ip && state->to_ip) {
+			struct intel_pt_state *st = (struct intel_pt_state *)state;
+			u64 to_ip = st->to_ip;
+			u64 from_ip = st->from_ip;
+
+			/*
+			 * perf cannot handle having different machines for ip
+			 * and addr, so create 2 branches.
+			 */
+			st->to_ip = 0;
+			err = intel_pt_synth_branch_sample(ptq);
+			if (err)
+				return err;
+			st->from_ip = 0;
+			st->to_ip = to_ip;
+			err = intel_pt_synth_branch_sample(ptq);
+			st->from_ip = from_ip;
+		} else {
+			err = intel_pt_synth_branch_sample(ptq);
+		}
 		if (err)
 			return err;
 	}
@@ -3083,6 +3227,14 @@
 		pt->cbr_id = id;
 		intel_pt_set_event_name(evlist, id, "cbr");
 		id += 1;
+
+		attr.config = PERF_SYNTH_INTEL_PSB;
+		err = intel_pt_synth_event(session, "psb", &attr, id);
+		if (err)
+			return err;
+		pt->psb_id = id;
+		intel_pt_set_event_name(evlist, id, "psb");
+		id += 1;
 	}
 
 	if (pt->synth_opts.pwr_events && (evsel->core.attr.config & 0x10)) {
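The PEBS weight hunk earlier in this file distinguishes the two formats of the mem access latency field by testing whether anything is set above bit 31. The following is a minimal sketch of that split under the field layout stated in the patch comment (cache latency in bits [47:32], instruction latency in bits [15:0]). The struct pebs_latency and the function split_mem_access_latency are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical result pair, standing in for sample.weight / sample.ins_lat */
struct pebs_latency {
	uint64_t weight;	/* memory (cache) latency */
	uint16_t ins_lat;	/* instruction latency, 0 if not reported */
};

static inline struct pebs_latency split_mem_access_latency(uint64_t latency)
{
	struct pebs_latency out = { 0, 0 };
	uint64_t hi = latency >> 32;

	if (hi > 0) {
		/* Newer format: both latencies packed into one field */
		out.weight = hi & 0xffff;
		out.ins_lat = latency & 0xffff;
	} else {
		/* Older format: the whole field is the memory latency */
		out.weight = latency;
	}
	return out;
}
```

A value such as (0x50 << 32) | 0x7 is split into a weight of 0x50 and an instruction latency of 7, while a plain value of 300 is passed through as the weight.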
diff --git a/tools/perf/util/intlist.c b/tools/perf/util/intlist.c
index 84e5304..9340921 100644
--- a/tools/perf/util/intlist.c
+++ b/tools/perf/util/intlist.c
@@ -13,7 +13,7 @@
 static struct rb_node *intlist__node_new(struct rblist *rblist __maybe_unused,
 					 const void *entry)
 {
-	int i = (int)((long)entry);
+	unsigned long i = (unsigned long)entry;
 	struct rb_node *rc = NULL;
 	struct int_node *node = malloc(sizeof(*node));
 
@@ -41,15 +41,20 @@
 
 static int intlist__node_cmp(struct rb_node *rb_node, const void *entry)
 {
-	int i = (int)((long)entry);
+	unsigned long i = (unsigned long)entry;
 	struct int_node *node = container_of(rb_node, struct int_node, rb_node);
 
-	return node->i - i;
+	if (node->i > i)
+		return 1;
+	else if (node->i < i)
+		return -1;
+
+	return 0;
 }
 
-int intlist__add(struct intlist *ilist, int i)
+int intlist__add(struct intlist *ilist, unsigned long i)
 {
-	return rblist__add_node(&ilist->rblist, (void *)((long)i));
+	return rblist__add_node(&ilist->rblist, (void *)i);
 }
 
 void intlist__remove(struct intlist *ilist, struct int_node *node)
@@ -58,7 +63,7 @@
 }
 
 static struct int_node *__intlist__findnew(struct intlist *ilist,
-					   int i, bool create)
+					   unsigned long i, bool create)
 {
 	struct int_node *node = NULL;
 	struct rb_node *rb_node;
@@ -67,9 +72,9 @@
 		return NULL;
 
 	if (create)
-		rb_node = rblist__findnew(&ilist->rblist, (void *)((long)i));
+		rb_node = rblist__findnew(&ilist->rblist, (void *)i);
 	else
-		rb_node = rblist__find(&ilist->rblist, (void *)((long)i));
+		rb_node = rblist__find(&ilist->rblist, (void *)i);
 
 	if (rb_node)
 		node = container_of(rb_node, struct int_node, rb_node);
@@ -77,12 +82,12 @@
 	return node;
 }
 
-struct int_node *intlist__find(struct intlist *ilist, int i)
+struct int_node *intlist__find(struct intlist *ilist, unsigned long i)
 {
 	return __intlist__findnew(ilist, i, false);
 }
 
-struct int_node *intlist__findnew(struct intlist *ilist, int i)
+struct int_node *intlist__findnew(struct intlist *ilist, unsigned long i)
 {
 	return __intlist__findnew(ilist, i, true);
 }
@@ -93,7 +98,7 @@
 	int err;
 
 	do {
-		long value = strtol(s, &sep, 10);
+		unsigned long value = strtol(s, &sep, 10);
 		err = -EINVAL;
 		if (*sep != ',' && *sep != '\0')
 			break;
diff --git a/tools/perf/util/intlist.h b/tools/perf/util/intlist.h
index 5c19ee0..e336b17 100644
--- a/tools/perf/util/intlist.h
+++ b/tools/perf/util/intlist.h
@@ -9,7 +9,7 @@
 
 struct int_node {
 	struct rb_node rb_node;
-	int i;
+	unsigned long i;
 	void *priv;
 };
 
@@ -21,13 +21,13 @@
 void intlist__delete(struct intlist *ilist);
 
 void intlist__remove(struct intlist *ilist, struct int_node *in);
-int intlist__add(struct intlist *ilist, int i);
+int intlist__add(struct intlist *ilist, unsigned long i);
 
 struct int_node *intlist__entry(const struct intlist *ilist, unsigned int idx);
-struct int_node *intlist__find(struct intlist *ilist, int i);
-struct int_node *intlist__findnew(struct intlist *ilist, int i);
+struct int_node *intlist__find(struct intlist *ilist, unsigned long i);
+struct int_node *intlist__findnew(struct intlist *ilist, unsigned long i);
 
-static inline bool intlist__has_entry(struct intlist *ilist, int i)
+static inline bool intlist__has_entry(struct intlist *ilist, unsigned long i)
 {
 	return intlist__find(ilist, i) != NULL;
 }
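Once int_node.i is an unsigned long, intlist__node_cmp() can no longer return the difference of the two values: the result is narrowed to int, which can lose the sign or even report distinct values as equal. The following is a minimal sketch of the problem and of the explicit three-way comparison the patch uses instead. The function names cmp_by_subtraction and cmp_three_way are hypothetical.

```c
#include <limits.h>

/*
 * Difference narrowed to int. The conversion is implementation-defined;
 * GCC and Clang reduce it modulo 2^N, so large differences wrap.
 */
static inline int cmp_by_subtraction(unsigned long a, unsigned long b)
{
	return (int)(a - b);
}

/* Explicit comparison, as in the patched intlist__node_cmp() */
static inline int cmp_three_way(unsigned long a, unsigned long b)
{
	if (a > b)
		return 1;
	else if (a < b)
		return -1;

	return 0;
}
```

With GCC or Clang, comparing ULONG_MAX against 0 by subtraction yields a negative result, which would sort the largest key before the smallest one in the rbtree.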
diff --git a/tools/perf/util/jit.h b/tools/perf/util/jit.h
index 6817ffc..fb810e1 100644
--- a/tools/perf/util/jit.h
+++ b/tools/perf/util/jit.h
@@ -5,7 +5,7 @@
 #include <data.h>
 
 int jit_process(struct perf_session *session, struct perf_data *output,
-		struct machine *machine, char *filename, pid_t pid, u64 *nbytes);
+		struct machine *machine, char *filename, pid_t pid, pid_t tid, u64 *nbytes);
 
 int jit_inject_record(const char *filename);
 
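The jit_emit_elf() hunk in tools/perf/util/jitdump.c saves errno immediately after open() because the call that leaves the mount namespace runs before the error is reported and may overwrite errno. The following is a minimal sketch of that pattern, assuming a generic cleanup callback in place of nsinfo__mountns_exit(). The names open_then_cleanup and noisy_cleanup are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: open path, run a cleanup step, and report the
 * errno left by open(). *saved_errno is only meaningful when -1 is
 * returned.
 */
static inline int open_then_cleanup(const char *path, void (*cleanup)(void),
				    int *saved_errno)
{
	int fd = open(path, O_RDONLY);

	*saved_errno = errno;	/* capture before cleanup can overwrite it */
	if (cleanup)
		cleanup();
	return fd;
}

/* Stand-in for a cleanup step that clobbers errno */
static inline void noisy_cleanup(void)
{
	errno = 0;
}
```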
diff --git a/tools/perf/util/jitdump.c b/tools/perf/util/jitdump.c
index 055bab7..9760d8e 100644
--- a/tools/perf/util/jitdump.c
+++ b/tools/perf/util/jitdump.c
@@ -18,6 +18,7 @@
 #include "event.h"
 #include "debug.h"
 #include "evlist.h"
+#include "namespaces.h"
 #include "symbol.h"
 #include <elf.h>
 
@@ -35,6 +36,7 @@
 	struct perf_data *output;
 	struct perf_session *session;
 	struct machine *machine;
+	struct nsinfo  *nsi;
 	union jr_entry   *entry;
 	void             *buf;
 	uint64_t	 sample_type;
@@ -72,7 +74,8 @@
 #define get_jit_tool(t) (container_of(tool, struct jit_tool, tool))
 
 static int
-jit_emit_elf(char *filename,
+jit_emit_elf(struct jit_buf_desc *jd,
+	     char *filename,
 	     const char *sym,
 	     uint64_t code_addr,
 	     const void *code,
@@ -83,14 +86,18 @@
 	     uint32_t unwinding_header_size,
 	     uint32_t unwinding_size)
 {
-	int ret, fd;
+	int ret, fd, saved_errno;
+	struct nscookie nsc;
 
 	if (verbose > 0)
 		fprintf(stderr, "write ELF image %s\n", filename);
 
+	nsinfo__mountns_enter(jd->nsi, &nsc);
 	fd = open(filename, O_CREAT|O_TRUNC|O_WRONLY, 0644);
+	saved_errno = errno;
+	nsinfo__mountns_exit(&nsc);
 	if (fd == -1) {
-		pr_warning("cannot create jit ELF %s: %s\n", filename, strerror(errno));
+		pr_warning("cannot create jit ELF %s: %s\n", filename, strerror(saved_errno));
 		return -1;
 	}
 
@@ -99,8 +106,11 @@
 
         close(fd);
 
-        if (ret)
-                unlink(filename);
+	if (ret) {
+		nsinfo__mountns_enter(jd->nsi, &nsc);
+		unlink(filename);
+		nsinfo__mountns_exit(&nsc);
+	}
 
 	return ret;
 }
@@ -134,12 +144,15 @@
 jit_open(struct jit_buf_desc *jd, const char *name)
 {
 	struct jitheader header;
+	struct nscookie nsc;
 	struct jr_prefix *prefix;
 	ssize_t bs, bsz = 0;
 	void *n, *buf = NULL;
 	int ret, retval = -1;
 
+	nsinfo__mountns_enter(jd->nsi, &nsc);
 	jd->in = fopen(name, "r");
+	nsinfo__mountns_exit(&nsc);
 	if (!jd->in)
 		return -1;
 
@@ -367,6 +380,20 @@
 	return 0;
 }
 
+static pid_t jr_entry_pid(struct jit_buf_desc *jd, union jr_entry *jr)
+{
+	if (jd->nsi && jd->nsi->in_pidns)
+		return jd->nsi->tgid;
+	return jr->load.pid;
+}
+
+static pid_t jr_entry_tid(struct jit_buf_desc *jd, union jr_entry *jr)
+{
+	if (jd->nsi && jd->nsi->in_pidns)
+		return jd->nsi->pid;
+	return jr->load.tid;
+}
+
 static uint64_t convert_timestamp(struct jit_buf_desc *jd, uint64_t timestamp)
 {
 	struct perf_tsc_conversion tc;
@@ -402,14 +429,15 @@
 	const char *sym;
 	uint64_t count;
 	int ret, csize, usize;
-	pid_t pid, tid;
+	pid_t nspid, pid, tid;
 	struct {
 		u32 pid, tid;
 		u64 time;
 	} *id;
 
-	pid   = jr->load.pid;
-	tid   = jr->load.tid;
+	nspid = jr->load.pid;
+	pid   = jr_entry_pid(jd, jr);
+	tid   = jr_entry_tid(jd, jr);
 	csize = jr->load.code_size;
 	usize = jd->unwinding_mapped_size;
 	addr  = jr->load.code_addr;
@@ -425,14 +453,14 @@
 	filename = event->mmap2.filename;
 	size = snprintf(filename, PATH_MAX, "%s/jitted-%d-%" PRIu64 ".so",
 			jd->dir,
-			pid,
+			nspid,
 			count);
 
 	size++; /* for \0 */
 
 	size = PERF_ALIGN(size, sizeof(u64));
 	uaddr = (uintptr_t)code;
-	ret = jit_emit_elf(filename, sym, addr, (const void *)uaddr, csize, jd->debug_data, jd->nr_debug_entries,
+	ret = jit_emit_elf(jd, filename, sym, addr, (const void *)uaddr, csize, jd->debug_data, jd->nr_debug_entries,
 			   jd->unwinding_data, jd->eh_frame_hdr_size, jd->unwinding_size);
 
 	if (jd->debug_data && jd->nr_debug_entries) {
@@ -451,7 +479,7 @@
 		free(event);
 		return -1;
 	}
-	if (stat(filename, &st))
+	if (nsinfo__stat(filename, &st, jd->nsi))
 		memset(&st, 0, sizeof(st));
 
 	event->mmap2.header.type = PERF_RECORD_MMAP2;
@@ -515,14 +543,15 @@
 	int usize;
 	u16 idr_size;
 	int ret;
-	pid_t pid, tid;
+	pid_t nspid, pid, tid;
 	struct {
 		u32 pid, tid;
 		u64 time;
 	} *id;
 
-	pid = jr->move.pid;
-	tid =  jr->move.tid;
+	nspid = jr->move.pid;
+	pid   = jr_entry_pid(jd, jr);
+	tid   = jr_entry_tid(jd, jr);
 	usize = jd->unwinding_mapped_size;
 	idr_size = jd->machine->id_hdr_size;
 
@@ -536,12 +565,12 @@
 	filename = event->mmap2.filename;
 	size = snprintf(filename, PATH_MAX, "%s/jitted-%d-%" PRIu64 ".so",
 	         jd->dir,
-	         pid,
+		 nspid,
 		 jr->move.code_index);
 
 	size++; /* for \0 */
 
-	if (stat(filename, &st))
+	if (nsinfo__stat(filename, &st, jd->nsi))
 		memset(&st, 0, sizeof(st));
 
 	size = PERF_ALIGN(size, sizeof(u64));
@@ -700,7 +729,7 @@
  * as captured in the RECORD_MMAP record
  */
 static int
-jit_detect(char *mmap_name, pid_t pid)
+jit_detect(char *mmap_name, pid_t pid, struct nsinfo *nsi)
  {
 	char *p;
 	char *end = NULL;
@@ -740,7 +769,7 @@
 	 * pid does not match mmap pid
 	 * pid==0 in system-wide mode (synthesized)
 	 */
-	if (pid && pid2 != pid)
+	if (pid && pid2 != nsi->nstgid)
 		return -1;
 	/*
 	 * validate suffix
@@ -782,16 +811,30 @@
 	    struct machine *machine,
 	    char *filename,
 	    pid_t pid,
+	    pid_t tid,
 	    u64 *nbytes)
 {
+	struct thread *thread;
+	struct nsinfo *nsi;
 	struct evsel *first;
 	struct jit_buf_desc jd;
 	int ret;
 
+	thread = machine__findnew_thread(machine, pid, tid);
+	if (thread == NULL) {
+		pr_err("problem processing JIT mmap event, skipping it.\n");
+		return 0;
+	}
+
+	nsi = nsinfo__get(thread->nsinfo);
+	thread__put(thread);
+
 	/*
 	 * first, detect marker mmap (i.e., the jitdump mmap)
 	 */
-	if (jit_detect(filename, pid)) {
+	if (jit_detect(filename, pid, nsi)) {
+		nsinfo__put(nsi);
+
 		// Strip //anon* mmaps if we processed a jitdump for this pid
 		if (jit_has_pid(machine, pid) && (strncmp(filename, "//anon", 6) == 0))
 			return 1;
@@ -804,6 +847,7 @@
 	jd.session = session;
 	jd.output  = output;
 	jd.machine = machine;
+	jd.nsi = nsi;
 
 	/*
 	 * track sample_type to compute id_all layout
@@ -821,5 +865,7 @@
 		ret = 1;
 	}
 
+	nsinfo__put(jd.nsi);
+
 	return ret;
 }
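The jit_emit_elf() hunk above saves errno right after open() because leaving the mount namespace may overwrite it before the warning is printed. A minimal standalone sketch of that pattern follows. The names open_saving_errno() and cleanup_step() are hypothetical and are not part of perf; cleanup_step() only stands in for a call such as nsinfo__mountns_exit().

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-in for a cleanup call that may clobber errno. */
static void cleanup_step(void)
{
	errno = 0;
}

/*
 * Open path read-only. On failure return -1 and store the errno that
 * open() itself produced in *err, even though cleanup_step() ran in
 * between. On success *err is set to 0.
 */
static int open_saving_errno(const char *path, int *err)
{
	int fd = open(path, O_RDONLY);
	int saved_errno = errno;	/* capture before anything else runs */

	cleanup_step();

	*err = (fd == -1) ? saved_errno : 0;
	return fd;
}
```

A caller would pass *err to strerror() when reporting the failure, instead of reading errno after the cleanup.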
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 1e9d3f9..b5c2d8b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -369,6 +369,15 @@
 	return machine;
 }
 
+struct machine *machines__find_guest(struct machines *machines, pid_t pid)
+{
+	struct machine *machine = machines__find(machines, pid);
+
+	if (!machine)
+		machine = machines__findnew(machines, DEFAULT_GUEST_KERNEL_ID);
+	return machine;
+}
+
 void machines__process_guests(struct machines *machines,
 			      machine__process_t process, void *data)
 {
@@ -589,6 +598,24 @@
 	return th;
 }
 
+/*
+ * Threads are identified by pid and tid, and the idle task has pid == tid == 0.
+ * So here a single thread is created for that, but actually there is a separate
+ * idle task per cpu, so there should be one 'struct thread' per cpu, but there
+ * is only 1. That causes problems for some tools, requiring workarounds. For
+ * example get_idle_thread() in builtin-sched.c, or thread_stack__per_cpu().
+ */
+struct thread *machine__idle_thread(struct machine *machine)
+{
+	struct thread *thread = machine__findnew_thread(machine, 0, 0);
+
+	if (!thread || thread__set_comm(thread, "swapper", 0) ||
+	    thread__set_namespaces(thread, 0, NULL))
+		pr_err("problem inserting idle task for machine pid %d\n", machine->pid);
+
+	return thread;
+}
+
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread)
 {
@@ -1599,7 +1626,8 @@
 }
 
 static int machine__process_kernel_mmap_event(struct machine *machine,
-					      struct extra_kernel_map *xm)
+					      struct extra_kernel_map *xm,
+					      struct build_id *bid)
 {
 	struct map *map;
 	enum dso_space_type dso_space;
@@ -1624,6 +1652,10 @@
 			goto out_problem;
 
 		map->end = map->start + xm->end - xm->start;
+
+		if (build_id__is_defined(bid))
+			dso__set_build_id(map->dso, bid);
+
 	} else if (is_kernel_mmap) {
 		const char *symbol_name = (xm->name + strlen(machine->mmap_name));
 		/*
@@ -1681,6 +1713,9 @@
 
 		machine__update_kernel_mmap(machine, xm->start, xm->end);
 
+		if (build_id__is_defined(bid))
+			dso__set_build_id(kernel, bid);
+
 		/*
 		 * Avoid using a zero address (kptr_restrict) for the ref reloc
 		 * symbol. Effectively having zero here means that at record
@@ -1718,11 +1753,17 @@
 		.ino = event->mmap2.ino,
 		.ino_generation = event->mmap2.ino_generation,
 	};
+	struct build_id __bid, *bid = NULL;
 	int ret = 0;
 
 	if (dump_trace)
 		perf_event__fprintf_mmap2(event, stdout);
 
+	if (event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID) {
+		bid = &__bid;
+		build_id__init(bid, event->mmap2.build_id, event->mmap2.build_id_size);
+	}
+
 	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
 	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
 		struct extra_kernel_map xm = {
@@ -1732,7 +1773,7 @@
 		};
 
 		strlcpy(xm.name, event->mmap2.filename, KMAP_NAME_LEN);
-		ret = machine__process_kernel_mmap_event(machine, &xm);
+		ret = machine__process_kernel_mmap_event(machine, &xm, bid);
 		if (ret < 0)
 			goto out_problem;
 		return 0;
@@ -1746,7 +1787,7 @@
 	map = map__new(machine, event->mmap2.start,
 			event->mmap2.len, event->mmap2.pgoff,
 			&dso_id, event->mmap2.prot,
-			event->mmap2.flags,
+			event->mmap2.flags, bid,
 			event->mmap2.filename, thread);
 
 	if (map == NULL)
@@ -1789,7 +1830,7 @@
 		};
 
 		strlcpy(xm.name, event->mmap.filename, KMAP_NAME_LEN);
-		ret = machine__process_kernel_mmap_event(machine, &xm);
+		ret = machine__process_kernel_mmap_event(machine, &xm, NULL);
 		if (ret < 0)
 			goto out_problem;
 		return 0;
@@ -1805,7 +1846,7 @@
 
 	map = map__new(machine, event->mmap.start,
 			event->mmap.len, event->mmap.pgoff,
-			NULL, prot, 0, event->mmap.filename, thread);
+			NULL, prot, 0, NULL, event->mmap.filename, thread);
 
 	if (map == NULL)
 		goto out_problem_map;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 26368d3..7377ed6 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -106,6 +106,7 @@
 
 struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 				    pid_t tid);
+struct thread *machine__idle_thread(struct machine *machine);
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread);
 
@@ -162,6 +163,7 @@
 struct machine *machines__find_host(struct machines *machines);
 struct machine *machines__find(struct machines *machines, pid_t pid);
 struct machine *machines__findnew(struct machines *machines, pid_t pid);
+struct machine *machines__find_guest(struct machines *machines, pid_t pid);
 
 void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size);
 void machines__set_comm_exec(struct machines *machines, bool comm_exec);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f44ede4..692e56d 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -130,8 +130,8 @@
 
 struct map *map__new(struct machine *machine, u64 start, u64 len,
 		     u64 pgoff, struct dso_id *id,
-		     u32 prot, u32 flags, char *filename,
-		     struct thread *thread)
+		     u32 prot, u32 flags, struct build_id *bid,
+		     char *filename, struct thread *thread)
 {
 	struct map *map = malloc(sizeof(*map));
 	struct nsinfo *nsi = NULL;
@@ -194,6 +194,10 @@
 				dso__set_loaded(dso);
 		}
 		dso->nsinfo = nsi;
+
+		if (build_id__is_defined(bid))
+			dso__set_build_id(dso, bid);
+
 		dso__put(dso);
 	}
 	return map;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index b1c0686..9f32825 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -104,10 +104,11 @@
 	       u64 start, u64 end, u64 pgoff, struct dso *dso);
 
 struct dso_id;
+struct build_id;
 
 struct map *map__new(struct machine *machine, u64 start, u64 len,
 		     u64 pgoff, struct dso_id *id, u32 prot, u32 flags,
-		     char *filename, struct thread *thread);
+		     struct build_id *bid, char *filename, struct thread *thread);
 struct map *map__new2(u64 start, struct dso *dso);
 void map__delete(struct map *map);
 struct map *map__clone(struct map *map);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 19007e4..f93a852 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -56,6 +56,11 @@
 	return (char *)e->name;
 }
 
+__weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
+{
+	return false;
+}
+
 int perf_mem_events__parse(const char *str)
 {
 	char *tok, *saveptr = NULL;
@@ -332,6 +337,29 @@
 	return l;
 }
 
+int perf_mem__blk_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
+{
+	size_t l = 0;
+	u64 mask = PERF_MEM_BLK_NA;
+
+	sz -= 1; /* -1 for null termination */
+	out[0] = '\0';
+
+	if (mem_info)
+		mask = mem_info->data_src.mem_blk;
+
+	if (!mask || (mask & PERF_MEM_BLK_NA)) {
+		l += scnprintf(out + l, sz - l, " N/A");
+		return l;
+	}
+	if (mask & PERF_MEM_BLK_DATA)
+		l += scnprintf(out + l, sz - l, " Data");
+	if (mask & PERF_MEM_BLK_ADDR)
+		l += scnprintf(out + l, sz - l, " Addr");
+
+	return l;
+}
+
 int perf_script__meminfo_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	int i = 0;
@@ -343,6 +371,8 @@
 	i += perf_mem__tlb_scnprintf(out + i, sz - i, mem_info);
 	i += scnprintf(out + i, sz - i, "|LCK ");
 	i += perf_mem__lck_scnprintf(out + i, sz - i, mem_info);
+	i += scnprintf(out + i, sz - i, "|BLK ");
+	i += perf_mem__blk_scnprintf(out + i, sz - i, mem_info);
 
 	return i;
 }
@@ -355,6 +385,7 @@
 	u64 lvl    = data_src->mem_lvl;
 	u64 snoop  = data_src->mem_snoop;
 	u64 lock   = data_src->mem_lock;
+	u64 blk    = data_src->mem_blk;
 	/*
 	 * Skylake might report unknown remote level via this
 	 * bit, consider it when evaluating remote HITMs.
@@ -374,6 +405,9 @@
 
 	if (lock & P(LOCK, LOCKED)) stats->locks++;
 
+	if (blk & P(BLK, DATA)) stats->blk_data++;
+	if (blk & P(BLK, ADDR)) stats->blk_addr++;
+
 	if (op & P(OP, LOAD)) {
 		/* load */
 		stats->load++;
@@ -485,6 +519,8 @@
 	stats->rmt_hit		+= add->rmt_hit;
 	stats->lcl_dram		+= add->lcl_dram;
 	stats->rmt_dram		+= add->rmt_dram;
+	stats->blk_data		+= add->blk_data;
+	stats->blk_addr		+= add->blk_addr;
 	stats->nomap		+= add->nomap;
 	stats->noparse		+= add->noparse;
 }
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 5ef1782..755cef7 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -9,6 +9,7 @@
 #include <linux/refcount.h>
 #include <linux/perf_event.h>
 #include "stat.h"
+#include "evsel.h"
 
 struct perf_mem_event {
 	bool		record;
@@ -39,6 +40,7 @@
 
 char *perf_mem_events__name(int i);
 struct perf_mem_event *perf_mem_events__ptr(int i);
+bool is_mem_loads_aux_event(struct evsel *leader);
 
 void perf_mem_events__list(void);
 
@@ -47,6 +49,7 @@
 int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
 int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
 int perf_mem__lck_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
+int perf_mem__blk_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
 
 int perf_script__meminfo_scnprintf(char *bf, size_t size, struct mem_info *mem_info);
 
@@ -76,6 +79,8 @@
 	u32	rmt_hit;             /* count of loads with remote hit clean; */
 	u32	lcl_dram;            /* count of loads miss to local DRAM */
 	u32	rmt_dram;            /* count of loads miss to remote DRAM */
+	u32	blk_data;            /* count of loads blocked by data */
+	u32	blk_addr;            /* count of loads blocked by address conflict */
 	u32	nomap;               /* count of load/stores with no phys adrs */
 	u32	noparse;             /* count of unparsable data sources */
 };
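The blocked-load decoding added in perf_mem__blk_scnprintf() can be sketched outside the perf tree as below. This is an illustration only: the BLK_* constants are local stand-ins for the PERF_MEM_BLK_* bits with assumed values, blk_to_str() and append() are hypothetical names, and plain snprintf() replaces the tools' scnprintf().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the PERF_MEM_BLK_* bits; the values are assumed. */
enum {
	BLK_NA   = 0x01,	/* not available */
	BLK_DATA = 0x02,	/* blocked by data */
	BLK_ADDR = 0x04,	/* blocked by address conflict */
};

/* Append src at offset l without writing past sz bytes; return new offset. */
static size_t append(char *out, size_t sz, size_t l, const char *src)
{
	int n;

	if (l >= sz)
		return l;
	n = snprintf(out + l, sz - l, "%s", src);
	if (n < 0)
		return l;
	/* snprintf() reports the untruncated length, so clamp it. */
	if ((size_t)n >= sz - l)
		return sz - 1;
	return l + (size_t)n;
}

/* Render a blocked-load mask as " N/A", " Data", " Addr" or " Data Addr". */
static size_t blk_to_str(char *out, size_t sz, uint64_t mask)
{
	size_t l = 0;

	if (sz == 0)
		return 0;
	out[0] = '\0';

	if (!mask || (mask & BLK_NA))
		return append(out, sz, l, " N/A");
	if (mask & BLK_DATA)
		l = append(out, sz, l, " Data");
	if (mask & BLK_ADDR)
		l = append(out, sz, l, " Addr");
	return l;
}
```

As in the patch, an empty mask and the N/A bit are treated the same way, and both conflict bits may be reported for one sample.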
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index e6d3452..26c990e 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -379,7 +379,7 @@
 				metric_refs[i].metric_expr = ref->metric_expr;
 				i++;
 			}
-		};
+		}
 
 		expr->metric_refs = metric_refs;
 		expr->metric_expr = m->metric_expr;
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
index 285d6f3..608b20c 100644
--- a/tools/perf/util/namespaces.c
+++ b/tools/perf/util/namespaces.c
@@ -66,6 +66,7 @@
 	char spath[PATH_MAX];
 	char *newns = NULL;
 	char *statln = NULL;
+	char *nspid;
 	struct stat old_stat;
 	struct stat new_stat;
 	FILE *f = NULL;
@@ -112,8 +113,12 @@
 		}
 
 		if (strstr(statln, "NStgid:") != NULL) {
-			nsi->nstgid = (pid_t)strtol(strrchr(statln, '\t'),
-						     NULL, 10);
+			nspid = strrchr(statln, '\t');
+			nsi->nstgid = (pid_t)strtol(nspid, NULL, 10);
+			/* If innermost tgid is not the first, process is in a different
+			 * PID namespace.
+			 */
+			nsi->in_pidns = (statln + sizeof("NStgid:") - 1) != nspid;
 			break;
 		}
 	}
@@ -140,6 +145,7 @@
 		nsi->tgid = pid;
 		nsi->nstgid = pid;
 		nsi->need_setns = false;
+		nsi->in_pidns = false;
 		/* Init may fail if the process exits while we're trying to look
 		 * at its proc information.  In that case, save the pid but
 		 * don't try to enter the namespace.
@@ -166,6 +172,7 @@
 		nnsi->tgid = nsi->tgid;
 		nnsi->nstgid = nsi->nstgid;
 		nnsi->need_setns = nsi->need_setns;
+		nnsi->in_pidns = nsi->in_pidns;
 		if (nsi->mntns_path) {
 			nnsi->mntns_path = strdup(nsi->mntns_path);
 			if (!nnsi->mntns_path) {
@@ -280,3 +287,15 @@
 
 	return rpath;
 }
+
+int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi)
+{
+	int ret;
+	struct nscookie nsc;
+
+	nsinfo__mountns_enter(nsi, &nsc);
+	ret = stat(filename, st);
+	nsinfo__mountns_exit(&nsc);
+
+	return ret;
+}
diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
index 4b33f68..ad9775d 100644
--- a/tools/perf/util/namespaces.h
+++ b/tools/perf/util/namespaces.h
@@ -8,6 +8,7 @@
 #define __PERF_NAMESPACES_H
 
 #include <sys/types.h>
+#include <sys/stat.h>
 #include <linux/stddef.h>
 #include <linux/perf_event.h>
 #include <linux/refcount.h>
@@ -33,6 +34,7 @@
 	pid_t			tgid;
 	pid_t			nstgid;
 	bool			need_setns;
+	bool			in_pidns;
 	char			*mntns_path;
 	refcount_t		refcnt;
 };
@@ -55,6 +57,7 @@
 void nsinfo__mountns_exit(struct nscookie *nc);
 
 char *nsinfo__realpath(const char *path, struct nsinfo *nsi);
+int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi);
 
 static inline void __nsinfo__zput(struct nsinfo **nsip)
 {
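The in_pidns detection in the namespaces.c hunk relies on the layout of the "NStgid:" line in /proc/<pid>/status: tab-separated ids, with the innermost namespace last. A small sketch of the same check is shown below. parse_nstgid() is a hypothetical helper, and it matches the key with strncmp() at the start of the line, whereas the patch uses strstr().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse one line of /proc/<pid>/status. Returns 0 and fills *nstgid and
 * *in_pidns if it is an "NStgid:" line, -1 otherwise.
 */
static int parse_nstgid(const char *line, pid_t *nstgid, bool *in_pidns)
{
	static const char key[] = "NStgid:";
	const char *last_tab;

	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return -1;

	last_tab = strrchr(line, '\t');
	if (!last_tab)
		return -1;

	/* The innermost tgid is the last field; strtol() skips the tab. */
	*nstgid = (pid_t)strtol(last_tab, NULL, 10);

	/*
	 * If the last tab is not the one right after the key, the line
	 * holds more than one id, so the process is in a nested PID
	 * namespace.
	 */
	*in_pidns = last_tab != line + sizeof(key) - 1;
	return 0;
}
```

For a line with a single id the helper reports in_pidns as false, which corresponds to the default set in nsinfo__new() when the status file cannot be read.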
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 9db5097..0b36285 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -356,6 +356,7 @@
 cycles-ct				|
 cycles-t				|
 mem-loads				|
+mem-loads-aux				|
 mem-stores				|
 topdown-[a-z-]+				|
 tx-capacity-[a-z-]+			|
diff --git a/tools/perf/util/perf_api_probe.c b/tools/perf/util/perf_api_probe.c
index 3840d02..829af17 100644
--- a/tools/perf/util/perf_api_probe.c
+++ b/tools/perf/util/perf_api_probe.c
@@ -98,6 +98,11 @@
 	evsel->core.attr.text_poke = 1;
 }
 
+static void perf_probe_build_id(struct evsel *evsel)
+{
+	evsel->core.attr.build_id = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -172,3 +177,8 @@
 
 	return true;
 }
+
+bool perf_can_record_build_id(void)
+{
+	return perf_probe_api(perf_probe_build_id);
+}
diff --git a/tools/perf/util/perf_api_probe.h b/tools/perf/util/perf_api_probe.h
index d5506a9..f12ca55 100644
--- a/tools/perf/util/perf_api_probe.h
+++ b/tools/perf/util/perf_api_probe.h
@@ -11,5 +11,6 @@
 bool perf_can_record_switch_events(void);
 bool perf_can_record_text_poke_events(void);
 bool perf_can_sample_identifier(void);
+bool perf_can_record_build_id(void);
 
 #endif // __PERF_API_PROBE_H
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index fb0bb66..3048182 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,7 +35,8 @@
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
-		bit_name(CGROUP), bit_name(DATA_PAGE_SIZE),
+		bit_name(CGROUP), bit_name(DATA_PAGE_SIZE), bit_name(CODE_PAGE_SIZE),
+		bit_name(WEIGHT_STRUCT),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -134,6 +135,8 @@
 	PRINT_ATTRf(bpf_event, p_unsigned);
 	PRINT_ATTRf(aux_output, p_unsigned);
 	PRINT_ATTRf(cgroup, p_unsigned);
+	PRINT_ATTRf(text_poke, p_unsigned);
+	PRINT_ATTRf(build_id, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index a454991..eeac181 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -33,6 +33,13 @@
 
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 
+static inline const char *perf_reg_name(int id)
+{
+	const char *reg_name = __perf_reg_name(id);
+
+	return reg_name ?: "unknown";
+}
+
 #else
 #define PERF_REGS_MASK	0
 #define PERF_REGS_MAX	0
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8eae2aff..a9cff3a 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -894,6 +894,16 @@
 	struct debuginfo *dinfo;
 	int ntevs, ret = 0;
 
+	/* Workaround for gcc #98776 issue.
+	 * Perf fails to add a kretprobe event with the debuginfo of a vmlinux
+	 * compiled by gcc with -fpatchable-function-entry enabled. Kernel
+	 * modules have the same issue. The retprobe doesn't need debuginfo,
+	 * so this workaround uses the map to query the probe function address
+	 * for retprobe events.
+	 */
+	if (pev->point.retprobe)
+		return 0;
+
 	dinfo = open_debuginfo(pev->target, pev->nsi, !need_dwarf);
 	if (!dinfo) {
 		if (need_dwarf)
@@ -1074,7 +1084,7 @@
 	}
 
 	intlist__for_each_entry(ln, lr->line_list) {
-		for (; ln->i > l; l++) {
+		for (; ln->i > (unsigned long)l; l++) {
 			ret = show_one_line(fp, l - lr->offset);
 			if (ret < 0)
 				goto end;
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index bbecb44..5227354 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -794,6 +794,8 @@
 	char *ret = NULL;
 	int i, args_count, err;
 	unsigned long long ref_ctr_offset;
+	char *arg;
+	int arg_idx = 0;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
@@ -818,11 +820,43 @@
 		if (args == NULL)
 			goto error;
 
-		for (i = 0; i < args_count; ++i) {
-			if (synthesize_sdt_probe_arg(&buf, i, args[i]) < 0) {
+		for (i = 0; i < args_count; ) {
+			/*
+			 * FIXUP: Arm64 ELF section '.note.stapsdt' uses string
+			 * format "-4@[sp, NUM]" if a probe is to access data in
+			 * the stack, e.g. below is an example for the SDT
+			 * Arguments:
+			 *
+			 *   Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4]
+			 *
+			 * Since the string introduces an extra space character
+			 * in the middle of square brackets, the argument is
+			 * divided into two items.  Fix up this case: if an
+			 * item contains the substring "[sp,", concatenate it
+			 * with the following item.
+			 */
+			if (strstr(args[i], "[sp,") && (i+1) < args_count) {
+				err = asprintf(&arg, "%s %s", args[i], args[i+1]);
+				i += 2;
+			} else {
+				err = asprintf(&arg, "%s", args[i]);
+				i += 1;
+			}
+
+			/* Failed to allocate memory */
+			if (err < 0) {
 				argv_free(args);
 				goto error;
 			}
+
+			if (synthesize_sdt_probe_arg(&buf, arg_idx, arg) < 0) {
+				free(arg);
+				argv_free(args);
+				goto error;
+			}
+
+			free(arg);
+			arg_idx++;
 		}
 
 		argv_free(args);
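The arm64 fixup in the probe-file.c hunk rejoins an SDT argument that was split at the space inside "[sp, NUM]". The sketch below shows that joining step in isolation. next_sdt_arg() is a hypothetical helper, and it uses malloc() with snprintf() rather than the asprintf() call in the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the next logical SDT argument starting at args[*i]. An item that
 * contains "[sp," is joined with the following item using a single space.
 * *i is advanced past the consumed items. Returns a malloc()ed string the
 * caller must free, or NULL on allocation failure.
 */
static char *next_sdt_arg(char **args, int count, int *i)
{
	int join = strstr(args[*i], "[sp,") != NULL && *i + 1 < count;
	size_t len = strlen(args[*i]) + 1;
	char *arg;

	if (join)
		len += strlen(args[*i + 1]) + 1;	/* space + second item */

	arg = malloc(len);
	if (arg) {
		if (join)
			snprintf(arg, len, "%s %s", args[*i], args[*i + 1]);
		else
			snprintf(arg, len, "%s", args[*i]);
	}

	*i += join ? 2 : 1;
	return arg;
}
```

Because two input items can collapse into one argument, the caller needs a separate output index, which is the role of arg_idx in the patch.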
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 76dd349..1b118c9 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -1187,8 +1187,10 @@
 	while (!dwarf_nextcu(dbg->dbg, off, &noff, &cuhl, NULL, NULL, NULL)) {
 		/* Get the DIE(Debugging Information Entry) of this CU */
 		diep = dwarf_offdie(dbg->dbg, off + cuhl, &pf->cu_die);
-		if (!diep)
+		if (!diep) {
+			off = noff;
 			continue;
+		}
 
 		/* Check if target file is included. */
 		if (pp->file)
@@ -1949,8 +1951,10 @@
 
 		/* Get the DIE(Debugging Information Entry) of this CU */
 		diep = dwarf_offdie(dbg->dbg, off + cuhl, &lf.cu_die);
-		if (!diep)
+		if (!diep) {
+			off = noff;
 			continue;
+		}
 
 		/* Check if target file is included. */
 		if (lr->file)
diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index a9d9c14..71b7535 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -10,6 +10,7 @@
 util/cap.c
 util/evlist.c
 util/evsel.c
+util/evsel_fprintf.c
 util/perf_event_attr_fprintf.c
 util/cpumap.c
 util/memswap.c
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index cc5ade8..278abec 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -80,6 +80,27 @@
 }
 
 /*
+ * XXX: All these evsel destructors need some better mechanism, like a linked
+ * list of destructors registered when the relevant code indeed is used instead
+ * of having more and more calls in perf_evsel__delete(). -- acme
+ *
+ * For now, add some more:
+ *
+ * Not to drag the BPF bandwagon...
+ */
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+int bpf_counter__install_pe(struct evsel *evsel __maybe_unused, int cpu __maybe_unused, int fd __maybe_unused)
+{
+	return 0;
+}
+
+/*
  * Support debug printing even though util/debug.c is not linked.  That means
  * implementing 'verbose' and 'eprintf'.
  */
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index e70c9dd..f99852d 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -15,6 +15,8 @@
 #include "record.h"
 #include "../perf-sys.h"
 #include "topdown.h"
+#include "map_symbol.h"
+#include "mem-events.h"
 
 /*
  * evsel__config_leader_sampling() uses special rules for leader sampling.
@@ -25,7 +27,8 @@
 {
 	struct evsel *leader = evsel->leader;
 
-	if (evsel__is_aux_event(leader) || arch_topdown_sample_read(leader)) {
+	if (evsel__is_aux_event(leader) || arch_topdown_sample_read(leader) ||
+	    is_mem_loads_aux_event(leader)) {
 		evlist__for_each_entry(evlist, evsel) {
 			if (evsel->leader == leader && evsel != evsel->leader)
 				return evsel;
@@ -201,10 +204,10 @@
 	 * Default frequency is over current maximum.
 	 */
 	if (max_rate < opts->freq) {
-		pr_warning("Lowering default frequency rate to %u.\n"
+		pr_warning("Lowering default frequency rate from %u to %u.\n"
 			   "Please consider tweaking "
 			   "/proc/sys/kernel/perf_event_max_sample_rate.\n",
-			   max_rate);
+			   opts->freq, max_rate);
 		opts->freq = max_rate;
 	}
 
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 694b351..68f471d 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -23,6 +23,7 @@
 	bool	      sample_address;
 	bool	      sample_phys_addr;
 	bool	      sample_data_page_size;
+	bool	      sample_code_page_size;
 	bool	      sample_weight;
 	bool	      sample_time;
 	bool	      sample_time_set;
@@ -50,6 +51,7 @@
 	bool	      no_bpf_event;
 	bool	      kcore;
 	bool	      text_poke;
+	bool	      build_id;
 	unsigned int  freq;
 	unsigned int  mmap_pages;
 	unsigned int  auxtrace_mmap_pages;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 25adbcc..859832a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -593,10 +593,13 @@
 	event->mmap2.start = bswap_64(event->mmap2.start);
 	event->mmap2.len   = bswap_64(event->mmap2.len);
 	event->mmap2.pgoff = bswap_64(event->mmap2.pgoff);
-	event->mmap2.maj   = bswap_32(event->mmap2.maj);
-	event->mmap2.min   = bswap_32(event->mmap2.min);
-	event->mmap2.ino   = bswap_64(event->mmap2.ino);
-	event->mmap2.ino_generation = bswap_64(event->mmap2.ino_generation);
+
+	if (!(event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID)) {
+		event->mmap2.maj   = bswap_32(event->mmap2.maj);
+		event->mmap2.min   = bswap_32(event->mmap2.min);
+		event->mmap2.ino   = bswap_64(event->mmap2.ino);
+		event->mmap2.ino_generation = bswap_64(event->mmap2.ino_generation);
+	}
 
 	if (sample_id_all) {
 		void *data = &event->mmap2.filename;
@@ -1297,8 +1300,12 @@
 	if (sample_type & PERF_SAMPLE_STACK_USER)
 		stack_user__printf(&sample->user_stack);
 
-	if (sample_type & PERF_SAMPLE_WEIGHT)
-		printf("... weight: %" PRIu64 "\n", sample->weight);
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
+		printf("... weight: %" PRIu64 "", sample->weight);
+		if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT)
+			printf(",0x%"PRIx16"", sample->ins_lat);
+		printf("\n");
+	}
 
 	if (sample_type & PERF_SAMPLE_DATA_SRC)
 		printf(" . data_src: 0x%"PRIx64"\n", sample->data_src);
@@ -1309,6 +1316,9 @@
 	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		printf(" .. data page size: %s\n", get_page_size_name(sample->data_page_size, str));
 
+	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		printf(" .. code page size: %s\n", get_page_size_name(sample->code_page_size, str));
+
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 
@@ -1346,8 +1356,6 @@
 					       union perf_event *event,
 					       struct perf_sample *sample)
 {
-	struct machine *machine;
-
 	if (perf_guest &&
 	    ((sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL) ||
 	     (sample->cpumode == PERF_RECORD_MISC_GUEST_USER))) {
@@ -1359,10 +1367,7 @@
 		else
 			pid = sample->pid;
 
-		machine = machines__find(machines, pid);
-		if (!machine)
-			machine = machines__findnew(machines, DEFAULT_GUEST_KERNEL_ID);
-		return machine;
+		return machines__find_guest(machines, pid);
 	}
 
 	return &machines->host;
@@ -1784,32 +1789,13 @@
 	return machine__findnew_thread(&session->machines.host, -1, pid);
 }
 
-/*
- * Threads are identified by pid and tid, and the idle task has pid == tid == 0.
- * So here a single thread is created for that, but actually there is a separate
- * idle task per cpu, so there should be one 'struct thread' per cpu, but there
- * is only 1. That causes problems for some tools, requiring workarounds. For
- * example get_idle_thread() in builtin-sched.c, or thread_stack__per_cpu().
- */
 int perf_session__register_idle_thread(struct perf_session *session)
 {
-	struct thread *thread;
-	int err = 0;
+	struct thread *thread = machine__idle_thread(&session->machines.host);
 
-	thread = machine__findnew_thread(&session->machines.host, 0, 0);
-	if (thread == NULL || thread__set_comm(thread, "swapper", 0)) {
-		pr_err("problem inserting idle task.\n");
-		err = -1;
-	}
-
-	if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
-		pr_err("problem inserting idle task.\n");
-		err = -1;
-	}
-
-	/* machine__findnew_thread() got the thread, so put it */
+	/* machine__idle_thread() got the thread, so put it */
 	thread__put(thread);
-	return err;
+	return thread ? 0 : -1;
 }
 
 static void
diff --git a/tools/perf/util/setup.py b/tools/perf/util/setup.py
index c5e3e9a..483f050 100644
--- a/tools/perf/util/setup.py
+++ b/tools/perf/util/setup.py
@@ -43,7 +43,7 @@
 
 cflags = getenv('CFLAGS', '').split()
 # switch off several checks (need to be at the end of cflags list)
-cflags += ['-fno-strict-aliasing', '-Wno-write-strings', '-Wno-unused-parameter', '-Wno-redundant-decls' ]
+cflags += ['-fno-strict-aliasing', '-Wno-write-strings', '-Wno-unused-parameter', '-Wno-redundant-decls', '-DPYTHON_PERF' ]
 if not cc_is_clang:
     cflags += ['-Wno-cast-function-type' ]
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 80907bc..0d5ad42 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -36,7 +36,7 @@
 const char	*parent_pattern = default_parent_pattern;
 const char	*default_sort_order = "comm,dso,symbol";
 const char	default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles";
-const char	default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
+const char	default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat";
 const char	default_top_sort_order[] = "dso,symbol";
 const char	default_diff_sort_order[] = "dso,symbol";
 const char	default_tracepoint_sort_order[] = "trace";
@@ -1365,6 +1365,49 @@
 	.se_width_idx	= HISTC_GLOBAL_WEIGHT,
 };
 
+static u64 he_ins_lat(struct hist_entry *he)
+{
+	return he->stat.nr_events ? he->stat.ins_lat / he->stat.nr_events : 0;
+}
+
+static int64_t
+sort__local_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return he_ins_lat(left) - he_ins_lat(right);
+}
+
+static int hist_entry__local_ins_lat_snprintf(struct hist_entry *he, char *bf,
+					      size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*u", width, he_ins_lat(he));
+}
+
+struct sort_entry sort_local_ins_lat = {
+	.se_header	= "Local INSTR Latency",
+	.se_cmp		= sort__local_ins_lat_cmp,
+	.se_snprintf	= hist_entry__local_ins_lat_snprintf,
+	.se_width_idx	= HISTC_LOCAL_INS_LAT,
+};
+
+static int64_t
+sort__global_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->stat.ins_lat - right->stat.ins_lat;
+}
+
+static int hist_entry__global_ins_lat_snprintf(struct hist_entry *he, char *bf,
+					       size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*u", width, he->stat.ins_lat);
+}
+
+struct sort_entry sort_global_ins_lat = {
+	.se_header	= "INSTR Latency",
+	.se_cmp		= sort__global_ins_lat_cmp,
+	.se_snprintf	= hist_entry__global_ins_lat_snprintf,
+	.se_width_idx	= HISTC_GLOBAL_INS_LAT,
+};
+
 struct sort_entry sort_mem_daddr_sym = {
 	.se_header	= "Data Symbol",
 	.se_cmp		= sort__daddr_cmp,
@@ -1422,6 +1465,41 @@
 };
 
 static int64_t
+sort__blocked_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_data_src data_src_l;
+	union perf_mem_data_src data_src_r;
+
+	if (left->mem_info)
+		data_src_l = left->mem_info->data_src;
+	else
+		data_src_l.mem_blk = PERF_MEM_BLK_NA;
+
+	if (right->mem_info)
+		data_src_r = right->mem_info->data_src;
+	else
+		data_src_r.mem_blk = PERF_MEM_BLK_NA;
+
+	return (int64_t)(data_src_r.mem_blk - data_src_l.mem_blk);
+}
+
+static int hist_entry__blocked_snprintf(struct hist_entry *he, char *bf,
+					size_t size, unsigned int width)
+{
+	char out[16];
+
+	perf_mem__blk_scnprintf(out, sizeof(out), he->mem_info);
+	return repsep_snprintf(bf, size, "%.*s", width, out);
+}
+
+struct sort_entry sort_mem_blocked = {
+	.se_header	= "Blocked",
+	.se_cmp		= sort__blocked_cmp,
+	.se_snprintf	= hist_entry__blocked_snprintf,
+	.se_width_idx	= HISTC_MEM_BLOCKED,
+};
+
+static int64_t
 sort__phys_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
 {
 	uint64_t l = 0, r = 0;
@@ -1492,6 +1570,31 @@
 };
 
 static int64_t
+sort__code_page_size_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	uint64_t l = left->code_page_size;
+	uint64_t r = right->code_page_size;
+
+	return (int64_t)(r - l);
+}
+
+static int hist_entry__code_page_size_snprintf(struct hist_entry *he, char *bf,
+					  size_t size, unsigned int width)
+{
+	char str[PAGE_SIZE_NAME_LEN];
+
+	return repsep_snprintf(bf, size, "%-*s", width,
+			       get_page_size_name(he->code_page_size, str));
+}
+
+struct sort_entry sort_code_page_size = {
+	.se_header	= "Code Page Size",
+	.se_cmp		= sort__code_page_size_cmp,
+	.se_snprintf	= hist_entry__code_page_size_snprintf,
+	.se_width_idx	= HISTC_CODE_PAGE_SIZE,
+};
+
+static int64_t
 sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
 {
 	if (!left->branch_info || !right->branch_info)
@@ -1735,6 +1838,9 @@
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
+	DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size),
+	DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat),
+	DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat),
 };
 
 #undef DIM
@@ -1770,6 +1876,7 @@
 	DIM(SORT_MEM_DCACHELINE, "dcacheline", sort_mem_dcacheline),
 	DIM(SORT_MEM_PHYS_DADDR, "phys_daddr", sort_mem_phys_daddr),
 	DIM(SORT_MEM_DATA_PAGE_SIZE, "data_page_size", sort_mem_data_page_size),
+	DIM(SORT_MEM_BLOCKED, "blocked", sort_mem_blocked),
 };
 
 #undef DIM
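Note on the two latency keys added to sort.c above: "local_ins_lat" reports the per-sample average, while "ins_lat" reports the accumulated sum. The arithmetic can be sketched in isolation as follows. This is a minimal sketch, not perf code: `struct lat_stat`, `local_ins_lat()`, `global_ins_lat()` and `cmp_u64()` are hypothetical names, with `struct lat_stat` standing in for the `ins_lat` and `nr_events` members of `struct he_stat`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two he_stat members the new keys read. */
struct lat_stat {
	uint64_t ins_lat;   /* summed instruction latency of all samples */
	uint32_t nr_events; /* number of samples folded into the entry */
};

/* Local latency: average per sample, 0 when the entry has no samples. */
static uint64_t local_ins_lat(const struct lat_stat *s)
{
	return s->nr_events ? s->ins_lat / s->nr_events : 0;
}

/* Global latency: the accumulated sum, reported unchanged. */
static uint64_t global_ins_lat(const struct lat_stat *s)
{
	return s->ins_lat;
}

/*
 * Three-way comparison that cannot wrap, unlike subtracting two
 * unsigned 64-bit values. Shown as an illustrative alternative only;
 * the patch itself returns the difference.
 */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

With a sum of 90 over 4 samples the local value is 22 (integer division) and the global value stays 90.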
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e50f2b6..63f67a3 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -50,6 +50,7 @@
 	u64			period_guest_sys;
 	u64			period_guest_us;
 	u64			weight;
+	u64			ins_lat;
 	u32			nr_events;
 };
 
@@ -106,6 +107,7 @@
 	u64			transaction;
 	s32			socket;
 	s32			cpu;
+	u64			code_page_size;
 	u8			cpumode;
 	u8			depth;
 
@@ -229,6 +231,9 @@
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,
+	SORT_CODE_PAGE_SIZE,
+	SORT_LOCAL_INS_LAT,
+	SORT_GLOBAL_INS_LAT,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
@@ -256,6 +261,7 @@
 	SORT_MEM_IADDR_SYMBOL,
 	SORT_MEM_PHYS_DADDR,
 	SORT_MEM_DATA_PAGE_SIZE,
+	SORT_MEM_BLOCKED,
 };
 
 /*
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 583ae4f..cce7a76 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1045,7 +1045,9 @@
 	if (!config->csv_output) {
 		fprintf(output, "\n");
 		fprintf(output, " Performance counter stats for ");
-		if (_target->system_wide)
+		if (_target->bpf_str)
+			fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
+		else if (_target->system_wide)
 			fprintf(output, "\'system wide");
 		else if (_target->cpu_list)
 			fprintf(output, "\'CPU(s) %s", _target->cpu_list);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 12eafd1..6ccf21a 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -273,6 +273,18 @@
 	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
 		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
 				    cpu, count, &rsd);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_HEAVY_OPS))
+		update_runtime_stat(st, STAT_TOPDOWN_HEAVY_OPS,
+				    cpu, count, &rsd);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BR_MISPREDICT))
+		update_runtime_stat(st, STAT_TOPDOWN_BR_MISPREDICT,
+				    cpu, count, &rsd);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_LAT))
+		update_runtime_stat(st, STAT_TOPDOWN_FETCH_LAT,
+				    cpu, count, &rsd);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_MEM_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_MEM_BOUND,
+				    cpu, count, &rsd);
 	else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    cpu, count, &rsd);
@@ -1174,6 +1186,86 @@
 			color = PERF_COLOR_RED;
 		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
 				bad_spec * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_HEAVY_OPS) &&
+			full_td(cpu, st, &rsd) && (config->topdown_level > 1)) {
+		double retiring = td_metric_ratio(cpu,
+						  STAT_TOPDOWN_RETIRING, st,
+						  &rsd);
+		double heavy_ops = td_metric_ratio(cpu,
+						   STAT_TOPDOWN_HEAVY_OPS, st,
+						   &rsd);
+		double light_ops = retiring - heavy_ops;
+
+		if (retiring > 0.7 && heavy_ops > 0.1)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "heavy operations",
+				heavy_ops * 100.);
+		if (retiring > 0.7 && light_ops > 0.6)
+			color = PERF_COLOR_GREEN;
+		else
+			color = NULL;
+		print_metric(config, ctxp, color, "%8.1f%%", "light operations",
+				light_ops * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BR_MISPREDICT) &&
+			full_td(cpu, st, &rsd) && (config->topdown_level > 1)) {
+		double bad_spec = td_metric_ratio(cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st,
+						  &rsd);
+		double br_mis = td_metric_ratio(cpu,
+						STAT_TOPDOWN_BR_MISPREDICT, st,
+						&rsd);
+		double m_clears = bad_spec - br_mis;
+
+		if (bad_spec > 0.1 && br_mis > 0.05)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "branch mispredict",
+				br_mis * 100.);
+		if (bad_spec > 0.1 && m_clears > 0.05)
+			color = PERF_COLOR_RED;
+		else
+			color = NULL;
+		print_metric(config, ctxp, color, "%8.1f%%", "machine clears",
+				m_clears * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_LAT) &&
+			full_td(cpu, st, &rsd) && (config->topdown_level > 1)) {
+		double fe_bound = td_metric_ratio(cpu,
+						  STAT_TOPDOWN_FE_BOUND, st,
+						  &rsd);
+		double fetch_lat = td_metric_ratio(cpu,
+						   STAT_TOPDOWN_FETCH_LAT, st,
+						   &rsd);
+		double fetch_bw = fe_bound - fetch_lat;
+
+		if (fe_bound > 0.2 && fetch_lat > 0.15)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "fetch latency",
+				fetch_lat * 100.);
+		if (fe_bound > 0.2 && fetch_bw > 0.1)
+			color = PERF_COLOR_RED;
+		else
+			color = NULL;
+		print_metric(config, ctxp, color, "%8.1f%%", "fetch bandwidth",
+				fetch_bw * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_MEM_BOUND) &&
+			full_td(cpu, st, &rsd) && (config->topdown_level > 1)) {
+		double be_bound = td_metric_ratio(cpu,
+						  STAT_TOPDOWN_BE_BOUND, st,
+						  &rsd);
+		double mem_bound = td_metric_ratio(cpu,
+						   STAT_TOPDOWN_MEM_BOUND, st,
+						   &rsd);
+		double core_bound = be_bound - mem_bound;
+
+		if (be_bound > 0.2 && mem_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "memory bound",
+				mem_bound * 100.);
+		if (be_bound > 0.2 && core_bound > 0.1)
+			color = PERF_COLOR_RED;
+		else
+			color = NULL;
+		print_metric(config, ctxp, color, "%8.1f%%", "Core bound",
+				core_bound * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, NULL,
 				evsel->name, evsel->metric_name, NULL, 1, cpu, out, st);
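Note on the level 2 topdown branches added to stat-shadow.c above: each one measures a single child metric and derives its sibling by subtracting the child from the level 1 parent (for example, core bound = backend bound - memory bound). A minimal sketch of that derivation, assuming plain ratios between 0 and 1 as inputs; `struct td_split`, `td_split_level2()` and `mem_bound_is_hot()` are hypothetical names, not perf functions.

```c
/* Hypothetical pair: the measured child and the derived sibling. */
struct td_split {
	double measured;
	double derived;
};

/* Split a level 1 ratio into its measured and derived level 2 parts. */
static struct td_split td_split_level2(double parent, double measured_child)
{
	struct td_split s = { measured_child, parent - measured_child };

	return s;
}

/*
 * Highlight rule the patch applies to "memory bound": the parent and
 * the child must both exceed their thresholds.
 */
static int mem_bound_is_hot(double be_bound, double mem_bound)
{
	return be_bound > 0.2 && mem_bound > 0.2;
}
```

The derived sibling is only as reliable as the parent ratio it is subtracted from, which is presumably why the patch guards each branch with full_td().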
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ce1479..5d8af29 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -99,6 +99,10 @@
 	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
 	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
 	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
+	ID(TOPDOWN_HEAVY_OPS, topdown-heavy-ops),
+	ID(TOPDOWN_BR_MISPREDICT, topdown-br-mispredict),
+	ID(TOPDOWN_FETCH_LAT, topdown-fetch-lat),
+	ID(TOPDOWN_MEM_BOUND, topdown-mem-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
@@ -527,7 +531,7 @@
 	if (leader->core.nr_members > 1)
 		attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
 
-	attr->inherit = !config->no_inherit;
+	attr->inherit = !config->no_inherit && list_empty(&evsel->bpf_counter_list);
 
 	/*
 	 * Some events get initialized with sample_(period/type) set,
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b536973..d85c292 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -33,6 +33,10 @@
 	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
 	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_HEAVY_OPS,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BR_MISPREDICT,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_LAT,
+	PERF_STAT_EVSEL_ID__TOPDOWN_MEM_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -91,6 +95,10 @@
 	STAT_TOPDOWN_BAD_SPEC,
 	STAT_TOPDOWN_FE_BOUND,
 	STAT_TOPDOWN_BE_BOUND,
+	STAT_TOPDOWN_HEAVY_OPS,
+	STAT_TOPDOWN_BR_MISPREDICT,
+	STAT_TOPDOWN_FETCH_LAT,
+	STAT_TOPDOWN_MEM_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
@@ -148,6 +156,7 @@
 	int			 ctl_fd_ack;
 	bool			 ctl_fd_close;
 	const char		*cgroup_list;
+	unsigned int		topdown_level;
 };
 
 void perf_stat__set_big_num(int set);
diff --git a/tools/perf/util/string.c b/tools/perf/util/string.c
index 5260387..f6d90cd 100644
--- a/tools/perf/util/string.c
+++ b/tools/perf/util/string.c
@@ -293,3 +293,12 @@
 
 	return ret;
 }
+
+unsigned int hex(char c)
+{
+	if (c >= '0' && c <= '9')
+		return c - '0';
+	if (c >= 'a' && c <= 'f')
+		return c - 'a' + 10;
+	return c - 'A' + 10;
+}
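Note on hex() in string.c above: the helper does not validate its argument, so any character that is not a digit or a lowercase a-f falls through to the uppercase branch. Callers are therefore expected to check the input first. A minimal sketch of such a caller, assuming a hypothetical `hex_byte()` that decodes two hex digits; the copy of hex() is reproduced only to keep the example self-contained.

```c
#include <ctype.h>

/* Same mapping as the hex() helper added by the patch. */
static unsigned int hex(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return c - 'A' + 10;
}

/*
 * Hypothetical caller: decode two hex digits into one byte.
 * Returns 0 on success, -1 if either character is not a hex digit.
 */
static int hex_byte(const char *s, unsigned char *out)
{
	if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
		return -1;

	*out = (unsigned char)((hex(s[0]) << 4) | hex(s[1]));
	return 0;
}
```

Checking with isxdigit() before calling hex() keeps the unvalidated fall-through branch from ever seeing a non-hex character.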
diff --git a/tools/perf/util/string2.h b/tools/perf/util/string2.h
index 73df616..56c30fe 100644
--- a/tools/perf/util/string2.h
+++ b/tools/perf/util/string2.h
@@ -38,4 +38,6 @@
 char *strpbrk_esc(char *str, const char *stopset);
 char *strdup_esc(const char *str);
 
+unsigned int hex(char c);
+
 #endif /* PERF_STRING_H */
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index f3577f7..6dff843 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -12,6 +12,7 @@
 #include "maps.h"
 #include "symbol.h"
 #include "symsrc.h"
+#include "demangle-ocaml.h"
 #include "demangle-java.h"
 #include "demangle-rust.h"
 #include "machine.h"
@@ -251,8 +252,12 @@
 	    return demangled;
 
 	demangled = bfd_demangle(NULL, elf_name, demangle_flags);
-	if (demangled == NULL)
-		demangled = java_demangle_sym(elf_name, JAVA_DEMANGLE_NORET);
+	if (demangled == NULL) {
+		demangled = ocaml_demangle_sym(elf_name);
+		if (demangled == NULL) {
+			demangled = java_demangle_sym(elf_name, JAVA_DEMANGLE_NORET);
+		}
+	}
 	else if (rust_is_mangled(demangled))
 		/*
 		    * Input to Rust demangling is the BFD-demangled
@@ -1226,12 +1231,26 @@
 		if (sym.st_shndx == SHN_ABS)
 			continue;
 
-		sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
+		sec = elf_getscn(syms_ss->elf, sym.st_shndx);
 		if (!sec)
 			goto out_elf_end;
 
 		gelf_getshdr(sec, &shdr);
 
+		/*
+		 * We have to fallback to runtime when syms' section header has
+		 * NOBITS set. NOBITS results in file offset (sh_offset) not
+		 * being incremented. So sh_offset used below has different
+		 * values for syms (invalid) and runtime (valid).
+		 */
+		if (shdr.sh_type == SHT_NOBITS) {
+			sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
+			if (!sec)
+				goto out_elf_end;
+
+			gelf_getshdr(sec, &shdr);
+		}
+
 		if (is_label && !elf_sec__filter(&shdr, secstrs))
 			continue;
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 64a039c..77fc46ca0 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1561,15 +1561,14 @@
 int dso__load_bfd_symbols(struct dso *dso, const char *debugfile)
 {
 	int err = -1;
-	long symbols_size, symbols_count;
+	long symbols_size, symbols_count, i;
 	asection *section;
 	asymbol **symbols, *sym;
 	struct symbol *symbol;
 	bfd *abfd;
-	u_int i;
 	u64 start, len;
 
-	abfd = bfd_openr(dso->long_name, NULL);
+	abfd = bfd_openr(debugfile, NULL);
 	if (!abfd)
 		return -1;
 
@@ -1586,21 +1585,6 @@
 	if (section)
 		dso->text_offset = section->vma - section->filepos;
 
-	bfd_close(abfd);
-
-	abfd = bfd_openr(debugfile, NULL);
-	if (!abfd)
-		return -1;
-
-	if (!bfd_check_format(abfd, bfd_object)) {
-		pr_debug2("%s: cannot read %s bfd file.\n", __func__,
-			  debugfile);
-		goto out_close;
-	}
-
-	if (bfd_get_flavour(abfd) == bfd_target_elf_flavour)
-		goto out_close;
-
 	symbols_size = bfd_get_symtab_upper_bound(abfd);
 	if (symbols_size == 0) {
 		bfd_close(abfd);
@@ -1867,8 +1851,10 @@
 		if (nsexit)
 			nsinfo__mountns_enter(dso->nsinfo, &nsc);
 
-		if (bfdrc == 0)
+		if (bfdrc == 0) {
+			ret = 0;
 			break;
+		}
 
 		if (!is_reg || sirc < 0)
 			continue;
@@ -2406,6 +2392,49 @@
 	return 0;
 }
 
+static int setup_addrlist(struct intlist **addr_list, struct strlist *sym_list)
+{
+	struct str_node *pos, *tmp;
+	unsigned long val;
+	char *sep;
+	const char *end;
+	int i = 0, err;
+
+	*addr_list = intlist__new(NULL);
+	if (!*addr_list)
+		return -1;
+
+	strlist__for_each_entry_safe(pos, tmp, sym_list) {
+		errno = 0;
+		val = strtoul(pos->s, &sep, 16);
+		if (errno || (sep == pos->s))
+			continue;
+
+		if (*sep != '\0') {
+			end = pos->s + strlen(pos->s) - 1;
+			while (end >= sep && isspace(*end))
+				end--;
+
+			if (end >= sep)
+				continue;
+		}
+
+		err = intlist__add(*addr_list, val);
+		if (err)
+			break;
+
+		strlist__remove(sym_list, pos);
+		i++;
+	}
+
+	if (i == 0) {
+		intlist__delete(*addr_list);
+		*addr_list = NULL;
+	}
+
+	return 0;
+}
+
 static bool symbol__read_kptr_restrict(void)
 {
 	bool value = false;
@@ -2489,6 +2518,10 @@
 		       symbol_conf.sym_list_str, "symbol") < 0)
 		goto out_free_tid_list;
 
+	if (symbol_conf.sym_list &&
+	    setup_addrlist(&symbol_conf.addr_list, symbol_conf.sym_list) < 0)
+		goto out_free_sym_list;
+
 	if (setup_list(&symbol_conf.bt_stop_list,
 		       symbol_conf.bt_stop_list_str, "symbol") < 0)
 		goto out_free_sym_list;
@@ -2512,6 +2545,7 @@
 
 out_free_sym_list:
 	strlist__delete(symbol_conf.sym_list);
+	intlist__delete(symbol_conf.addr_list);
 out_free_tid_list:
 	intlist__delete(symbol_conf.tid_list);
 out_free_pid_list:
@@ -2533,6 +2567,7 @@
 	strlist__delete(symbol_conf.comm_list);
 	intlist__delete(symbol_conf.tid_list);
 	intlist__delete(symbol_conf.pid_list);
+	intlist__delete(symbol_conf.addr_list);
 	vmlinux_path__exit();
 	symbol_conf.sym_list = symbol_conf.dso_list = symbol_conf.comm_list = NULL;
 	symbol_conf.bt_stop_list = NULL;
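Note on setup_addrlist() in symbol.c above: an entry is treated as an address when strtoul() in base 16 consumes it entirely, allowing only trailing whitespace. The acceptance rule can be sketched on its own as follows; `parse_hex_addr()` is a hypothetical name, and the list handling (intlist/strlist) is left out.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 and stores the value if 's' is a hex number optionally
 * followed by whitespace, 0 otherwise.
 */
static int parse_hex_addr(const char *s, unsigned long *val)
{
	char *sep;
	const char *end;

	errno = 0;
	*val = strtoul(s, &sep, 16);
	if (errno || sep == s)
		return 0;	/* out of range, or no hex digits at all */

	if (*sep != '\0') {
		/* Walk back over trailing whitespace. */
		end = s + strlen(s) - 1;
		while (end >= sep && isspace((unsigned char)*end))
			end--;

		if (end >= sep)
			return 0;	/* non-whitespace text follows the number */
	}

	return 1;
}
```

Under this rule a name made up only of hex digits, such as "add", parses as an address, while "abc_func" is rejected because "_func" follows the digits.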
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index b916afb..a70b3ec 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -42,7 +42,8 @@
 			report_block,
 			report_individual_block,
 			inline_name,
-			disable_add2line_warn;
+			disable_add2line_warn,
+			buildid_mmap2;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
@@ -69,11 +70,13 @@
 			*sym_to_list,
 			*bt_stop_list;
 	struct intlist	*pid_list,
-			*tid_list;
+			*tid_list,
+			*addr_list;
 	const char	*symfs;
 	int		res_sample;
 	int		pad_output_len_dso;
 	int		group_sort_idx;
+	int		addr_range;
 };
 
 extern struct symbol_conf symbol_conf;
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2947e3f..b698046 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -24,7 +24,6 @@
 #include <linux/perf_event.h>
 #include <asm/bug.h>
 #include <perf/evsel.h>
-#include <internal/cpumap.h>
 #include <perf/cpumap.h>
 #include <internal/lib.h> // page_size
 #include <internal/threadmap.h>
@@ -69,19 +68,22 @@
  * Assumes that the first 4095 bytes of /proc/pid/stat contains
  * the comm, tgid and ppid.
  */
-static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
-				    pid_t *tgid, pid_t *ppid)
+static int perf_event__get_comm_ids(pid_t pid, pid_t tid, char *comm, size_t len,
+				    pid_t *tgid, pid_t *ppid, bool *kernel)
 {
 	char bf[4096];
 	int fd;
 	size_t size = 0;
 	ssize_t n;
-	char *name, *tgids, *ppids;
+	char *name, *tgids, *ppids, *vmpeak, *threads;
 
 	*tgid = -1;
 	*ppid = -1;
 
-	snprintf(bf, sizeof(bf), "/proc/%d/status", pid);
+	if (pid)
+		snprintf(bf, sizeof(bf), "/proc/%d/task/%d/status", pid, tid);
+	else
+		snprintf(bf, sizeof(bf), "/proc/%d/status", tid);
 
 	fd = open(bf, O_RDONLY);
 	if (fd < 0) {
@@ -93,14 +95,20 @@
 	close(fd);
 	if (n <= 0) {
 		pr_warning("Couldn't get COMM, tigd and ppid for pid %d\n",
-			   pid);
+			   tid);
 		return -1;
 	}
 	bf[n] = '\0';
 
 	name = strstr(bf, "Name:");
-	tgids = strstr(bf, "Tgid:");
-	ppids = strstr(bf, "PPid:");
+	tgids = strstr(name ?: bf, "Tgid:");
+	ppids = strstr(tgids ?: bf, "PPid:");
+	vmpeak = strstr(ppids ?: bf, "VmPeak:");
+
+	if (vmpeak)
+		threads = NULL;
+	else
+		threads = strstr(ppids ?: bf, "Threads:");
 
 	if (name) {
 		char *nl;
@@ -116,29 +124,34 @@
 		memcpy(comm, name, size);
 		comm[size] = '\0';
 	} else {
-		pr_debug("Name: string not found for pid %d\n", pid);
+		pr_debug("Name: string not found for pid %d\n", tid);
 	}
 
 	if (tgids) {
 		tgids += 5;  /* strlen("Tgid:") */
 		*tgid = atoi(tgids);
 	} else {
-		pr_debug("Tgid: string not found for pid %d\n", pid);
+		pr_debug("Tgid: string not found for pid %d\n", tid);
 	}
 
 	if (ppids) {
 		ppids += 5;  /* strlen("PPid:") */
 		*ppid = atoi(ppids);
 	} else {
-		pr_debug("PPid: string not found for pid %d\n", pid);
+		pr_debug("PPid: string not found for pid %d\n", tid);
 	}
 
+	if (!vmpeak && threads)
+		*kernel = true;
+	else
+		*kernel = false;
+
 	return 0;
 }
 
-static int perf_event__prepare_comm(union perf_event *event, pid_t pid,
+static int perf_event__prepare_comm(union perf_event *event, pid_t pid, pid_t tid,
 				    struct machine *machine,
-				    pid_t *tgid, pid_t *ppid)
+				    pid_t *tgid, pid_t *ppid, bool *kernel)
 {
 	size_t size;
 
@@ -147,9 +160,9 @@
 	memset(&event->comm, 0, sizeof(event->comm));
 
 	if (machine__is_host(machine)) {
-		if (perf_event__get_comm_ids(pid, event->comm.comm,
+		if (perf_event__get_comm_ids(pid, tid, event->comm.comm,
 					     sizeof(event->comm.comm),
-					     tgid, ppid) != 0) {
+					     tgid, ppid, kernel) != 0) {
 			return -1;
 		}
 	} else {
@@ -168,7 +181,7 @@
 	event->comm.header.size = (sizeof(event->comm) -
 				(sizeof(event->comm.comm) - size) +
 				machine->id_hdr_size);
-	event->comm.tid = pid;
+	event->comm.tid = tid;
 
 	return 0;
 }
@@ -179,8 +192,10 @@
 					 struct machine *machine)
 {
 	pid_t tgid, ppid;
+	bool kernel_thread;
 
-	if (perf_event__prepare_comm(event, pid, machine, &tgid, &ppid) != 0)
+	if (perf_event__prepare_comm(event, 0, pid, machine, &tgid, &ppid,
+				     &kernel_thread) != 0)
 		return -1;
 
 	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
@@ -347,6 +362,31 @@
 	}
 }
 
+static void perf_record_mmap2__read_build_id(struct perf_record_mmap2 *event,
+					     bool is_kernel)
+{
+	struct build_id bid;
+	int rc;
+
+	if (is_kernel)
+		rc = sysfs__read_build_id("/sys/kernel/notes", &bid);
+	else
+		rc = filename__read_build_id(event->filename, &bid) > 0 ? 0 : -1;
+
+	if (rc == 0) {
+		memcpy(event->build_id, bid.data, sizeof(bid.data));
+		event->build_id_size = (u8) bid.size;
+		event->header.misc |= PERF_RECORD_MISC_MMAP_BUILD_ID;
+		event->__reserved_1 = 0;
+		event->__reserved_2 = 0;
+	} else {
+		if (event->filename[0] == '/') {
+			pr_debug2("Failed to read build ID for %s\n",
+				  event->filename);
+		}
+	}
+}
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
@@ -453,6 +493,9 @@
 		event->mmap2.pid = tgid;
 		event->mmap2.tid = pid;
 
+		if (symbol_conf.buildid_mmap2)
+			perf_record_mmap2__read_build_id(&event->mmap2, false);
+
 		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
 			rc = -1;
 			break;
@@ -596,16 +639,17 @@
 	int rc = 0;
 	struct map *pos;
 	struct maps *maps = machine__kernel_maps(machine);
-	union perf_event *event = zalloc((sizeof(event->mmap) +
-					  machine->id_hdr_size));
+	union perf_event *event;
+	size_t size = symbol_conf.buildid_mmap2 ?
+			sizeof(event->mmap2) : sizeof(event->mmap);
+
+	event = zalloc(size + machine->id_hdr_size);
 	if (event == NULL) {
 		pr_debug("Not enough memory synthesizing mmap event "
 			 "for kernel modules\n");
 		return -1;
 	}
 
-	event->header.type = PERF_RECORD_MMAP;
-
 	/*
 	 * kernel uses 0 for user space maps, see kernel/perf_event.c
 	 * __perf_event_mmap
@@ -616,23 +660,39 @@
 		event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
 
 	maps__for_each_entry(maps, pos) {
-		size_t size;
-
 		if (!__map__is_kmodule(pos))
 			continue;
 
-		size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
-		event->mmap.header.type = PERF_RECORD_MMAP;
-		event->mmap.header.size = (sizeof(event->mmap) -
-				        (sizeof(event->mmap.filename) - size));
-		memset(event->mmap.filename + size, 0, machine->id_hdr_size);
-		event->mmap.header.size += machine->id_hdr_size;
-		event->mmap.start = pos->start;
-		event->mmap.len   = pos->end - pos->start;
-		event->mmap.pid   = machine->pid;
+		if (symbol_conf.buildid_mmap2) {
+			size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
+			event->mmap2.header.type = PERF_RECORD_MMAP2;
+			event->mmap2.header.size = (sizeof(event->mmap2) -
+						(sizeof(event->mmap2.filename) - size));
+			memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
+			event->mmap2.header.size += machine->id_hdr_size;
+			event->mmap2.start = pos->start;
+			event->mmap2.len   = pos->end - pos->start;
+			event->mmap2.pid   = machine->pid;
 
-		memcpy(event->mmap.filename, pos->dso->long_name,
-		       pos->dso->long_name_len + 1);
+			memcpy(event->mmap2.filename, pos->dso->long_name,
+			       pos->dso->long_name_len + 1);
+
+			perf_record_mmap2__read_build_id(&event->mmap2, false);
+		} else {
+			size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
+			event->mmap.header.type = PERF_RECORD_MMAP;
+			event->mmap.header.size = (sizeof(event->mmap) -
+						(sizeof(event->mmap.filename) - size));
+			memset(event->mmap.filename + size, 0, machine->id_hdr_size);
+			event->mmap.header.size += machine->id_hdr_size;
+			event->mmap.start = pos->start;
+			event->mmap.len   = pos->end - pos->start;
+			event->mmap.pid   = machine->pid;
+
+			memcpy(event->mmap.filename, pos->dso->long_name,
+			       pos->dso->long_name_len + 1);
+		}
+
 		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
 			rc = -1;
 			break;
@@ -643,6 +703,11 @@
 	return rc;
 }
 
+static int filter_task(const struct dirent *dirent)
+{
+	return isdigit(dirent->d_name[0]);
+}
+
 static int __event__synthesize_thread(union perf_event *comm_event,
 				      union perf_event *mmap_event,
 				      union perf_event *fork_event,
@@ -651,10 +716,10 @@
 				      struct perf_tool *tool, struct machine *machine, bool mmap_data)
 {
 	char filename[PATH_MAX];
-	DIR *tasks;
-	struct dirent *dirent;
+	struct dirent **dirent;
 	pid_t tgid, ppid;
 	int rc = 0;
+	int i, n;
 
 	/* special case: only send one comm event using passed in pid */
 	if (!full) {
@@ -686,23 +751,22 @@
 	snprintf(filename, sizeof(filename), "%s/proc/%d/task",
 		 machine->root_dir, pid);
 
-	tasks = opendir(filename);
-	if (tasks == NULL) {
-		pr_debug("couldn't open %s\n", filename);
-		return 0;
-	}
+	n = scandir(filename, &dirent, filter_task, alphasort);
+	if (n < 0)
+		return n;
 
-	while ((dirent = readdir(tasks)) != NULL) {
+	for (i = 0; i < n; i++) {
 		char *end;
 		pid_t _pid;
+		bool kernel_thread;
 
-		_pid = strtol(dirent->d_name, &end, 10);
+		_pid = strtol(dirent[i]->d_name, &end, 10);
 		if (*end)
 			continue;
 
 		rc = -1;
-		if (perf_event__prepare_comm(comm_event, _pid, machine,
-					     &tgid, &ppid) != 0)
+		if (perf_event__prepare_comm(comm_event, pid, _pid, machine,
+					     &tgid, &ppid, &kernel_thread) != 0)
 			break;
 
 		if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
@@ -720,7 +784,7 @@
 			break;
 
 		rc = 0;
-		if (_pid == pid) {
+		if (_pid == pid && !kernel_thread) {
 			/* process the parent's maps too */
 			rc = perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
 						process, machine, mmap_data);
@@ -729,7 +793,10 @@
 		}
 	}
 
-	closedir(tasks);
+	for (i = 0; i < n; i++)
+		zfree(&dirent[i]);
+	free(dirent);
+
 	return rc;
 }
 
@@ -914,7 +981,7 @@
 		return 0;
 
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
-	n = scandir(proc_path, &dirent, 0, alphasort);
+	n = scandir(proc_path, &dirent, filter_task, alphasort);
 	if (n < 0)
 		return err;
 
@@ -991,11 +1058,12 @@
 						perf_event__handler_t process,
 						struct machine *machine)
 {
-	size_t size;
+	union perf_event *event;
+	size_t size = symbol_conf.buildid_mmap2 ?
+			sizeof(event->mmap2) : sizeof(event->mmap);
 	struct map *map = machine__kernel_map(machine);
 	struct kmap *kmap;
 	int err;
-	union perf_event *event;
 
 	if (map == NULL)
 		return -1;
@@ -1009,7 +1077,7 @@
 	 * available use this, and after it is use this as a fallback for older
 	 * kernels.
 	 */
-	event = zalloc((sizeof(event->mmap) + machine->id_hdr_size));
+	event = zalloc(size + machine->id_hdr_size);
 	if (event == NULL) {
 		pr_debug("Not enough memory synthesizing mmap event "
 			 "for kernel modules\n");
@@ -1026,16 +1094,31 @@
 		event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
 	}
 
-	size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
-			"%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
-	size = PERF_ALIGN(size, sizeof(u64));
-	event->mmap.header.type = PERF_RECORD_MMAP;
-	event->mmap.header.size = (sizeof(event->mmap) -
-			(sizeof(event->mmap.filename) - size) + machine->id_hdr_size);
-	event->mmap.pgoff = kmap->ref_reloc_sym->addr;
-	event->mmap.start = map->start;
-	event->mmap.len   = map->end - event->mmap.start;
-	event->mmap.pid   = machine->pid;
+	if (symbol_conf.buildid_mmap2) {
+		size = snprintf(event->mmap2.filename, sizeof(event->mmap2.filename),
+				"%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
+		size = PERF_ALIGN(size, sizeof(u64));
+		event->mmap2.header.type = PERF_RECORD_MMAP2;
+		event->mmap2.header.size = (sizeof(event->mmap2) -
+				(sizeof(event->mmap2.filename) - size) + machine->id_hdr_size);
+		event->mmap2.pgoff = kmap->ref_reloc_sym->addr;
+		event->mmap2.start = map->start;
+		event->mmap2.len   = map->end - event->mmap2.start;
+		event->mmap2.pid   = machine->pid;
+
+		perf_record_mmap2__read_build_id(&event->mmap2, true);
+	} else {
+		size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
+				"%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
+		size = PERF_ALIGN(size, sizeof(u64));
+		event->mmap.header.type = PERF_RECORD_MMAP;
+		event->mmap.header.size = (sizeof(event->mmap) -
+				(sizeof(event->mmap.filename) - size) + machine->id_hdr_size);
+		event->mmap.pgoff = kmap->ref_reloc_sym->addr;
+		event->mmap.start = map->start;
+		event->mmap.len   = map->end - event->mmap.start;
+		event->mmap.pid   = machine->pid;
+	}
 
 	err = perf_tool__process_synth_event(tool, event, machine, process);
 	free(event);
@@ -1384,7 +1467,7 @@
 		}
 	}
 
-	if (type & PERF_SAMPLE_WEIGHT)
+	if (type & PERF_SAMPLE_WEIGHT_TYPE)
 		result += sizeof(u64);
 
 	if (type & PERF_SAMPLE_DATA_SRC)
@@ -1412,6 +1495,9 @@
 	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1420,6 +1506,12 @@
 	return result;
 }
 
+void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+					       __u64 *array, u64 type __maybe_unused)
+{
+	*array = data->weight;
+}
+
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
 				  const struct perf_sample *sample)
 {
@@ -1555,8 +1647,8 @@
 		}
 	}
 
-	if (type & PERF_SAMPLE_WEIGHT) {
-		*array = sample->weight;
+	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
+		arch_perf_synthesize_sample_weight(sample, array, type);
 		array++;
 	}
 
@@ -1596,6 +1688,11 @@
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE) {
+		*array = sample->code_page_size;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
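Note on the kernel thread detection added to perf_event__get_comm_ids() in synthetic-events.c above: a task is flagged as a kernel thread when its /proc status text has a "Threads:" line but no "VmPeak:" line. A minimal sketch of that heuristic on an in-memory buffer, assuming a hypothetical `status_is_kernel_thread()`; unlike the patch it searches from the start of the buffer and avoids the GNU `?:` extension.

```c
#include <stdbool.h>
#include <string.h>

/*
 * 'status' is the NUL-terminated text of a /proc/<pid>/status file.
 * A missing "Threads:" line (for example a truncated read) is not
 * treated as a kernel thread.
 */
static bool status_is_kernel_thread(const char *status)
{
	if (strstr(status, "VmPeak:"))
		return false;	/* has a user address space */

	return strstr(status, "Threads:") != NULL;
}
```

Requiring "Threads:" as well as the absence of "VmPeak:" keeps a short or failed read from being misclassified.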
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index a3db13d..0f38341 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -56,6 +56,34 @@
 			ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
 	}
 
+	/* BPF and CPU are mutually exclusive */
+	if (target->bpf_str && target->cpu_list) {
+		target->cpu_list = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_CPU;
+	}
+
+	/* BPF and PID/TID are mutually exclusive */
+	if (target->bpf_str && target->tid) {
+		target->tid = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
+	}
+
+	/* BPF and UID are mutually exclusive */
+	if (target->bpf_str && target->uid_str) {
+		target->uid_str = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
+	}
+
+	/* BPF and THREADS are mutually exclusive */
+	if (target->bpf_str && target->per_thread) {
+		target->per_thread = false;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_THREAD;
+	}
+
 	/* THREAD and SYSTEM/CPU are mutually exclusive */
 	if (target->per_thread && (target->system_wide || target->cpu_list)) {
 		target->per_thread = false;
@@ -109,6 +137,10 @@
 	"PID/TID switch overriding SYSTEM",
 	"UID switch overriding SYSTEM",
 	"SYSTEM/CPU switch overriding PER-THREAD",
+	"BPF switch overriding CPU",
+	"BPF switch overriding PID/TID",
+	"BPF switch overriding UID",
+	"BPF switch overriding THREAD",
 	"Invalid User: %s",
 	"Problems obtaining information for user %s",
 };
@@ -134,7 +166,7 @@
 
 	switch (errnum) {
 	case TARGET_ERRNO__PID_OVERRIDE_CPU ...
-	     TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD:
+	     TARGET_ERRNO__BPF_OVERRIDE_THREAD:
 		snprintf(buf, buflen, "%s", msg);
 		break;
 
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 6ef01a8..f132c6c 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -10,6 +10,7 @@
 	const char   *tid;
 	const char   *cpu_list;
 	const char   *uid_str;
+	const char   *bpf_str;
 	uid_t	     uid;
 	bool	     system_wide;
 	bool	     uses_mmap;
@@ -36,6 +37,10 @@
 	TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
 	TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
 	TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
+	TARGET_ERRNO__BPF_OVERRIDE_CPU,
+	TARGET_ERRNO__BPF_OVERRIDE_PID,
+	TARGET_ERRNO__BPF_OVERRIDE_UID,
+	TARGET_ERRNO__BPF_OVERRIDE_THREAD,
 
 	/* for target__parse_uid() */
 	TARGET_ERRNO__INVALID_UID,
@@ -59,6 +64,11 @@
 	return target->system_wide || target->cpu_list;
 }
 
+static inline bool target__has_bpf(struct target *target)
+{
+	return target->bpf_str;
+}
+
 static inline bool target__none(struct target *target)
 {
 	return !target__has_task(target) && !target__has_cpu(target);
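Note on the BPF target checks added to target.c and target.h above: when a BPF program target is given, the CPU, PID/TID, UID and per-thread selections are cleared, and only the first conflict is reported. A minimal sketch of that precedence, assuming a hypothetical `struct mini_target` and `enum mini_override` in place of `struct target` and the TARGET_ERRNO__BPF_OVERRIDE_* codes; the earlier PID/UID/SYSTEM checks of the real function are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes mirroring the four new error values. */
enum mini_override {
	MINI_OK,
	MINI_BPF_OVERRIDE_CPU,
	MINI_BPF_OVERRIDE_PID,
	MINI_BPF_OVERRIDE_UID,
	MINI_BPF_OVERRIDE_THREAD,
};

/* Hypothetical subset of struct target. */
struct mini_target {
	const char *bpf_str;
	const char *cpu_list;
	const char *tid;
	const char *uid_str;
	bool per_thread;
};

/* Keep only the first conflict, as the patch does with 'ret'. */
static void note(enum mini_override *ret, enum mini_override code)
{
	if (*ret == MINI_OK)
		*ret = code;
}

static enum mini_override mini_target_validate(struct mini_target *t)
{
	enum mini_override ret = MINI_OK;

	if (!t->bpf_str)
		return ret;

	if (t->cpu_list) {
		t->cpu_list = NULL;
		note(&ret, MINI_BPF_OVERRIDE_CPU);
	}
	if (t->tid) {
		t->tid = NULL;
		note(&ret, MINI_BPF_OVERRIDE_PID);
	}
	if (t->uid_str) {
		t->uid_str = NULL;
		note(&ret, MINI_BPF_OVERRIDE_UID);
	}
	if (t->per_thread) {
		t->per_thread = false;
		note(&ret, MINI_BPF_OVERRIDE_THREAD);
	}

	return ret;
}
```

Every conflicting field is cleared even though only one code is returned, so the caller ends up with a consistent BPF-only target after a single call.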
diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index 0e5c478..a65f65d 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -152,7 +152,7 @@
 	return false;
 }
 
-#define for_each_event(dir, dent, tps)				\
+#define for_each_event_tps(dir, dent, tps)			\
 	while ((dent = readdir(dir)))				\
 		if (dent->d_type == DT_DIR &&			\
 		    (strcmp(dent->d_name, ".")) &&		\
@@ -174,7 +174,7 @@
 		return -errno;
 	}
 
-	for_each_event(dir, dent, tps) {
+	for_each_event_tps(dir, dent, tps) {
 		if (!name_in_tp_list(dent->d_name, tps))
 			continue;
 
@@ -196,7 +196,7 @@
 	}
 
 	rewinddir(dir);
-	for_each_event(dir, dent, tps) {
+	for_each_event_tps(dir, dent, tps) {
 		if (!name_in_tp_list(dent->d_name, tps))
 			continue;
 
@@ -274,7 +274,7 @@
 		goto out;
 	}
 
-	for_each_event(dir, dent, tps) {
+	for_each_event_tps(dir, dent, tps) {
 		if (strcmp(dent->d_name, "ftrace") == 0 ||
 		    !system_in_tp_list(dent->d_name, tps))
 			continue;
@@ -289,7 +289,7 @@
 	}
 
 	rewinddir(dir);
-	for_each_event(dir, dent, tps) {
+	for_each_event_tps(dir, dent, tps) {
 		if (strcmp(dent->d_name, "ftrace") == 0 ||
 		    !system_in_tp_list(dent->d_name, tps))
 			continue;
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 0ada907..a74b517 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -60,10 +60,8 @@
 	mod = dwfl_addrmodule(ui->dwfl, ip);
 	if (mod) {
 		Dwarf_Addr s;
-		void **userdatap;
 
-		dwfl_module_info(mod, &userdatap, &s, NULL, NULL, NULL, NULL, NULL);
-		*userdatap = dso;
+		dwfl_module_info(mod, NULL, &s, NULL, NULL, NULL, NULL, NULL);
 		if (s != al->map->start - al->map->pgoff)
 			mod = 0;
 	}
@@ -79,6 +77,13 @@
 					      al->map->start - al->map->pgoff, false);
 	}
 
+	if (mod) {
+		void **userdatap;
+
+		dwfl_module_info(mod, &userdatap, NULL, NULL, NULL, NULL, NULL, NULL);
+		*userdatap = dso;
+	}
+
 	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
 }
 
diff --git a/tools/perf/util/xyarray.c b/tools/perf/util/xyarray.c
deleted file mode 100644
index 86889eb..0000000
--- a/tools/perf/util/xyarray.c
+++ /dev/null
@@ -1,33 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include "xyarray.h"
-#include <stdlib.h>
-#include <string.h>
-#include <linux/zalloc.h>
-
-struct xyarray *xyarray__new(int xlen, int ylen, size_t entry_size)
-{
-	size_t row_size = ylen * entry_size;
-	struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size);
-
-	if (xy != NULL) {
-		xy->entry_size = entry_size;
-		xy->row_size   = row_size;
-		xy->entries    = xlen * ylen;
-		xy->max_x      = xlen;
-		xy->max_y      = ylen;
-	}
-
-	return xy;
-}
-
-void xyarray__reset(struct xyarray *xy)
-{
-	size_t n = xy->entries * xy->entry_size;
-
-	memset(xy->contents, 0, n);
-}
-
-void xyarray__delete(struct xyarray *xy)
-{
-	free(xy);
-}
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 4255e71..a402f32a 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -134,6 +134,7 @@
 			 $(MAKE) $(PRINT_DIR) -C $$subdir
 	QUIET_FLEX     = @echo '  FLEX     '$@;
 	QUIET_BISON    = @echo '  BISON    '$@;
+	QUIET_GENSKEL  = @echo '  GEN-SKEL '$@;
 
 	descend = \
 		+@echo	       '  DESCEND  '$(1); \