commit 4b1497d8befcc4c8b26dc4e4866c3422ae8787c3
Author: Andi Kleen <ak@linux.intel.com>
Date:   Thu Oct 10 13:12:28 2013 -0500

    Add support for Intel Silvermont processor
    
    Just add the event list for Intel Silvermont based systems
    (Avoton, BayTrail) and the usual changes for a new CPU.
    No new code otherwise.
    
    The model number list is incomplete at this point, more will
    be added in the future.
    
    I also finally removed the top-level event list descriptions.
    All the events are now described only in the unit masks.
    (Intel doesn't really have a top-level event, and I had
    to invent descriptions, which was error-prone and
    often wrong.)
    
    I also removed some outdated document number references.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/events/Makefile.am b/events/Makefile.am
index d91d44b..3028c2f 100644
--- a/events/Makefile.am
+++ b/events/Makefile.am
@@ -21,6 +21,7 @@ event_files = \
 	i386/sandybridge/events i386/sandybridge/unit_masks \
 	i386/ivybridge/events i386/ivybridge/unit_masks \
 	i386/haswell/events i386/haswell/unit_masks \
+	i386/silvermont/events i386/silvermont/unit_masks \
 	ia64/ia64/events ia64/ia64/unit_masks \
 	ia64/itanium2/events ia64/itanium2/unit_masks \
 	ia64/itanium/events ia64/itanium/unit_masks \
diff --git a/events/i386/silvermont/events b/events/i386/silvermont/events
new file mode 100644
index 0000000..077cc0a
--- /dev/null
+++ b/events/i386/silvermont/events
@@ -0,0 +1,26 @@
+#
+# Intel "Silvermont" microarchitecture core events.
+#
+# See http://ark.intel.com/ for help in identifying Silvermont based CPUs
+#
+# Note the minimum counts are not determined experimentally and could likely be
+# lowered in many cases without ill effect.
+#
+include:i386/arch_perfmon
+event:0x32 counters:0,1 um:l2_prefetcher_throttle minimum:200003 name:l2_prefetcher_throttle :
+event:0x3e counters:0,1 um:one minimum:200003 name:l2_prefetcher_pref_stream_alloc :
+event:0x50 counters:0,1 um:zero minimum:200003 name:l2_prefetch_pend_streams_pref_stream_pend_set :
+event:0x86 counters:0,1 um:nip_stall minimum:200003 name:nip_stall :
+event:0x87 counters:0,1 um:decode_stall minimum:200003 name:decode_stall :
+event:0x96 counters:0,1 um:uip_match minimum:200003 name:uip_match :
+event:0xc2 counters:0,1 um:uops_retired minimum:2000003 name:uops_retired :
+event:0xc3 counters:0,1 um:x10 minimum:200003 name:machine_clears_live_lock_breaker :
+event:0xc4 counters:0,1 um:br_inst_retired minimum:2000003 name:br_inst_retired :
+event:0xc5 counters:0,1 um:br_misp_retired minimum:200003 name:br_misp_retired :
+event:0xca counters:0,1 um:no_alloc_cycles minimum:200003 name:no_alloc_cycles :
+event:0xcb counters:0,1 um:rs_full_stall minimum:200003 name:rs_full_stall :
+event:0xcc counters:0,1 um:rs_dispatch_stall minimum:200003 name:rs_dispatch_stall :
+event:0xe6 counters:0,1 um:baclears minimum:2000003 name:baclears :
+event:0xe7 counters:0,1 um:x02 minimum:200003 name:ms_decoded_early_exit :
+event:0xe8 counters:0,1 um:one minimum:200003 name:btclears_all :
+event:0xe9 counters:0,1 um:decode_restriction minimum:200003 name:decode_restriction :
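
For readers skimming the raw patch: each line in this new file follows oprofile's
event-file format. A minimal annotated sketch, with field meanings inferred from
these entries and from the existing i386 event files:

    # event:<hex select>  - event-select code programmed into the PMU
    # counters:<list>     - programmable counters the event may be scheduled on
    # um:<name>           - unit-mask set defined in the sibling unit_masks file
    # minimum:<count>     - smallest sampling interval oprofile will accept
    # name:<symbol>       - symbolic name shown by ophelp and used with operf
    event:0xc4 counters:0,1 um:br_inst_retired minimum:2000003 name:br_inst_retired :
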
diff --git a/events/i386/silvermont/unit_masks b/events/i386/silvermont/unit_masks
new file mode 100644
index 0000000..6309282
--- /dev/null
+++ b/events/i386/silvermont/unit_masks
@@ -0,0 +1,71 @@
+#
+# Unit masks for the Intel "Silvermont" microarchitecture
+#
+# See http://ark.intel.com/ for help in identifying Silvermont based CPUs
+#
+include:i386/arch_perfmon
+name:x02 type:mandatory default:0x2
+	0x2 No unit mask
+name:x10 type:mandatory default:0x10
+	0x10 No unit mask
+name:l2_prefetcher_throttle type:exclusive default:0x2
+	0x2 extra:edge conservative Counts the number of cycles the L2 prefetcher spends in throttling mode
+	0x1 extra:edge aggressive Counts the number of cycles the L2 prefetcher spends in throttling mode
+name:nip_stall type:exclusive default:0x3f
+	0x3f extra: all Counts the number of cycles the NIP stalls.
+	0x1 extra: pfb_full Counts the number of cycles the NIP stalls and the PFBs are full.   This DOES NOT include PFB throttler cases.
+	0x2 extra: itlb_miss Counts the number of cycles the NIP stalls and there is an outstanding ITLB miss. This is a cumulative count of cycles the NIP stalled for all ITLB misses.
+	0x8 extra: pfb_throttler Counts the number of cycles the NIP stalls, the throttler is engaged, and the PFBs appear full.
+	0x10 extra: do_snoop Counts the number of cycles the NIP stalls because an SMC compliance snoop to the MEC is required.
+	0x20 extra: misc_other Counts the number of cycles the NIP stalls due to NUKE, Stop Front End, Inserted flows.
+	0x1e extra: pfb_ready Counts the number of cycles the NIP stalls when the PFBs are not full and the decoders are able to process bytes.  Does not count PFB_FULL nor MISC_OTHER stall cycles.
+name:decode_stall type:exclusive default:0x1
+	0x1 extra: pfb_empty Counts the number of cycles the decoder is stalled because the PFB is empty. This count is useful to see if the decoder is receiving the bytes from the front end. This event together with the DECODE_STALL.IQ_FULL may be used to narrow down the bottleneck.
+	0x2 extra: iq_full Counts the number of cycles the decoder is stalled because the IQ is full. This count is useful to see if the decoder is delivering the decoded uops. This event together with the DECODE_STALL.PFB_EMPTY may be used to narrow down the bottleneck.
+name:uip_match type:exclusive default:0x1
+	0x1 extra: first_uip This event is used for counting the number of times a specific micro IP address was decoded
+	0x2 extra: second_uip This event is used for counting the number of times a specific micro IP address was decoded
+name:uops_retired type:exclusive default:0x2
+	0x2 extra: x87 This event counts the number of micro-ops retired that used X87 hardware.
+	0x4 extra: mul This event counts the number of micro-ops retired that used MUL hardware.
+	0x8 extra: div This event counts the number of micro-ops retired that used DIV hardware.
+	0x1 extra: ms_cyles Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS).  This includes uops from flows due to faults, assists, and inserted flows.
+name:br_inst_retired type:exclusive default:0x1
+	0x1 extra: remove_jcc REMOVE_JCC counts the number of branch instructions retired but removes taken and not taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x2 extra: remove_rel_call REMOVE_REL_CALL counts the number of branch instructions retired but removes near relative CALL.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x4 extra: remove_ind_call REMOVE_IND_CALL counts the number of branch instructions retired but removes near indirect CALL. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x8 extra: remove_ret REMOVE_RET counts the number of branch instructions retired but removes near RET.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of branch instructions retired but removes near indirect JMP.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x20 extra: remove_rel_jmp REMOVE_REL_JMP counts the number of branch instructions retired but removes near relative JMP.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x40 extra: remove_far REMOVE_FAR counts the number of branch instructions retired but removes all far branches.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of branch instructions retired but removes taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+name:br_misp_retired type:exclusive default:0x1
+	0x1 extra: remove_jcc REMOVE_JCC counts the number of mispredicted branch instructions retired but removes taken and not taken conditional branches (JCC).  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0x4 extra: remove_ind_call REMOVE_IND_CALL Counts the number of mispredicted branch instructions retired but removes near indirect CALL.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0x8 extra: remove_ret REMOVE_RET Counts the number of mispredicted branch instructions retired but removes near RET.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of mispredicted branch instructions retired but removes near indirect JMP.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of mispredicted branch instructions retired but removes taken conditional branches (JCC).  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+name:no_alloc_cycles type:exclusive default:0x3f
+	0x3f extra:inv all Counts the number of cycles that uops are allocated (inverse of NO_ALLOC_CYCLES.ALL)
+	0x2 extra: sd_buffer_full Counts the number of cycles when no uops are allocated and the store data buffer is full.
+	0x4 extra: mispredicts Counts the number of cycles when no uops are allocated and the alloc pipe is stalled waiting for a mispredicted jump to retire.  After the misprediction is detected, the front end will start immediately but the allocate pipe stalls until the mispredicted jump retires.
+	0x8 extra: scoreboard Counts the number of cycles when no uops are allocated and a microcode IQ-based scoreboard stall is active. This includes stalls due to both the retirement scoreboard (at-ret) and micro-Jcc execution scoreboard (at-jeu).  Does not count cycles when the MS
+	0x10 extra: iq_empty Counts the number of cycles when no uops are allocated and the IQ is empty.  Will assert immediately after a mispredict and partially overlap with MISPREDICTS sub event.
+name:rs_full_stall type:exclusive default:0x2
+	0x2 extra: iec_port0 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 0 is full.
+	0x4 extra: iec_port1 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 1 is full.
+	0x8 extra: fpc_port0 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 0 is full.
+	0x10 extra: fpc_port1 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 1 is full.
+name:rs_dispatch_stall type:exclusive default:0x1
+	0x1 extra: iec0_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were dispatched from port 0 of IEC RS while the RS had valid ops left to dispatch
+	0x2 extra: iec1_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were dispatched from port 1 of IEC RS while the RS had valid ops left to dispatch
+	0x4 extra: fpc0_rs Counts cycles when no uops were dispatched from port 0 of FPC RS while the RS had valid ops left to dispatch
+	0x8 extra: fpc1_rs Counts cycles when no uops were dispatched from port 1 of FPC RS while the RS had valid ops left to dispatch
+	0x10 extra: mec_rs Counts cycles when no uops were dispatched from the MEC RS or rehab queue while valid ops were left to dispatch
+name:baclears type:exclusive default:0x2
+	0x2 extra: indirect Counts the number of indirect branch baclears
+	0x4 extra: uncond Counts the number of unconditional branch baclears
+	0x1e extra: no_corner_case Sum of submasks [4:1].  Does not count special case baclears due to things like parity errors, bogus branches, and pd$ issues.
+name:decode_restriction type:exclusive default:0x1
+	0x1 extra: pdcache_wrong Counts the number of times a decode restriction reduced the decode throughput due to wrong instruction length prediction
+	0x2 extra: all_3cycle_resteers Counts the number of times a decode restriction reduced the decode throughput because of all 3 cycle resteer conditions.  Mainly PDCACHE_WRONG and MS_ENTRY cases.
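
The unit_masks file is the other half of the format: each name: block defines the
mask set that an event references through um:. A minimal sketch, assuming the
semantics visible above (type:mandatory fixes a single mask, type:exclusive
offers alternative masks with a default, and extra: carries modifier flags such
as edge or inv):

    # name:<set name> type:<mandatory|exclusive> default:<mask>
    #     <mask> extra:<flags> <submask name> <description>
    name:decode_stall type:exclusive default:0x1
        0x1 extra: pfb_empty Decoder stalled because the PFB is empty
        0x2 extra: iq_full Decoder stalled because the IQ is full
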
diff --git a/libop/op_cpu_type.c b/libop/op_cpu_type.c
index badb7ba..4bb34b7 100644
--- a/libop/op_cpu_type.c
+++ b/libop/op_cpu_type.c
@@ -127,6 +127,7 @@ static struct cpu_descr const cpu_descrs[MAX_CPU_TYPE] = {
 	{ "AMD64 generic", "x86-64/generic", CPU_AMD64_GENERIC, 4 },
 	{ "IBM Power Architected Events V1", "ppc64/architected_events_v1", CPU_PPC64_ARCH_V1, 6 },
 	{ "ppc64 POWER8", "ppc64/power8", CPU_PPC64_POWER8, 6 },
+	{ "Intel Silvermont microarchitecture", "i386/silvermont", CPU_SILVERMONT, 2 },
 };
  
 static size_t const nr_cpu_descrs = sizeof(cpu_descrs) / sizeof(struct cpu_descr);
@@ -644,6 +645,7 @@ op_cpu op_cpu_base_type(op_cpu cpu_type)
 	case CPU_ATOM:
 	case CPU_NEHALEM:
 	case CPU_HASWELL:
+	case CPU_SILVERMONT:
 	case CPU_WESTMERE:
 	case CPU_SANDYBRIDGE:
 	case CPU_IVYBRIDGE:
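
The new cpu_descrs entry ties the pieces together: the display string, the
events/i386/silvermont directory added above, the CPU_SILVERMONT enum value,
and the number of programmable counters (2, matching the counters:0,1 lists in
the event file). A sketch of the descriptor layout implied by the initializer
(field names here are illustrative, not copied from the source):

	struct cpu_descr {
		char const *pretty;	/* "Intel Silvermont microarchitecture" */
		char const *name;	/* events directory, "i386/silvermont" */
		op_cpu cpu;		/* CPU_SILVERMONT */
		unsigned int nr_counters;	/* 2 programmable counters */
	};
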
diff --git a/libop/op_cpu_type.h b/libop/op_cpu_type.h
index 934fe9e..4703fa9 100644
--- a/libop/op_cpu_type.h
+++ b/libop/op_cpu_type.h
@@ -107,6 +107,7 @@ typedef enum {
 	CPU_AMD64_GENERIC, /**< AMD64 Generic */
 	CPU_PPC64_ARCH_V1, /** < IBM Power architected events version 1 */
 	CPU_PPC64_POWER8, /**< ppc64 POWER8 family */
+	CPU_SILVERMONT, /** < Intel Silvermont microarchitecture */
 	MAX_CPU_TYPE
 } op_cpu;
 
diff --git a/libop/op_events.c b/libop/op_events.c
index 9d2aa5e..39c710d 100644
--- a/libop/op_events.c
+++ b/libop/op_events.c
@@ -1201,6 +1201,7 @@ void op_default_event(op_cpu cpu_type, struct op_default_event_descr * descr)
  		case CPU_CORE_I7:
 		case CPU_NEHALEM:
 		case CPU_HASWELL:
+		case CPU_SILVERMONT:
 		case CPU_WESTMERE:
 		case CPU_SANDYBRIDGE:
 		case CPU_IVYBRIDGE:
diff --git a/libop/op_hw_specific.h b/libop/op_hw_specific.h
index 6ae19bc..e86dcae 100644
--- a/libop/op_hw_specific.h
+++ b/libop/op_hw_specific.h
@@ -150,6 +150,9 @@ static inline op_cpu op_cpu_specific_type(op_cpu cpu_type)
 		case 0x46:
 		case 0x47:
 			return CPU_HASWELL;
+		case 0x37:
+		case 0x4d:
+			return CPU_SILVERMONT;
 		}
 	}
 	return cpu_type;
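
The two new case labels are CPUID model numbers for family 6 parts: 0x37 is
BayTrail and 0x4d is Avoton. On family 6 the effective model combines the basic
model field of CPUID leaf 1 with the extended-model bits, roughly as in this
standalone sketch (illustrative only; oprofile derives the model through its
own CPUID handling elsewhere in this header):

	#include <stdio.h>
	#include <cpuid.h>	/* __get_cpuid(), provided by GCC and clang */

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx, family, model;

		if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
			return 1;	/* CPUID leaf 1 not available */

		family = (eax >> 8) & 0xf;
		model = (eax >> 4) & 0xf;
		/* For family 6 and 0xf, bits [19:16] of EAX extend the
		 * model number by forming its high nibble. */
		if (family == 0x6 || family == 0xf)
			model |= ((eax >> 16) & 0xf) << 4;

		/* Models known to this patch; more were added later. */
		if (family == 0x6 && (model == 0x37 || model == 0x4d))
			printf("Silvermont detected (model 0x%x)\n", model);
		return 0;
	}
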
diff --git a/utils/ophelp.c b/utils/ophelp.c
index 3b2896a..7543c6f 100644
--- a/utils/ophelp.c
+++ b/utils/ophelp.c
@@ -551,19 +551,20 @@ int main(int argc, char const * argv[])
 	case CPU_CORE_I7:
 	case CPU_NEHALEM:
 	case CPU_HASWELL:
+	case CPU_SILVERMONT:
 	case CPU_WESTMERE:
 	case CPU_SANDYBRIDGE:
 	case CPU_IVYBRIDGE:
 	case CPU_ATOM:
 		event_doc =
 			"See Intel Architecture Developer's Manual Volume 3B, Appendix A and\n"
-			"Intel Architecture Optimization Reference Manual (730795-001)\n\n";
+			"Intel Architecture Optimization Reference Manual\n\n";
 		break;
 
 	case CPU_ARCH_PERFMON:
 		event_doc =
 			"See Intel 64 and IA-32 Architectures Software Developer's Manual\n"
-			"Volume 3B (Document 253669) Chapter 18 for architectural perfmon events\n"
+			"Volume 3B Chapter 18 for architectural perfmon events\n"
 			"This is a limited set of fallback events because oprofile doesn't know your CPU\n";
 		break;
 	
commit 88779857662560604f85db608cf90f8609e1da6f
Author: Andi Kleen <ak@linux.intel.com>
Date:   Thu Sep 11 09:00:52 2014 -0500

    Update the Silvermont event files
    
    On further review, the Silvermont event files had a lot of problems.
    I regenerated them completely. This fixes the PEBS events, and
    fixes a range of others.
    
    The test suite passes without problems.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
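
For reference, the PEBS fix is visible in the unit_masks hunk below: the precise
form of a mask is a separate entry whose extra: field carries the pebs flag,
one per precise-capable mask. For example, from the new mem_uops_retired set:

	0x2 extra: l2_hit_loads This event counts the number of load ops retired that hit in the L2
	0x2 extra:pebs l2_hit_loads_pebs This event counts the number of load ops retired that hit in the L2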

diff --git a/events/i386/silvermont/events b/events/i386/silvermont/events
index 077cc0a..434538f 100644
--- a/events/i386/silvermont/events
+++ b/events/i386/silvermont/events
@@ -7,20 +7,18 @@
 # lowered in many cases without ill effect.
 #
 include:i386/arch_perfmon
-event:0x32 counters:0,1 um:l2_prefetcher_throttle minimum:200003 name:l2_prefetcher_throttle :
-event:0x3e counters:0,1 um:one minimum:200003 name:l2_prefetcher_pref_stream_alloc :
-event:0x50 counters:0,1 um:zero minimum:200003 name:l2_prefetch_pend_streams_pref_stream_pend_set :
-event:0x86 counters:0,1 um:nip_stall minimum:200003 name:nip_stall :
-event:0x87 counters:0,1 um:decode_stall minimum:200003 name:decode_stall :
-event:0x96 counters:0,1 um:uip_match minimum:200003 name:uip_match :
+event:0x03 counters:0,1 um:rehabq minimum:200003 name:rehabq :
+event:0x04 counters:0,1 um:mem_uops_retired minimum:200003 name:mem_uops_retired :
+event:0x05 counters:0,1 um:page_walks minimum:200003 name:page_walks :
+event:0x30 counters:0,1 um:zero minimum:200003 name:l2_reject_xq_all :
+event:0x31 counters:0,1 um:zero minimum:200003 name:core_reject_l2q_all :
+event:0x80 counters:0,1 um:icache minimum:200003 name:icache :
 event:0xc2 counters:0,1 um:uops_retired minimum:2000003 name:uops_retired :
-event:0xc3 counters:0,1 um:x10 minimum:200003 name:machine_clears_live_lock_breaker :
-event:0xc4 counters:0,1 um:br_inst_retired minimum:2000003 name:br_inst_retired :
+event:0xc3 counters:0,1 um:machine_clears minimum:200003 name:machine_clears :
+event:0xc4 counters:0,1 um:br_inst_retired minimum:200003 name:br_inst_retired :
 event:0xc5 counters:0,1 um:br_misp_retired minimum:200003 name:br_misp_retired :
 event:0xca counters:0,1 um:no_alloc_cycles minimum:200003 name:no_alloc_cycles :
 event:0xcb counters:0,1 um:rs_full_stall minimum:200003 name:rs_full_stall :
-event:0xcc counters:0,1 um:rs_dispatch_stall minimum:200003 name:rs_dispatch_stall :
-event:0xe6 counters:0,1 um:baclears minimum:2000003 name:baclears :
-event:0xe7 counters:0,1 um:x02 minimum:200003 name:ms_decoded_early_exit :
-event:0xe8 counters:0,1 um:one minimum:200003 name:btclears_all :
-event:0xe9 counters:0,1 um:decode_restriction minimum:200003 name:decode_restriction :
+event:0xcd counters:0,1 um:one minimum:2000003 name:cycles_div_busy_all :
+event:0xe6 counters:0,1 um:baclears minimum:200003 name:baclears :
+event:0xe7 counters:0,1 um:one minimum:200003 name:ms_decoded_ms_entry :
diff --git a/events/i386/silvermont/unit_masks b/events/i386/silvermont/unit_masks
index 6309282..c0dac26 100644
--- a/events/i386/silvermont/unit_masks
+++ b/events/i386/silvermont/unit_masks
@@ -4,68 +4,86 @@
 # See http://ark.intel.com/ for help in identifying Silvermont based CPUs
 #
 include:i386/arch_perfmon
-name:x02 type:mandatory default:0x2
-	0x2 No unit mask
-name:x10 type:mandatory default:0x10
-	0x10 No unit mask
-name:l2_prefetcher_throttle type:exclusive default:0x2
-	0x2 extra:edge conservative Counts the number of cycles the L2 prefetcher spends in throttling mode
-	0x1 extra:edge aggressive Counts the number of cycles the L2 prefetcher spends in throttling mode
-name:nip_stall type:exclusive default:0x3f
-	0x3f extra: all Counts the number of cycles the NIP stalls.
-	0x1 extra: pfb_full Counts the number of cycles the NIP stalls and the PFBs are full.   This DOES NOT include PFB throttler cases.
-	0x2 extra: itlb_miss Counts the number of cycles the NIP stalls and there is an outstanding ITLB miss. This is a cumulative count of cycles the NIP stalled for all ITLB misses.
-	0x8 extra: pfb_throttler Counts the number of cycles the NIP stalls, the throttler is engaged, and the PFBs appear full.
-	0x10 extra: do_snoop Counts the number of cycles the NIP stalls because an SMC compliance snoop to the MEC is required.
-	0x20 extra: misc_other Counts the number of cycles the NIP stalls due to NUKE, Stop Front End, Inserted flows.
-	0x1e extra: pfb_ready Counts the number of cycles the NIP stalls when the PFBs are not full and the decoders are able to process bytes.  Does not count PFB_FULL nor MISC_OTHER stall cycles.
-name:decode_stall type:exclusive default:0x1
-	0x1 extra: pfb_empty Counts the number of cycles the decoder is stalled because the PFB is empty. This count is useful to see if the decoder is receiving the bytes from the front end. This event together with the DECODE_STALL.IQ_FULL may be used to narrow down the bottleneck.
-	0x2 extra: iq_full Counts the number of cycles the decoder is stalled because the IQ is full. This count is useful to see if the decoder is delivering the decoded uops. This event together with the DECODE_STALL.PFB_EMPTY may be used to narrow down the bottleneck.
-name:uip_match type:exclusive default:0x1
-	0x1 extra: first_uip This event is used for counting the number of times a specific micro IP address was decoded
-	0x2 extra: second_uip This event is used for counting the number of times a specific micro IP address was decoded
-name:uops_retired type:exclusive default:0x2
-	0x2 extra: x87 This event counts the number of micro-ops retired that used X87 hardware.
-	0x4 extra: mul This event counts the number of micro-ops retired that used MUL hardware.
-	0x8 extra: div This event counts the number of micro-ops retired that used DIV hardware.
-	0x1 extra: ms_cyles Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS).  This includes uops from flows due to faults, assists, and inserted flows.
-name:br_inst_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of branch instructions retired but removes taken and not taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x2 extra: remove_rel_call REMOVE_REL_CALL counts the number of branch instructions retired but removes near relative CALL.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL counts the number of branch instructions retired but removes near indirect CALL. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x8 extra: remove_ret REMOVE_RET counts the number of branch instructions retired but removes near RET.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of branch instructions retired but removes near indirect JMP.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x20 extra: remove_rel_jmp REMOVE_REL_JMP counts the number of branch instructions retired but removes near relative JMP.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x40 extra: remove_far REMOVE_FAR counts the number of branch instructions retired but removes all far branches.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of branch instructions retired but removes taken conditional branches (JCC).  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
-name:br_misp_retired type:exclusive default:0x1
-	0x1 extra: remove_jcc REMOVE_JCC counts the number of mispredicted branch instructions retired but removes taken and not taken conditional branches (JCC).  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
-	0x4 extra: remove_ind_call REMOVE_IND_CALL Counts the number of mispredicted branch instructions retired but removes near indirect CALL.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
-	0x8 extra: remove_ret REMOVE_RET Counts the number of mispredicted branch instructions retired but removes near RET.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
-	0x10 extra: remove_ind_jmp REMOVE_IND_JMP counts the number of mispredicted branch instructions retired but removes near indirect JMP.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
-	0x80 extra: remove_not_taken_jcc REMOVE_NOT_TAKEN_JCC counts the number of mispredicted branch instructions retired but removes taken conditional branches (JCC).  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+name:rehabq type:exclusive default:0x1
+	0x1 extra: ld_block_st_forward This event counts the number of retired loads that were prohibited from receiving forwarded data from the store because of address mismatch.
+	0x1 extra:pebs ld_block_st_forward_pebs This event counts the number of retired loads that were prohibited from receiving forwarded data from the store because of address mismatch.
+	0x2 extra: ld_block_std_notready This event counts the cases where a forward was technically possible, but did not occur because the store data was not available at the right time
+	0x4 extra: st_splits This event counts the number of retired stores that experienced cache line boundary splits
+	0x8 extra: ld_splits This event counts the number of retired loads that experienced cache line boundary splits
+	0x8 extra:pebs ld_splits_pebs This event counts the number of retired loads that experienced cache line boundary splits
+	0x10 extra: lock This event counts the number of retired memory operations with lock semantics. These are either implicit locked instructions such as the XCHG instruction or instructions with an explicit LOCK prefix (0xF0).
+	0x20 extra: sta_full This event counts the number of retired stores that are delayed because there is not a store address buffer available.
+	0x40 extra: any_ld This event counts the number of load uops reissued from Rehabq
+	0x80 extra: any_st This event counts the number of store uops reissued from Rehabq
+name:mem_uops_retired type:exclusive default:0x1
+	0x1 extra: l1_miss_loads This event counts the number of load ops retired that miss in L1 Data cache. Note that prefetch misses will not be counted.
+	0x2 extra: l2_hit_loads This event counts the number of load ops retired that hit in the L2
+	0x2 extra:pebs l2_hit_loads_pebs This event counts the number of load ops retired that hit in the L2
+	0x4 extra: l2_miss_loads This event counts the number of load ops retired that miss in the L2
+	0x4 extra:pebs l2_miss_loads_pebs This event counts the number of load ops retired that miss in the L2
+	0x8 extra: dtlb_miss_loads This event counts the number of load ops retired that had DTLB miss.
+	0x8 extra:pebs dtlb_miss_loads_pebs This event counts the number of load ops retired that had DTLB miss.
+	0x10 extra: utlb_miss This event counts the number of load ops retired that had UTLB miss.
+	0x20 extra: hitm This event counts the number of load ops retired that got data from the other core or from the other module.
+	0x20 extra:pebs hitm_pebs This event counts the number of load ops retired that got data from the other core or from the other module.
+	0x40 extra: all_loads This event counts the number of load ops retired
+	0x80 extra: all_stores This event counts the number of store ops retired
+name:page_walks type:exclusive default:0x1
+	0x1 extra:edge d_side_walks This event counts when a data (D) page walk is completed or started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks.
+	0x1 extra: d_side_cycles This event counts every cycle when a D-side (walks due to a load) page walk is in progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x2 extra:edge i_side_walks This event counts when an instruction (I) page walk is completed or started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks.
+	0x2 extra: i_side_cycles This event counts every cycle when an I-side (walks due to an instruction fetch) page walk is in progress. Page walk duration divided by number of page walks is the average duration of page-walks.
+	0x3 extra:edge walks This event counts when a data (D) page walk or an instruction (I) page walk is completed or started.  Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks.
+	0x3 extra: cycles This event counts every cycle when a data (D) page walk or instruction (I) page walk is in progress.  Since a pagewalk implies a TLB miss, the approximate cost of a TLB miss can be determined from this event.
+name:icache type:exclusive default:0x3
+	0x3 extra: accesses This event counts all instruction fetches, including uncacheable fetches.
+	0x1 extra: hit This event counts all instruction fetches from the instruction cache.
+	0x2 extra: misses This event counts all instruction fetches that miss the Instruction cache or produce memory requests. This includes uncacheable fetches. An instruction fetch miss is counted only once and not once for every cycle it is outstanding.
+name:uops_retired type:exclusive default:0x10
+	0x10 extra: all This event counts the number of micro-ops retired. The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole instructions are fused into one micro-op. See other UOPS_RETIRED events for differentiating retired fused and non-fused micro-ops.
+	0x1 extra: ms This event counts the number of micro-ops retired that were supplied from MSROM.
+name:machine_clears type:exclusive default:0x8
+	0x8 extra: all Machine clears happen when something happens in the machine that causes the hardware to need to take special care to get the right answer. When such a condition is signaled on an instruction, the front end of the machine is notified that it must restart, so no more instructions will be decoded from the current path.  All instructions "older" than this one will be allowed to finish.  This instruction and all "younger" instructions must be cleared, since they must not be allowed to complete.  Essentially, the hardware waits until the problematic instruction is the oldest instruction in the machine.  This means all older instructions are retired, and all pending stores (from older instructions) are completed.  Then the new path of instructions from the front end are allowed to start into the machine.  There are many conditions that might cause a machine clear (including the receipt of an interrupt, or a trap or a fault).  All those conditions (including but not limited to MACHINE_CLEARS.MEMORY_ORDERING, MACHINE_CLEARS.SMC, and MACHINE_CLEARS.FP_ASSIST) are captured in the ANY event. In addition, some conditions can be specifically counted (i.e. SMC, MEMORY_ORDERING, FP_ASSIST).  However, the sum of SMC, MEMORY_ORDERING, and FP_ASSIST machine clears will not necessarily equal the number of ANY.
+	0x1 extra: smc This event counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel® architecture processors.
+	0x2 extra: memory_ordering This event counts the number of times that pipeline was cleared due to memory ordering issues.
+	0x4 extra: fp_assist This event counts the number of times that pipeline stalled due to FP operations needing assists.
+name:br_inst_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of conditional branch (JCC) instructions retired. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0x7e extra:pebs jcc_pebs JCC counts the number of conditional branch (JCC) instructions retired. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of taken conditional branch (JCC) instructions retired. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of taken conditional branch (JCC) instructions retired. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf9 extra: call CALL counts the number of near CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf9 extra:pebs call_pebs CALL counts the number of near CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra: rel_call REL_CALL counts the number of near relative CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfd extra:pebs rel_call_pebs REL_CALL counts the number of near relative CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfb extra: ind_call IND_CALL counts the number of near indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of near indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf7 extra: return RETURN counts the number of near RET branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xf7 extra:pebs return_pebs RETURN counts the number of near RET branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of near indirect JMP and near indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of near indirect JMP and near indirect CALL branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xbf extra: far_branch FAR counts the number of far branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+	0xbf extra:pebs far_branch_pebs FAR counts the number of far branch instructions retired.  Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
+name:br_misp_retired type:exclusive default:0x7e
+	0x7e extra: jcc JCC counts the number of mispredicted conditional branches (JCC) instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0x7e extra:pebs jcc_pebs JCC counts the number of mispredicted conditional branches (JCC) instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xfe extra: taken_jcc TAKEN_JCC counts the number of mispredicted taken conditional branch (JCC) instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xfe extra:pebs taken_jcc_pebs TAKEN_JCC counts the number of mispredicted taken conditional branch (JCC) instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xfb extra: ind_call IND_CALL counts the number of mispredicted near indirect CALL branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xfb extra:pebs ind_call_pebs IND_CALL counts the number of mispredicted near indirect CALL branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xf7 extra: return RETURN counts the number of mispredicted near RET branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xf7 extra:pebs return_pebs RETURN counts the number of mispredicted near RET branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xeb extra: non_return_ind NON_RETURN_IND counts the number of mispredicted near indirect JMP and near indirect CALL branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
+	0xeb extra:pebs non_return_ind_pebs NON_RETURN_IND counts the number of mispredicted near indirect JMP and near indirect CALL branch instructions retired.  This event counts the number of retired branch instructions that were mispredicted by the processor, categorized by type. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.  When the misprediction is discovered, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.
 name:no_alloc_cycles type:exclusive default:0x3f
-	0x3f extra:inv all Counts the number of cycles that uops are allocated (inverse of NO_ALLOC_CYCLES.ALL)
-	0x2 extra: sd_buffer_full Counts the number of cycles when no uops are allocated and the store data buffer is full.
-	0x4 extra: mispredicts Counts the number of cycles when no uops are allocated and the alloc pipe is stalled waiting for a mispredicted jump to retire.  After the misprediction is detected, the front end will start immediately but the allocate pipe stalls until the mispredicted jump retires.
-	0x8 extra: scoreboard Counts the number of cycles when no uops are allocated and a microcode IQ-based scoreboard stall is active. This includes stalls due to both the retirement scoreboard (at-ret) and micro-Jcc execution scoreboard (at-jeu).  Does not count cycles when the MS
-	0x10 extra: iq_empty Counts the number of cycles when no uops are allocated and the IQ is empty.  Will assert immediately after a mispredict and partially overlap with MISPREDICTS sub event.
-name:rs_full_stall type:exclusive default:0x2
-	0x2 extra: iec_port0 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 0 is full.
-	0x4 extra: iec_port1 Counts the number of cycles the Alloc pipeline is stalled because IEC RS for port 1 is full.
-	0x8 extra: fpc_port0 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 0 is full.
-	0x10 extra: fpc_port1 Counts the number of cycles the Alloc pipeline is stalled because FPC RS for port 1 is full.
-name:rs_dispatch_stall type:exclusive default:0x1
-	0x1 extra: iec0_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were dispatched from port 0 of IEC RS while the RS had valid ops left to dispatch
-	0x2 extra: iec1_rs *COUNTER BROKEN - NO FIX* Counts cycles when no uops were dispatched from port 1 of IEC RS while the RS had valid ops left to dispatch
-	0x4 extra: fpc0_rs Counts cycles when no uops were dispatched from port 0 of FPC RS while the RS had valid ops left to dispatch
-	0x8 extra: fpc1_rs Counts cycles when no uops were dispatched from port 1 of FPC RS while the RS had valid ops left to dispatch
-	0x10 extra: mec_rs Counts cycles when no uops were dispatched from the MEC RS or rehab queue while valid ops were left to dispatch
-name:baclears type:exclusive default:0x2
-	0x2 extra: indirect Counts the number of indirect branch baclears
-	0x4 extra: uncond Counts the number of unconditional branch baclears
-	0x1e extra: no_corner_case Sum of submasks [4:1].  Does not count special case baclears due to things like parity errors, bogus branches, and pd$ issues.
-name:decode_restriction type:exclusive default:0x1
-	0x1 extra: pdcache_wrong Counts the number of times a decode restriction reduced the decode throughput due to wrong instruction length prediction
-	0x2 extra: all_3cycle_resteers Counts the number of times a decode restriction reduced the decode throughput because of all 3 cycle resteer conditions.  Mainly PDCACHE_WRONG and MS_ENTRY cases.
+	0x3f extra: all The NO_ALLOC_CYCLES.ALL event counts the number of cycles when the front-end does not provide any instructions to be allocated for any reason. This event indicates the cycles where an allocation stalls occurs, and no UOPS are allocated in that cycle.
+	0x1 extra: rob_full Counts the number of cycles when no uops are allocated and the ROB is full (less than 2 entries available)
+	0x20 extra: rat_stall Counts the number of cycles when no uops are allocated and a RAT stall is asserted.
+	0x50 extra: not_delivered The NO_ALLOC_CYCLES.NOT_DELIVERED event is used to measure front-end inefficiencies, i.e. when the front-end of the machine is not delivering micro-ops to the back-end and the back-end is not stalled. This event can be used to identify if the machine is truly front-end bound.  When this event occurs, it is an indication that the front-end of the machine is operating at less than its theoretical peak performance.  Background: We can think of the processor pipeline as being divided into 2 broader parts: Front-end and Back-end. The front-end is responsible for fetching the instruction, decoding it into micro-ops (uops) in a machine-understandable format and putting them into a micro-op queue to be consumed by the back-end. The back-end then takes these micro-ops and allocates the required resources.  When all resources are ready, micro-ops are executed. If the back-end is not ready to accept micro-ops from the front-end, then we do not want to count these as front-end bottlenecks.  However, whenever we have bottlenecks in the back-end, we will have allocation unit stalls, eventually forcing the front-end to wait until the back-end is ready to receive more uops. This event counts the cycles only when the back-end is requesting more uops and the front-end is not able to provide them. Some examples of conditions that cause front-end inefficiencies are: Icache misses, ITLB misses, and decoder restrictions that limit the front-end bandwidth.
+name:rs_full_stall type:exclusive default:0x1f
+	0x1f extra: all Counts the number of cycles the Alloc pipeline is stalled when any one of the RSs (IEC, FPC and MEC) is full. This event is a superset of all the individual RS stall event counts.
+	0x1 extra: mec Counts the number of cycles the allocation pipeline is stalled and is waiting for a free MEC reservation station entry.  The cycles should be appropriately counted in case of cracked ops, e.g. in case of a cracked load-op, the load portion is sent to M
+name:baclears type:exclusive default:0x1
+	0x1 extra: all The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end.  The BACLEARS.ANY event counts the number of baclears for any type of branch.
+	0x8 extra: return The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end.  The BACLEARS.RETURN event counts the number of RETURN baclears.
+	0x10 extra: cond The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end.  The BACLEARS.COND event counts the number of JCC (Jump on Conditional Code) baclears.