commit 5f11ddb982931f754d3319a64313cf880424ea73
Author: Andi Kleen <ak@linux.intel.com>
Date: Thu Jul 17 16:23:38 2014 -0500
Update the Haswell events to the latest version
There are only minor changes from the previous version, but the file
should now be more consistent with other tools.
The event name descriptions have been dropped. They were never all that
useful anyway, because the event is defined by its unit masks.
Now all events with more than one unit mask have their descriptions
only in the unit masks.
As a new feature, any known errata for an event are now referenced
in its unit mask description.
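To illustrate the resulting layout, here is a minimal Python sketch of a reader for the `unit_masks` file that collects each entry's value, qualifiers, name, description and any `Errata:` reference. The grammar it accepts is inferred only from the entries in this diff (a `name:... type:... default:...` header, followed by `VALUE extra:QUALIFIERS NAME DESCRIPTION` lines, or `VALUE DESCRIPTION` lines for mandatory masks); `parse_unit_masks` and `UnitMaskEntry` are hypothetical names, not OProfile's own parser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class UnitMaskEntry:
    value: int
    qualifiers: Dict[str, Union[str, bool]]  # e.g. {"cmask": "1", "inv": True}
    name: Optional[str]                      # None for mandatory masks ("No unit mask")
    description: str
    errata: List[str] = field(default_factory=list)


def parse_qualifiers(spec: str) -> Dict[str, Union[str, bool]]:
    """Split an 'extra:' payload such as 'cmask=1,inv,edge' into a dict."""
    quals: Dict[str, Union[str, bool]] = {}
    for item in filter(None, spec.split(",")):
        key, sep, val = item.partition("=")
        quals[key] = val if sep else True  # bare flags like 'inv' map to True
    return quals


def parse_unit_masks(text: str) -> dict:
    """Map each unit mask name to its header fields and its list of entries."""
    masks: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("name:"):
            # Header, e.g. 'name:uops_executed type:exclusive default:0x2'
            header = dict(tok.split(":", 1) for tok in line.split())
            current = masks.setdefault(header["name"], {"header": header, "entries": []})
            continue
        if current is None:
            raise ValueError(f"entry before any 'name:' header: {line!r}")
        tokens = line.split()
        value = int(tokens[0], 16)
        if len(tokens) > 2 and tokens[1].startswith("extra:"):
            quals = parse_qualifiers(tokens[1][len("extra:"):])
            name, desc = tokens[2], " ".join(tokens[3:])
        else:
            quals, name, desc = {}, None, " ".join(tokens[1:])
        # Assumed convention: errata are appended to descriptions as 'Errata: HSMnn'.
        match = re.search(r"Errata:\s*(.+)$", desc)
        errata = [e.strip() for e in match.group(1).split(",")] if match else []
        current["entries"].append(UnitMaskEntry(value, quals, name, desc, errata))
    return masks


SAMPLE = """\
name:uops_executed type:exclusive default:0x2
\t0x2 extra: core Number of uops executed on the core. Errata: HSM31
\t0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
name:x10 type:mandatory default:0x10
\t0x10 No unit mask
"""

masks = parse_unit_masks(SAMPLE)
for entry in masks["uops_executed"]["entries"]:
    print(hex(entry.value), entry.name, entry.qualifiers, entry.errata)
```

Because the event line no longer carries a description, a tool built this way would take the text for `uops_executed` from the selected unit mask entry rather than from the `events` file.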
Signed-off-by: Andi Kleen <ak@linux.intel.com>
diff --git a/events/i386/haswell/events b/events/i386/haswell/events
index 51fcd50..5aa5eb5 100644
--- a/events/i386/haswell/events
+++ b/events/i386/haswell/events
@@ -7,54 +7,58 @@
# lowered in many cases without ill effect.
#
include:i386/arch_perfmon
-event:0x03 counters:cpuid um:x02 minimum:100003 name:ld_blocks_store_forward : Cases when loads get true Block-on-Store blocking code preventing store forwarding
-event:0x05 counters:cpuid um:misalign_mem_ref minimum:2000003 name:misalign_mem_ref : misalign_mem_ref
-event:0x07 counters:cpuid um:one minimum:100003 name:ld_blocks_partial_address_alias : False dependencies in MOB due to partial address comparison
-event:0x08 counters:cpuid um:dtlb_load_misses minimum:2000003 name:dtlb_load_misses : dtlb_load_misses
-event:0x0d counters:cpuid um:x03 minimum:2000003 name:int_misc_recovery_cycles : Number of cycles waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
-event:0x0e counters:cpuid um:uops_issued minimum:2000003 name:uops_issued : uops_issued
-event:0x24 counters:cpuid um:l2_rqsts minimum:200003 name:l2_rqsts : l2_rqsts
-event:0x27 counters:cpuid um:x50 minimum:200003 name:l2_demand_rqsts_wb_hit : Not rejected writebacks that hit L2 cache
-event:0x48 counters:2 um:l1d_pend_miss minimum:2000003 name:l1d_pend_miss : l1d_pend_miss
-event:0x49 counters:cpuid um:dtlb_store_misses minimum:100003 name:dtlb_store_misses : dtlb_store_misses
-event:0x4c counters:cpuid um:load_hit_pre minimum:100003 name:load_hit_pre : load_hit_pre
-event:0x51 counters:cpuid um:one minimum:2000003 name:l1d_replacement : L1D data line replacements
-event:0x54 counters:cpuid um:tx_mem minimum:2000003 name:tx_mem : tx_mem
-event:0x58 counters:cpuid um:move_elimination minimum:1000003 name:move_elimination : move_elimination
-event:0x5c counters:cpuid um:cpl_cycles minimum:2000003 name:cpl_cycles : cpl_cycles
-event:0x5d counters:cpuid um:tx_exec minimum:2000003 name:tx_exec : tx_exec
-event:0x5e counters:cpuid um:one minimum:2000003 name:rs_events_empty_cycles : Cycles when Reservation Station (RS) is empty for the thread
-event:0x63 counters:cpuid um:lock_cycles minimum:2000003 name:lock_cycles : lock_cycles
-event:0x79 counters:0,1,2,3 um:idq minimum:2000003 name:idq : idq
-event:0x80 counters:cpuid um:x02 minimum:200003 name:icache_misses : Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes Uncacheable accesses.
-event:0x85 counters:cpuid um:itlb_misses minimum:100003 name:itlb_misses : itlb_misses
-event:0x87 counters:cpuid um:ild_stall minimum:2000003 name:ild_stall : ild_stall
-event:0x88 counters:cpuid um:br_inst_exec minimum:200003 name:br_inst_exec : br_inst_exec
-event:0x89 counters:cpuid um:br_misp_exec minimum:200003 name:br_misp_exec : br_misp_exec
-event:0x9c counters:0,1,2,3 um:idq_uops_not_delivered minimum:2000003 name:idq_uops_not_delivered : idq_uops_not_delivered
-event:0xa1 counters:cpuid um:uops_executed_port minimum:2000003 name:uops_executed_port : uops_executed_port
-event:0xa2 counters:cpuid um:resource_stalls minimum:2000003 name:resource_stalls : resource_stalls
-event:0xa3 counters:2 um:cycle_activity minimum:2000003 name:cycle_activity : cycle_activity
-event:0xae counters:cpuid um:one minimum:100007 name:itlb_itlb_flush : Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
-event:0xb0 counters:cpuid um:offcore_requests minimum:100003 name:offcore_requests : offcore_requests
-event:0xb1 counters:cpuid um:uops_executed minimum:2000003 name:uops_executed : uops_executed
-event:0xbc counters:0,1,2,3 um:page_walker_loads minimum:2000003 name:page_walker_loads : page_walker_loads
-event:0xbd counters:cpuid um:tlb_flush minimum:100007 name:tlb_flush : tlb_flush
-event:0xc0 counters:1 um:one minimum:2000003 name:inst_retired_prec_dist : Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution
-event:0xc1 counters:cpuid um:other_assists minimum:100003 name:other_assists : other_assists
-event:0xc2 counters:cpuid um:uops_retired minimum:2000003 name:uops_retired : uops_retired
-event:0xc3 counters:cpuid um:machine_clears minimum:100003 name:machine_clears : machine_clears
-event:0xc4 counters:cpuid um:br_inst_retired minimum:400009 name:br_inst_retired : br_inst_retired
-event:0xc5 counters:cpuid um:br_misp_retired minimum:400009 name:br_misp_retired : br_misp_retired
-event:0xc8 counters:cpuid um:hle_retired minimum:2000003 name:hle_retired : hle_retired
-event:0xc9 counters:cpuid um:rtm_retired minimum:2000003 name:rtm_retired : rtm_retired
-event:0xca counters:cpuid um:fp_assist minimum:100003 name:fp_assist : fp_assist
-event:0xcc counters:cpuid um:x20 minimum:2000003 name:rob_misc_events_lbr_inserts : Count cases of saving new LBR
-event:0xd0 counters:0,1,2,3 um:mem_uops_retired minimum:2000003 name:mem_uops_retired : mem_uops_retired
-event:0xd1 counters:0,1,2,3 um:mem_load_uops_retired minimum:2000003 name:mem_load_uops_retired : mem_load_uops_retired
-event:0xd2 counters:0,1,2,3 um:mem_load_uops_l3_hit_retired minimum:100003 name:mem_load_uops_l3_hit_retired : mem_load_uops_l3_hit_retired
-event:0xd3 counters:0,1,2,3 um:one minimum:100007 name:mem_load_uops_l3_miss_retired_local_dram : Data from local DRAM either Snoop not needed or Snoop Miss (RspI)
-event:0xe6 counters:cpuid um:x1f minimum:100003 name:baclears_any : Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.
-event:0xf0 counters:cpuid um:l2_trans minimum:200003 name:l2_trans : l2_trans
-event:0xf1 counters:cpuid um:l2_lines_in minimum:100003 name:l2_lines_in : l2_lines_in
-event:0xf2 counters:cpuid um:l2_lines_out minimum:100003 name:l2_lines_out : l2_lines_out
+event:0x03 counters:cpuid um:ld_blocks minimum:100003 name:ld_blocks :
+event:0x05 counters:cpuid um:misalign_mem_ref minimum:2000003 name:misalign_mem_ref :
+event:0x07 counters:cpuid um:one minimum:100003 name:ld_blocks_partial_address_alias :
+event:0x08 counters:cpuid um:dtlb_load_misses minimum:2000003 name:dtlb_load_misses :
+event:0x0d counters:cpuid um:x03 minimum:2000003 name:int_misc_recovery_cycles :
+event:0x0e counters:cpuid um:uops_issued minimum:2000003 name:uops_issued :
+event:0x24 counters:cpuid um:l2_rqsts minimum:200003 name:l2_rqsts :
+event:0x27 counters:cpuid um:x50 minimum:200003 name:l2_demand_rqsts_wb_hit :
+event:0x48 counters:2 um:l1d_pend_miss minimum:2000003 name:l1d_pend_miss :
+event:0x49 counters:cpuid um:dtlb_store_misses minimum:100003 name:dtlb_store_misses :
+event:0x4c counters:cpuid um:load_hit_pre minimum:100003 name:load_hit_pre :
+event:0x4f counters:cpuid um:x10 minimum:2000003 name:ept_walk_cycles :
+event:0x51 counters:cpuid um:one minimum:2000003 name:l1d_replacement :
+event:0x54 counters:cpuid um:tx_mem minimum:2000003 name:tx_mem :
+event:0x58 counters:cpuid um:move_elimination minimum:1000003 name:move_elimination :
+event:0x5c counters:cpuid um:cpl_cycles minimum:2000003 name:cpl_cycles :
+event:0x5d counters:cpuid um:tx_exec minimum:2000003 name:tx_exec :
+event:0x5e counters:cpuid um:rs_events minimum:2000003 name:rs_events :
+event:0x60 counters:cpuid um:offcore_requests_outstanding minimum:2000003 name:offcore_requests_outstanding :
+event:0x63 counters:cpuid um:lock_cycles minimum:2000003 name:lock_cycles :
+event:0x79 counters:0,1,2,3 um:idq minimum:2000003 name:idq :
+event:0x80 counters:cpuid um:icache minimum:2000003 name:icache :
+event:0x85 counters:cpuid um:itlb_misses minimum:100003 name:itlb_misses :
+event:0x87 counters:cpuid um:ild_stall minimum:2000003 name:ild_stall :
+event:0x88 counters:cpuid um:br_inst_exec minimum:200003 name:br_inst_exec :
+event:0x89 counters:cpuid um:br_misp_exec minimum:200003 name:br_misp_exec :
+event:0x9c counters:0,1,2,3 um:idq_uops_not_delivered minimum:2000003 name:idq_uops_not_delivered :
+event:0xa1 counters:cpuid um:uops_executed_port minimum:2000003 name:uops_executed_port :
+event:0xa2 counters:cpuid um:resource_stalls minimum:2000003 name:resource_stalls :
+event:0xa3 counters:2 um:cycle_activity minimum:2000003 name:cycle_activity :
+event:0xa8 counters:cpuid um:one minimum:2000003 name:lsd_uops :
+event:0xab counters:cpuid um:x02 minimum:2000003 name:dsb2mite_switches_penalty_cycles :
+event:0xae counters:cpuid um:one minimum:100007 name:itlb_itlb_flush :
+event:0xb0 counters:cpuid um:offcore_requests minimum:100003 name:offcore_requests :
+event:0xb1 counters:cpuid um:uops_executed minimum:2000003 name:uops_executed :
+event:0xbc counters:0,1,2,3 um:page_walker_loads minimum:2000003 name:page_walker_loads :
+event:0xbd counters:cpuid um:tlb_flush minimum:100007 name:tlb_flush :
+event:0xc0 counters:1 um:one minimum:2000003 name:inst_retired_prec_dist :
+event:0xc1 counters:cpuid um:other_assists minimum:100003 name:other_assists :
+event:0xc2 counters:cpuid um:uops_retired minimum:2000003 name:uops_retired :
+event:0xc3 counters:cpuid um:machine_clears minimum:2000003 name:machine_clears :
+event:0xc4 counters:cpuid um:br_inst_retired minimum:400009 name:br_inst_retired :
+event:0xc5 counters:cpuid um:br_misp_retired minimum:400009 name:br_misp_retired :
+event:0xc8 counters:cpuid um:hle_retired minimum:2000003 name:hle_retired :
+event:0xc9 counters:0,1,2,3 um:rtm_retired minimum:2000003 name:rtm_retired :
+event:0xca counters:cpuid um:fp_assist minimum:100003 name:fp_assist :
+event:0xcc counters:cpuid um:x20 minimum:2000003 name:rob_misc_events_lbr_inserts :
+event:0xd0 counters:0,1,2,3 um:mem_uops_retired minimum:2000003 name:mem_uops_retired :
+event:0xd1 counters:0,1,2,3 um:mem_load_uops_retired minimum:2000003 name:mem_load_uops_retired :
+event:0xd2 counters:0,1,2,3 um:mem_load_uops_l3_hit_retired minimum:100003 name:mem_load_uops_l3_hit_retired :
+event:0xd3 counters:0,1,2,3 um:mem_load_uops_l3_miss_retired minimum:100007 name:mem_load_uops_l3_miss_retired :
+event:0xe6 counters:cpuid um:x1f minimum:100003 name:baclears_any :
+event:0xf0 counters:cpuid um:l2_trans minimum:200003 name:l2_trans :
+event:0xf1 counters:cpuid um:l2_lines_in minimum:100003 name:l2_lines_in :
+event:0xf2 counters:cpuid um:l2_lines_out minimum:100003 name:l2_lines_out :
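The `extra:` qualifiers used in the `unit_masks` entries (`cmask=N`, `inv`, `edge`) correspond to fields of the counter's event-select register. As a rough illustration, assuming the architectural IA32_PERFEVTSELx layout described in the Intel SDM (event select in bits 7:0, unit mask in 15:8, USR bit 16, OS bit 17, edge detect bit 18, enable bit 22, invert bit 23, counter mask in bits 31:24), this Python sketch packs an event code and a unit mask entry into a raw register value. `encode_perfevtsel` is a hypothetical helper; the kernel, not code like this, writes the actual MSR.

```python
# Bit positions in IA32_PERFEVTSELx (Intel SDM, architectural performance monitoring).
USR_BIT, OS_BIT, EDGE_BIT, EN_BIT, INV_BIT = 16, 17, 18, 22, 23


def encode_perfevtsel(event: int, umask: int, cmask: int = 0, inv: bool = False,
                      edge: bool = False, user: bool = True, kernel: bool = True) -> int:
    """Pack an event code, unit mask and qualifiers into an event-select value."""
    for label, fld in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= fld <= 0xFF:
            raise ValueError(f"{label} must fit in 8 bits, got {fld:#x}")
    # Nonzero cmask: count cycles in which at least cmask events occurred.
    value = event | (umask << 8) | (cmask << 24) | (1 << EN_BIT)
    if user:
        value |= 1 << USR_BIT   # count while CPL is 1, 2 or 3
    if kernel:
        value |= 1 << OS_BIT    # count while CPL is 0
    if edge:
        value |= 1 << EDGE_BIT  # count deasserted-to-asserted transitions
    if inv:
        value |= 1 << INV_BIT   # invert the cmask comparison (count "< cmask")
    return value


# rs_events empty_end from this commit: event 0x5e, umask 0x1, extra:cmask=1,inv,edge
print(hex(encode_perfevtsel(0x5E, 0x1, cmask=1, inv=True, edge=True)))
```

A nonzero `cmask` turns an event count into a cycle count, which is why pairs such as `idq` `ms_uops` and `ms_cycles` share the unit mask 0x30 and differ only in `extra:cmask=1`.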
diff --git a/events/i386/haswell/unit_masks b/events/i386/haswell/unit_masks
index 32e1c1e..60c2a61 100644
--- a/events/i386/haswell/unit_masks
+++ b/events/i386/haswell/unit_masks
@@ -8,27 +8,32 @@ name:x02 type:mandatory default:0x2
0x2 No unit mask
name:x03 type:mandatory default:0x3
0x3 No unit mask
+name:x10 type:mandatory default:0x10
+ 0x10 No unit mask
name:x1f type:mandatory default:0x1f
0x1f No unit mask
name:x20 type:mandatory default:0x20
0x20 No unit mask
name:x50 type:mandatory default:0x50
0x50 No unit mask
+name:ld_blocks type:exclusive default:0x2
+ 0x2 extra: store_forward This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
+ 0x8 extra: no_sr The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use
name:misalign_mem_ref type:exclusive default:0x1
0x1 extra: loads Speculative cache line split load uops dispatched to L1 cache
0x2 extra: stores Speculative cache line split STA uops dispatched to L1 cache
name:dtlb_load_misses type:exclusive default:0x1
0x1 extra: miss_causes_a_walk Load misses in all DTLB levels that cause page walks
- 0xe extra: walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
0x2 extra: walk_completed_4k Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (4K).
0x4 extra: walk_completed_2m_4m Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (2M/4M).
- 0x10 extra: walk_duration Cycles when PMH is busy with page walks
- 0x60 extra: stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
- 0x20 extra: stlb_hit_4k Load misses that miss the DTLB and hit the STLB (4K)
- 0x40 extra: stlb_hit_2m Load misses that miss the DTLB and hit the STLB (2M)
+ 0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
+ 0x20 extra: stlb_hit_4k This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
+ 0x40 extra: stlb_hit_2m This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80 extra: pde_cache_miss DTLB demand load misses with low part of linear-to-physical address translation missed
-name:uops_issued type:exclusive default:any
- 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
+ 0xe extra: walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
+ 0x60 extra: stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
+name:uops_issued type:exclusive default:0x1
+ 0x1 extra: any This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
0x10 extra: flags_merge Number of flags-merge uops being allocated. Such uops considered perf sensitive; added by GSR u-arch.
0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated
@@ -47,49 +52,59 @@ name:l2_rqsts type:exclusive default:0x21
0x22 extra: rfo_miss RFO requests that miss L2 cache
0x44 extra: code_rd_hit L2 cache hits when fetching instructions, code reads.
0x24 extra: code_rd_miss L2 cache misses when fetching instructions
- 0x27 extra: all_demand_miss Demand requests that miss L2 cache
- 0xe7 extra: all_demand_references Demand requests to L2 cache
- 0x3f extra: miss All requests that miss L2 cache
- 0xff extra: references All L2 requests
-name:l1d_pend_miss type:exclusive default:pending
+ 0x27 extra: all_demand_miss Demand requests that miss L2 cache
+ 0xe7 extra: all_demand_references Demand requests to L2 cache
+ 0x3f extra: miss All requests that miss L2 cache
+ 0xff extra: references All L2 requests
+name:l1d_pend_miss type:exclusive default:0x1
0x1 extra: pending L1D miss oustandings duration in cycles
0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
- 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
name:dtlb_store_misses type:exclusive default:0x1
0x1 extra: miss_causes_a_walk Store misses in all DTLB levels that cause page walks
- 0xe extra: walk_completed Store misses in all DTLB levels that cause completed page walks
- 0x2 extra: walk_completed_4k Store miss in all TLB levels causes a page walk that completes. (4K)
+ 0x2 extra: walk_completed_4k Store miss in all TLB levels causes a page walk that completes. (4K)
0x4 extra: walk_completed_2m_4m Store misses in all DTLB levels that cause completed page walks (2M/4M)
- 0x10 extra: walk_duration Cycles when PMH is busy with page walks
- 0x60 extra: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
- 0x20 extra: stlb_hit_4k Store misses that miss the DTLB and hit the STLB (4K)
- 0x40 extra: stlb_hit_2m Store misses that miss the DTLB and hit the STLB (2M)
+ 0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
+ 0x20 extra: stlb_hit_4k This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
+ 0x40 extra: stlb_hit_2m This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80 extra: pde_cache_miss DTLB store misses with low part of linear-to-physical address translation missed
+ 0xe extra: walk_completed Store misses in all DTLB levels that cause completed page walks
+ 0x60 extra: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
name:load_hit_pre type:exclusive default:0x1
0x1 extra: sw_pf Not software-prefetch load dispatches that hit FB allocated for software prefetch
0x2 extra: hw_pf Not software-prefetch load dispatches that hit FB allocated for hardware prefetch
name:tx_mem type:exclusive default:0x1
0x1 extra: abort_conflict Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
- 0x2 extra: abort_capacity Number of times a transactional abort was signaled due to a data capacity limitation
+ 0x2 extra: abort_capacity_write Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
0x4 extra: abort_hle_store_to_elided_lock Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer
0x8 extra: abort_hle_elision_buffer_not_empty Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
- 0x10 extra: abort_hle_elision_buffer_mismatch Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer.
+ 0x10 extra: abort_hle_elision_buffer_mismatch Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
0x20 extra: abort_hle_elision_buffer_unsupported_alignment Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
- 0x40 extra: abort_hle_elision_buffer_full Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
+ 0x40 extra: hle_elision_buffer_full Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
name:move_elimination type:exclusive default:0x1
0x1 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
0x2 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
0x4 extra: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
0x8 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
-name:cpl_cycles type:exclusive default:ring0
+name:cpl_cycles type:exclusive default:0x1
0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
name:tx_exec type:exclusive default:0x1
- 0x1 extra: misc1 Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution it may not always cause a transactional abort.
- 0x2 extra: misc2 Counts the number of times a class of instructions that may cause a transactional abort was executed inside a transactional region
- 0x4 extra: misc3 Counts the number of times an instruction execution caused the nest count supported to be exceeded
- 0x8 extra: misc4 Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region
+ 0x1 extra: misc1 Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
+ 0x2 extra: misc2 Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
+ 0x4 extra: misc3 Counts the number of times an instruction execution caused the transactional nest count supported to be exceeded
+ 0x8 extra: misc4 Counts the number of times an XBEGIN instruction was executed inside an HLE transactional region.
+ 0x10 extra: misc5 Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region
+name:rs_events type:exclusive default:0x1
+ 0x1 extra: empty_cycles This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
+ 0x1 extra:cmask=1,inv,edge empty_end Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
+name:offcore_requests_outstanding type:exclusive default:0x1
+ 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
+ 0x2 extra: demand_code_rd Offcore outstanding code read transactions in SuperQueue (SQ), queue to uncore, every cycle
+ 0x4 extra: demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
+ 0x8 extra: all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
+ 0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
+ 0x8 extra:cmask=1 cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
name:lock_cycles type:exclusive default:0x1
0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
0x2 extra: cache_lock_duration Cycles when L1D is locked
@@ -99,8 +114,8 @@ name:idq type:exclusive default:0x2
0x8 extra: dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
0x10 extra: ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
0x20 extra: ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
- 0x30 extra: ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
- 0x30 extra:cmask=1 ms_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
+ 0x30 extra: ms_uops This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
+ 0x30 extra:cmask=1 ms_cycles This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
0x8 extra:cmask=1 dsb_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
0x10 extra:cmask=1 ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
@@ -110,17 +125,21 @@ name:idq type:exclusive default:0x2
0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering any Uop
0x3c extra: mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
+ 0x30 extra:cmask=1,edge ms_switches Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer
+name:icache type:exclusive default:0x2
+ 0x2 extra: misses This event counts Instruction Cache (ICACHE) misses.
+ 0x4 extra: ifetch_stall Cycles where a code-fetch stalled due to L1 instruction-cache miss or an iTLB miss
name:itlb_misses type:exclusive default:0x1
0x1 extra: miss_causes_a_walk Misses at all ITLB levels that cause page walks
- 0xe extra: walk_completed Misses in all ITLB levels that cause completed page walks
0x2 extra: walk_completed_4k Code miss in all TLB levels causes a page walk that completes. (4K)
0x4 extra: walk_completed_2m_4m Code miss in all TLB levels causes a page walk that completes. (2M/4M)
- 0x10 extra: walk_duration Cycles when PMH is busy with page walks
- 0x60 extra: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
+ 0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
0x20 extra: stlb_hit_4k Core misses that miss the DTLB and hit the STLB (4K)
0x40 extra: stlb_hit_2m Code misses that miss the DTLB and hit the STLB (2M)
+ 0xe extra: walk_completed Misses in all ITLB levels that cause completed page walks
+ 0x60 extra: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
name:ild_stall type:exclusive default:0x1
- 0x1 extra: lcp Stalls caused by changing prefix length of the instruction.
+ 0x1 extra: lcp This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
0x4 extra: iq_full Stall cycles because IQ is full
name:br_inst_exec type:exclusive default:0xff
0xff extra: all_branches Speculative and retired branches
@@ -145,14 +164,14 @@ name:br_misp_exec type:exclusive default:0xff
0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
0xa0 extra: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
-name:idq_uops_not_delivered type:exclusive default:core
- 0x1 extra: core Uops not delivered to Resource Allocation Table (RAT) per thread when backend of the machine is not stalled
- 0x1 extra:cmask=4 cycles_0_uops_deliv_core Cycles per thread when 4 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
- 0x1 extra:cmask=3 cycles_le_1_uop_deliv_core Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
+name:idq_uops_not_delivered type:exclusive default:0x1
+ 0x1 extra: core This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle, so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
+ 0x1 extra:cmask=4 cycles_0_uops_deliv_core This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
+ 0x1 extra:cmask=3 cycles_le_1_uop_deliv_core Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
0x1 extra:cmask=2 cycles_le_2_uop_deliv_core Cycles with less than 2 uops delivered by the front end.
0x1 extra:cmask=1 cycles_le_3_uop_deliv_core Cycles with less than 3 uops delivered by the front end.
0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
-name:uops_executed_port type:exclusive default:port_0
+name:uops_executed_port type:exclusive default:0x1
0x1 extra: port_0 Cycles per thread when uops are executed in port 0
0x2 extra: port_1 Cycles per thread when uops are executed in port 1
0x4 extra: port_2 Cycles per thread when uops are executed in port 2
@@ -172,88 +191,100 @@ name:uops_executed_port type:exclusive default:port_0
name:resource_stalls type:exclusive default:0x1
0x1 extra: any Resource-related stall cycles
0x4 extra: rs Cycles stalled due to no eligible RS entry available.
- 0x8 extra: sb Cycles stalled due to no store buffers available. (not including draining form sync).
+ 0x8 extra: sb This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
0x10 extra: rob Cycles stalled due to re-order buffer full.
-name:cycle_activity type:exclusive default:0x8
+name:cycle_activity type:exclusive default:0x1
+ 0x1 extra:cmask=1 cycles_l2_pending Cycles with pending L2 cache miss loads.
0x8 extra:cmask=8 cycles_l1d_pending Cycles with pending L1 cache miss loads.
0x2 extra:cmask=2 cycles_ldm_pending Cycles with pending memory loads.
- 0x4 extra:cmask=4 cycles_no_execute Total execution stalls
- 0x6 extra:cmask=6 stalls_ldm_pending Execution stalls due to memory subsystem.
-name:offcore_requests type:exclusive default:0x2
+ 0x4 extra:cmask=4 cycles_no_execute This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
+ 0x5 extra:cmask=5 stalls_l2_pending Execution stalls due to L2 cache misses.
+ 0x6 extra:cmask=6 stalls_ldm_pending This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
+ 0xc extra:cmask=c stalls_l1d_pending Execution stalls due to L1 data cache misses
+name:offcore_requests type:exclusive default:0x1
+ 0x1 extra: demand_data_rd Demand Data Read requests sent to uncore
0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests
0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
0x8 extra: all_data_rd Demand and prefetch data reads
-name:uops_executed type:exclusive default:thread
- 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle.
- 0x2 extra: core Number of uops executed on the core.
+name:uops_executed type:exclusive default:0x2
+ 0x2 extra: core Number of uops executed on the core. Errata: HSM31
0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
- 0x1 extra:cmask=1,inv cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread
- 0x1 extra:cmask=1,inv cycles_ge_2_uops_exec Cycles where at least 2 uops were executed per-thread
- 0x1 extra:cmask=1,inv cycles_ge_3_uops_exec Cycles where at least 3 uops were executed per-thread
- 0x1 extra:cmask=1,inv cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread
+ 0x1 extra:cmask=1 cycles_ge_1_uops_exec This event counts the cycles where at least one uop was executed. It is counted per thread. Errata: HSM31
+ 0x1 extra:cmask=2 cycles_ge_2_uops_exec This event counts the cycles where at least two uops were executed. It is counted per thread. Errata: HSM31
+ 0x1 extra:cmask=3 cycles_ge_3_uops_exec This event counts the cycles where at least three uops were executed. It is counted per thread. Errata: HSM31
+ 0x1 extra:cmask=4 cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread. Errata: HSM31
name:page_walker_loads type:exclusive default:0x11
- 0x11 extra: ia32_dtlb_l1 Number of DTLB page walker hits in the L1+FB
- 0x21 extra: ia32_itlb_l1 Number of ITLB page walker hits in the L1+FB
- 0x12 extra: ia32_dtlb_l2 Number of DTLB page walker hits in the L2
- 0x22 extra: ia32_itlb_l2 Number of ITLB page walker hits in the L2
- 0x14 extra: ia32_dtlb_l3 Number of DTLB page walker hits in the L3 + XSNP
- 0x24 extra: ia32_itlb_l3 Number of ITLB page walker hits in the L3 + XSNP
- 0x18 extra: ia32_dtlb_memory Number of DTLB page walker hits in Memory
- 0x28 extra: ia32_itlb_memory Number of ITLB page walker hits in Memory
+ 0x11 extra: dtlb_l1 Number of DTLB page walker hits in the L1+FB
+ 0x21 extra: itlb_l1 Number of ITLB page walker hits in the L1+FB
+ 0x41 extra: ept_dtlb_l1 Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
+ 0x81 extra: ept_itlb_l1 Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
+ 0x12 extra: dtlb_l2 Number of DTLB page walker hits in the L2
+ 0x22 extra: itlb_l2 Number of ITLB page walker hits in the L2
+ 0x42 extra: ept_dtlb_l2 Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
+ 0x82 extra: ept_itlb_l2 Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
+ 0x14 extra: dtlb_l3 Number of DTLB page walker hits in the L3 + XSNP
+ 0x24 extra: itlb_l3 Number of ITLB page walker hits in the L3 + XSNP
+ 0x44 extra: ept_dtlb_l3 Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
+ 0x84 extra: ept_itlb_l3 Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
+ 0x18 extra: dtlb_memory Number of DTLB page walker hits in Memory
+ 0x48 extra: ept_dtlb_memory Counts the number of Extended Page Table walks from the DTLB that hit in memory.
+ 0x88 extra: ept_itlb_memory Counts the number of Extended Page Table walks from the ITLB that hit in memory.
name:tlb_flush type:exclusive default:0x1
0x1 extra: dtlb_thread DTLB flush attempts of the thread-specific entries
0x20 extra: stlb_any STLB flush attempts
name:other_assists type:exclusive default:0x8
- 0x8 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable.
- 0x10 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable.
+ 0x8 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. Errata: HSM57
+ 0x10 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. Errata: HSM57
0x40 extra: any_wb_assist Number of times any microcode assist is invoked by HW upon uop writeback.
-name:uops_retired type:exclusive default:all
- 0x1 extra: all Actually retired uops.
- 0x2 extra: retire_slots Retirement slots used.
- 0x1 extra:pebs all_ps Actually retired uops. (Precise Event - PEBS)
- 0x2 extra:pebs retire_slots_ps Retirement slots used. (Precise Event - PEBS)
- 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops.
- 0x1 extra:cmask=10,inv total_cycles Cycles with less than 10 actually retired uops.
- 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles without actually retired uops.
-name:machine_clears type:exclusive default:0x2
- 0x2 extra: memory_ordering Counts the number of machine clears due to memory order conflicts.
- 0x4 extra: smc Self-modifying code (SMC) detected.
- 0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
-name:br_inst_retired type:exclusive default:all_branches_ps
- 0x1 extra: conditional Conditional branch instructions retired.
- 0x2 extra: near_call Direct and indirect near call instructions retired.
- 0x8 extra: near_return Return instructions retired.
- 0x10 extra: not_taken Not taken branch instructions retired.
- 0x20 extra: near_taken Taken branch instructions retired.
- 0x40 extra: far_branch Far branch instructions retired.
- 0x1 extra:pebs conditional_ps Conditional branch instructions retired. (Precise Event - PEBS)
- 0x2 extra:pebs near_call_ps Direct and indirect near call instructions retired. (Precise Event - PEBS)
- 0x4 extra:pebs all_branches_ps All (macro) branch instructions retired. (Precise Event - PEBS)
- 0x8 extra:pebs near_return_ps Return instructions retired. (Precise Event - PEBS)
- 0x20 extra:pebs near_taken_ps Taken branch instructions retired. (Precise Event - PEBS)
- 0x2 extra: near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3).
- 0x2 extra:pebs near_call_r3_ps Direct and indirect macro near call instructions retired (captured in ring 3). (Precise Event - PEBS)
-name:br_misp_retired type:exclusive default:all_branches_ps
- 0x1 extra: conditional Mispredicted conditional branch instructions retired.
- 0x1 extra:pebs conditional_ps Mispredicted conditional branch instructions retired. (Precise Event - PEBS)
- 0x4 extra:pebs all_branches_ps Mispredicted macro branch instructions retired. (Precise Event - PEBS)
- 0x20 extra: near_taken number of near branch instructions retired that were mispredicted and taken.
- 0x20 extra:pebs near_taken_ps number of near branch instructions retired that were mispredicted and taken. (Precise Event - PEBS)
+name:uops_retired type:exclusive default:0x1
+ 0x1 extra: all Actually retired uops.
+ 0x1 extra: all_pebs Actually retired uops.
+ 0x2 extra: retire_slots This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
+ 0x2 extra: retire_slots_pebs This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
+ 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops.
+ 0x1 extra:cmask=a,inv total_cycles Cycles with fewer than 10 actually retired uops.
+ 0x1 extra:cmask=1,inv core_stall_cycles Cycles without actually retired uops.
+name:machine_clears type:exclusive default:0x1
+ 0x1 extra: cycles Cycles there was a Nuke. Accounts for both thread-specific and All Thread Nukes.
+ 0x2 extra: memory_ordering This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
+ 0x4 extra: smc This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
+ 0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
+ 0x1 extra:cmask=1,edge count Number of machine clears (nukes) of any type.
+name:br_inst_retired type:exclusive default:0x1
+ 0x1 extra: conditional Conditional branch instructions retired.
+ 0x1 extra: conditional_pebs Conditional branch instructions retired.
+ 0x2 extra: near_call Direct and indirect near call instructions retired.
+ 0x2 extra: near_call_pebs Direct and indirect near call instructions retired.
+ 0x8 extra: near_return Return instructions retired.
+ 0x8 extra: near_return_pebs Return instructions retired.
+ 0x10 extra: not_taken Not taken branch instructions retired.
+ 0x20 extra: near_taken Taken branch instructions retired.
+ 0x20 extra: near_taken_pebs Taken branch instructions retired.
+ 0x40 extra: far_branch Far branch instructions retired.
+ 0x4 extra:pebs all_branches_pebs All (macro) branch instructions retired.
+name:br_misp_retired type:exclusive default:0x1
+ 0x1 extra: conditional Mispredicted conditional branch instructions retired.
+ 0x1 extra: conditional_pebs Mispredicted conditional branch instructions retired.
+ 0x4 extra:pebs all_branches_pebs This event counts all mispredicted branch instructions retired. This is a precise event.
+ 0x20 extra: near_taken Number of near branch instructions retired that were mispredicted and taken.
+ 0x20 extra: near_taken_pebs Number of near branch instructions retired that were mispredicted and taken.
name:hle_retired type:exclusive default:0x1
0x1 extra: start Number of times an HLE execution started.
0x2 extra: commit Number of times an HLE execution successfully committed
- 0x4 extra: aborted Number of times an HLE execution aborted due to any reasons (multiple categories may count as one)
- 0x8 extra: aborted_misc1 Number of times an HLE execution aborted due to 1 various memory events
+ 0x4 extra: aborted Number of times an HLE execution aborted for any reason (multiple categories may count as one).
+ 0x4 extra: aborted_pebs Number of times an HLE execution aborted for any reason (multiple categories may count as one).
+ 0x8 extra: aborted_misc1 Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
0x10 extra: aborted_misc2 Number of times an HLE execution aborted due to uncommon conditions
0x20 extra: aborted_misc3 Number of times an HLE execution aborted due to HLE-unfriendly instructions
0x40 extra: aborted_misc4 Number of times an HLE execution aborted due to incompatible memory type
- 0x80 extra: aborted_misc5 Number of times an HLE execution aborted due to none of the previous categories (e.g. interrupt)
+ 0x80 extra: aborted_misc5 Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts)
name:rtm_retired type:exclusive default:0x1
0x1 extra: start Number of times an RTM execution started.
0x2 extra: commit Number of times an RTM execution successfully committed
- 0x4 extra: aborted Number of times an RTM execution aborted due to any reasons (multiple categories may count as one)
- 0x8 extra: aborted_misc1 Number of times an RTM execution aborted due to various memory events
- 0x10 extra: aborted_misc2 Number of times an RTM execution aborted due to uncommon conditions
+ 0x4 extra: aborted Number of times an RTM execution aborted for any reason (multiple categories may count as one).
+ 0x4 extra: aborted_pebs Number of times an RTM execution aborted for any reason (multiple categories may count as one).
+ 0x8 extra: aborted_misc1 Number of times an RTM execution aborted due to various memory events (e.g., read/write capacity and conflicts).
+ 0x10 extra: aborted_misc2 Number of times an RTM execution aborted due to uncommon conditions.
0x20 extra: aborted_misc3 Number of times an RTM execution aborted due to HLE-unfriendly instructions
0x40 extra: aborted_misc4 Number of times an RTM execution aborted due to incompatible memory type
0x80 extra: aborted_misc5 Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt)
@@ -263,51 +294,59 @@ name:fp_assist type:exclusive default:0x1e
0x4 extra: x87_input Number of X87 assists due to input value.
0x8 extra: simd_output Number of SIMD FP assists due to Output values
0x10 extra: simd_input Number of SIMD FP assists due to input values
-name:mem_uops_retired type:exclusive default:all_loads
- 0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path.
- 0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path.
- 0x21 extra: lock_loads Load uops with locked access retired to architected path.
- 0x41 extra: split_loads Line-splitted load uops retired to architected path.
- 0x42 extra: split_stores Line-splitted store uops retired to architected path.
- 0x81 extra: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied.
- 0x82 extra: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied.
- 0x11 extra:pebs stlb_miss_loads_ps Load uops with true STLB miss retired to architected path. (Precise Event - PEBS)
- 0x12 extra:pebs stlb_miss_stores_ps Store uops true STLB miss retired to architected path. (Precise Event - PEBS)
- 0x21 extra:pebs lock_loads_ps Load uops with locked access retired to architected path. (Precise Event - PEBS)
- 0x41 extra:pebs split_loads_ps Line-splitted load uops retired to architected path. (Precise Event - PEBS)
- 0x42 extra:pebs split_stores_ps Line-splitted store uops retired to architected path. (Precise Event - PEBS)
- 0x81 extra:pebs all_loads_ps Load uops retired to architected path with filter on bits 0 and 1 applied. (Precise Event - PEBS)
- 0x82 extra:pebs all_stores_ps Store uops retired to architected path with filter on bits 0 and 1 applied. (Precise Event - PEBS)
-name:mem_load_uops_retired type:exclusive default:l1_hit
- 0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources.
- 0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources.
- 0x4 extra: l3_hit Retired load uops which data sources were data hits in LLC without snoops required.
- 0x10 extra: l2_miss Miss in mid-level (L2) cache. Excludes Unknown data-source.
- 0x40 extra: hit_lfb Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready.
- 0x1 extra:pebs l1_hit_ps Retired load uops with L1 cache hits as data sources. (Precise Event - PEBS)
- 0x2 extra:pebs l2_hit_ps Retired load uops with L2 cache hits as data sources. (Precise Event - PEBS)
- 0x4 extra:pebs l3_hit_ps Miss in last-level (L3) cache. Excludes Unknown data-source. (Precise Event - PEBS)
- 0x40 extra:pebs hit_lfb_ps Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. (Precise Event - PEBS)
-name:mem_load_uops_l3_hit_retired type:exclusive default:xsnp_miss
- 0x1 extra: xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache.
- 0x2 extra: xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache.
- 0x4 extra: xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC.
- 0x8 extra: xsnp_none Retired load uops which data sources were hits in LLC without snoops required.
- 0x1 extra:pebs xsnp_miss_ps Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. (Precise Event - PEBS)
- 0x2 extra:pebs xsnp_hit_ps Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. (Precise Event - PEBS)
- 0x4 extra:pebs xsnp_hitm_ps Retired load uops which data sources were HitM responses from shared LLC. (Precise Event - PEBS)
- 0x8 extra:pebs xsnp_none_ps Retired load uops which data sources were hits in LLC without snoops required. (Precise Event - PEBS)
+name:mem_uops_retired type:exclusive default:0x11
+ 0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path. Errata: HSM30
+ 0x11 extra: stlb_miss_loads_pebs Load uops with true STLB miss retired to architected path. Errata: HSM30
+ 0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path. Errata: HSM30
+ 0x12 extra: stlb_miss_stores_pebs Store uops with true STLB miss retired to architected path. Errata: HSM30
+ 0x21 extra: lock_loads Load uops with locked access retired to architected path. Errata: HSM30
+ 0x21 extra: lock_loads_pebs Load uops with locked access retired to architected path. Errata: HSM30
+ 0x41 extra: split_loads Line-split load uops retired to architected path. Errata: HSM30
+ 0x41 extra: split_loads_pebs Line-split load uops retired to architected path. Errata: HSM30
+ 0x42 extra: split_stores Line-split store uops retired to architected path. Errata: HSM30
+ 0x42 extra: split_stores_pebs Line-split store uops retired to architected path. Errata: HSM30
+ 0x81 extra: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
+ 0x81 extra: all_loads_pebs Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
+ 0x82 extra: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
+ 0x82 extra: all_stores_pebs Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
+name:mem_load_uops_retired type:exclusive default:0x1
+ 0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources. Errata: HSM30
+ 0x1 extra: l1_hit_pebs Retired load uops with L1 cache hits as data sources. Errata: HSM30
+ 0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources. Errata: HSM30
+ 0x2 extra: l2_hit_pebs Retired load uops with L2 cache hits as data sources. Errata: HSM30
+ 0x4 extra: l3_hit Retired load uops whose data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
+ 0x4 extra: l3_hit_pebs Retired load uops whose data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
+ 0x8 extra: l1_miss Retired load uops with L1 cache misses as data sources. Errata: HSM30
+ 0x8 extra: l1_miss_pebs Retired load uops with L1 cache misses as data sources. Errata: HSM30
+ 0x10 extra: l2_miss Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
+ 0x10 extra: l2_miss_pebs Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
+ 0x20 extra: l3_miss Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
+ 0x20 extra: l3_miss_pebs Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
+ 0x40 extra: hit_lfb Retired load uops that missed L1 but hit the fill buffer (FB) due to a preceding miss to the same cache line with data not ready. Errata: HSM30
+ 0x40 extra: hit_lfb_pebs Retired load uops that missed L1 but hit the fill buffer (FB) due to a preceding miss to the same cache line with data not ready. Errata: HSM30
+name:mem_load_uops_l3_hit_retired type:exclusive default:0x1
+ 0x1 extra: xsnp_miss Retired load uops whose data sources were L3 hits and cross-core snoop misses in on-pkg core cache. Errata: HSM26, HSM30
+ 0x1 extra: xsnp_miss_pebs Retired load uops whose data sources were L3 hits and cross-core snoop misses in on-pkg core cache. Errata: HSM26, HSM30
+ 0x2 extra: xsnp_hit Retired load uops whose data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
+ 0x2 extra: xsnp_hit_pebs Retired load uops whose data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
+ 0x4 extra: xsnp_hitm Retired load uops whose data sources were HitM responses from shared L3. Errata: HSM26, HSM30
+ 0x4 extra: xsnp_hitm_pebs Retired load uops whose data sources were HitM responses from shared L3. Errata: HSM26, HSM30
+ 0x8 extra: xsnp_none Retired load uops whose data sources were hits in L3 without snoops required. Errata: HSM26, HSM30
+ 0x8 extra: xsnp_none_pebs Retired load uops whose data sources were hits in L3 without snoops required. Errata: HSM26, HSM30
+name:mem_load_uops_l3_miss_retired type:exclusive default:0x1
+ 0x1 extra: local_dram This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
+ 0x1 extra: local_dram_pebs This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
name:l2_trans type:exclusive default:0x80
0x80 extra: all_requests Transactions accessing L2 pipe
0x1 extra: demand_data_rd Demand Data Read requests that access L2 cache
0x2 extra: rfo RFO requests that access L2 cache
0x4 extra: code_rd L2 cache accesses when fetching instructions
- 0x8 extra: all_pf L2 or LLC HW prefetches that access L2 cache
+ 0x8 extra: all_pf L2 or L3 HW prefetches that access L2 cache
0x10 extra: l1d_wb L1D writebacks that access L2 cache
0x20 extra: l2_fill L2 fill requests that access L2 cache
0x40 extra: l2_wb L2 writebacks that access L2 cache
name:l2_lines_in type:exclusive default:0x7
- 0x7 extra: all L2 cache lines filling L2
+ 0x7 extra: all This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
0x1 extra: i L2 cache lines in I state filling L2
0x2 extra: s L2 cache lines in S state filling L2
0x4 extra: e L2 cache lines in E state filling L2