diff --git a/SOURCES/oprofile-goldmont.patch b/SOURCES/oprofile-goldmont.patch
index e792b54..d88e660 100644
--- a/SOURCES/oprofile-goldmont.patch
+++ b/SOURCES/oprofile-goldmont.patch
@@ -490,3 +490,95 @@ index 2f265b3..d1c08d4 100644
  	0x8 extra: return Counts BACLEARS on return instructions.
 -	0x10 extra: cond Counts BACLEARS on Jcc (Jump on Conditional Code/Jump if Conditon is Met) branches.
 +	0x10 extra: cond Counts BACLEARS on Jcc (Jump on Conditional Code/Jump if Condition is Met) branches.
+From b3c20ae8b52c10aa631ca0b931388df98ca3183d Mon Sep 17 00:00:00 2001
+From: Michael Petlan
+Date: Fri, 23 Sep 2016 13:35:54 +0200
+Subject: [PATCH] Intel Goldmont default event
+
+Hi all,
+
+when testing oprofile on an Intel Goldmont machine, I have found out
+that the default event cpu_clk_unhalted returns always zero. Thus, I
+checked the configuration and Intel SDM, and I think there must be a
+mistake.
+
+According to the Intel SDM, table 19-24, the event is 0x3c as usual.
+It has two unit masks (0x00 (core_p) and 0x01 (ref)). With this, the
+event starts giving reasonable results.
+
+The current configuration which is coded in oprofile is not even in
+the SDM tale 19-24, so it is expectable that the following will give
+zero value:
+
+perf stat -e cpu/event=0x00,umask=0x02/ ls
+
+Please consider applying the attached patch.
+
+CC'ing Andi to verify the fix.
+
+Thank you,
+Michael
+
+commit df73e385442236fd6e763cc192185c606e59feda
+Author: Michael Petlan
+Date:   Fri Sep 23 13:16:00 2016 +0200
+
+    Fixed default event on Intel Goldmont
+
+    According to the Intel SDM, table 19-24, the event cpu_clk_unhalted
+    has the event number 0x3c and has two unit masks (0x00, 0x01). This
+    also corresponds to other Intels where the event is also 0x3c.
+
+    Tested on a Goldmont Harrisonville (model 95).
+
+    Before the patch:
+
+    $ ocount ls
+    Events were actively counted for 1761229 nanoseconds.
+    Event counts (actual) for /usr/bin/ls:
+        Event                Count        % time counted
+        cpu_clk_unhalted     0            100.00
+
+    After the patch:
+
+    Event counts (actual) for /usr/bin/ls:
+        Event                Count        % time counted
+        cpu_clk_unhalted     2,948,142    100.00
+
+    Signed-off-by: Michael Petlan
+---
+ events/i386/goldmont/events     | 2 +-
+ events/i386/goldmont/unit_masks | 4 +---
+ 2 files changed, 2 insertions(+), 4 deletions(-)
+
+diff --git a/events/i386/goldmont/events b/events/i386/goldmont/events
+index 111438e..89cbc59 100644
+--- a/events/i386/goldmont/events
++++ b/events/i386/goldmont/events
+@@ -6,7 +6,7 @@
+ # Note the minimum counts are not discovered experimentally and could be likely
+ # lowered in many cases without ill effect.
+ #
+-event:0x00 counters:cpuid um:cpu_clk_unhalted minimum:2000003 name:cpu_clk_unhalted :
++event:0x3c counters:cpuid um:cpu_clk_unhalted minimum:2000003 name:cpu_clk_unhalted :
+ event:0x03 counters:cpuid um:ld_blocks minimum:200003 name:ld_blocks :
+ event:0x05 counters:cpuid um:page_walks minimum:200003 name:page_walks :
+ event:0x0e counters:cpuid um:uops_issued minimum:200003 name:uops_issued_any :
+diff --git a/events/i386/goldmont/unit_masks b/events/i386/goldmont/unit_masks
+index d1c08d4..9d93da0 100644
+--- a/events/i386/goldmont/unit_masks
++++ b/events/i386/goldmont/unit_masks
+@@ -21,9 +21,7 @@ name:uops_issued type:mandatory default:0x0
+ 	0x0 extra: any Counts uops issued by the front end and allocated into the back end of the machine. This event counts uops that retire as well as uops that were speculatively executed but didn't retire. The sort of speculative uops that might be counted includes, but is not limited to those uops issued in the shadow of a miss-predicted branch, those uops that are inserted during an assist (such as for a denormal floating point result), and (previously allocated) uops that might be canceled during a machine clear.
+ name:uops_not_delivered type:mandatory default:0x0
+ 	0x0 extra: any This event used to measure front-end inefficiencies. I.e. when front-end of the machine is not delivering uops to the back-end and the back-end has is not stalled. This event can be used to identify if the machine is truly front-end bound. When this event occurs, it is an indication that the front-end of the machine is operating at less than its theoretical peak performance. Background: We can think of the processor pipeline as being divided into 2 broader parts: Front-end and Back-end. Front-end is responsible for fetching the instruction, decoding into uops in machine understandable format and putting them into a uop queue to be consumed by back end. The back-end then takes these uops, allocates the required resources. When all resources are ready, uops are executed. If the back-end is not ready to accept uops from the front-end, then we do not want to count these as front-end bottlenecks. However, whenever we have bottlenecks in the back-end, we will have allocation unit stalls and eventually forcing the front-end to wait until the back-end is ready to receive more uops. This event counts only when back-end is requesting more uops and front-end is not able to provide them. When 3 uops are requested and no uops are delivered, the event counts 3. When 3 are requested, and only 1 is delivered, the event counts 2. When only 2 are delivered, the event counts 1. Alternatively stated, the event will not count if 3 uops are delivered, or if the back end is stalled and not requesting any uops at all. Counts indicate missed opportunities for the front-end to deliver a uop to the back end. Some examples of conditions that cause front-end efficiencies are: ICache misses, ITLB misses, and decoder restrictions that limit the front-end bandwidth. Known Issues: Some uops require multiple allocation slots. These uops will not be charged as a front end 'not delivered' opportunity, and will be regarded as a back end problem. For example, the INC instruction has one uop that requires 2 issue slots. A stream of INC instructions will not count as UOPS_NOT_DELIVERED, even though only one instruction can be issued per clock. The low uop issue rate for a stream of INC instructions is considered to be a back end issue.
+-name:cpu_clk_unhalted type:exclusive default:core
+-	0x2 extra: core Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1. You cannot collect a PEBs record for this event.
+-	0x1 extra: ref_tsc Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. This event uses fixed counter 2. You cannot collect a PEBs record for this event
++name:cpu_clk_unhalted type:exclusive default:core_p
+ 	0x0 extra: core_p Core cycles when core is not halted. This event uses a (_P)rogrammable general purpose performance counter.
+ 	0x1 extra: ref Reference cycles when core is not halted. This event uses a (_P)rogrammable general purpose performance counter.
+ name:ld_blocks type:exclusive default:all_block
+--
+2.7.4
+
diff --git a/SPECS/oprofile.spec b/SPECS/oprofile.spec
index 7d7cf85..1dbf2fa 100644
--- a/SPECS/oprofile.spec
+++ b/SPECS/oprofile.spec
@@ -1,7 +1,7 @@
 Summary: System wide profiler
 Name: oprofile
 Version: 0.9.9
-Release: 20%{?dist}
+Release: 21%{?dist}
 License: GPLv2+ and LGPLv2+
 Group: Development/System
 #
@@ -214,6 +214,9 @@ exit 0
 %{_sysconfdir}/ld.so.conf.d/*
 
 %changelog
+* Wed Oct 19 2016 William Cohen - 0.9.9-21
+- Fix Intel Goldmont default event
+
 * Tue Aug 9 2016 William Cohen - 0.9.9-20
 - Ensure that the perf events setup before ocount execs child.