You are probably already aware that performance numbers are critical for both SW and HW.
SW in the sense that you have added, modified, or deleted code, so how is the performance after
the change?
HW in the sense that a new processor is being introduced, a new DIMM upgrade is planned,
or on the IO side a new PLX switch is being introduced or, for that matter, its firmware is being
updated.
Oh, and add another variable: virtualization! You have sliced and diced the existing HW to
run 'x' number of Logical Domains (LDoms, in Sun's terminology).
Performance numbers are critical to everyone involved in product development and support
(execs, team members, marketing, support, etc.).
So crunching the numbers into the most useful form is critical. From a SW point of view,
there are many applications / specs that could be run. It depends on what you
plan to use the server for (web server, database, HPC applications).
I'll cover what is available to run in a different posting, but for now assume that
we are running something and the logs give you a number or an average value
(e.g., for a web server, the number of active connections that requested some content, such as a file get).
Repeat your experiment 'n' times, and you have an average number.
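The averaging step can be sketched as below. This is only a minimal sketch: it assumes each run's result has been written as one number per line, and both the function name average_runs and the file name runs.txt are hypothetical, not something any benchmark produces by default.

```shell
#!/bin/sh
# average_runs: print the arithmetic mean of the numbers read from stdin,
# one number per line. Blank lines are skipped. With no numbers at all,
# nothing is printed and the function returns 1.
average_runs() {
    awk 'NF { sum += $1; n++ }
         END { if (n == 0) exit 1; printf "%.2f\n", sum / n }'
}

# Hypothetical usage: runs.txt holds the result of each of the n runs.
# average_runs < runs.txt
```

For three runs reporting 100, 110 and 90, this prints 100.00.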
That is not all! I want to open the hood and monitor things (counters) while the application
is running. This is where CPU performance counters come in: a front end (commands) and a back end (kernel module).
Front End:
OpenSolaris provides front-end CLIs that collect the data, such as cpustat (cpustat.c).
cpustat monitors system behavior using CPU performance counters;
/usr/sbin/cpustat -h lists the counters exported by the system's CPU.
Back End:
The CLI calls into a back-end module called a pcbe (Performance Counter Back End).
Each CPU-specific back-end module is written to the interfaces described in cpc_pcbe.h.
In the onnv (Nevada) source base, these are the relevant files:
- uts/common/sys/cpc_pcbe.h ---> Interfaces
- uts/intel/pcbe/core_pcbe.c ---> Intel Family 6 Models 15, 23, 26 & 28
- uts/intel/pcbe/opteron_pcbe.c ---> AMD Opteron and AMD Athlon 64 processors
- uts/intel/pcbe/p123_pcbe.c ---> Pentiums I, II, and III
- uts/intel/pcbe/p4_pcbe.c ---> Pentium 4
- uts/sun4u/pcbe/opl_pcbe.c ---> Fujitsu's SPARC64 VI & VII
- uts/sun4u/pcbe/us234_pcbe.c ---> UltraSPARC-II, III, IV
- uts/sun4v/pcbe/niagara2_pcbe.c --> UltraSPARC T2 & T2+ Processors
- uts/sun4v/pcbe/niagara_pcbe.c ---> Niagara
- uts/sun4v/pcbe/rock_pcbe.c ---> Rock CPU
The events to count are selected through the PCR register, and the number of
occurrences is recorded in the PIC register; both registers exist per thread.
Some events are counted per chip: they increment the PICs of all threads
(for example, cycle_counts).
 * Performance Control Register (PCR)
 *
 * +----------+-----+-----+------+----+
 * |    0     | OVF |  0  | OVR0 | 0  |
 * +----------+-----+-----+------+----+
 *  63      48 47:32 31:27   26    25
 *
 * +-----+----+-----+----+-----+----+-----+------+----+----+------+
 * | NC  | 0  | SC  | 0  | SU  | 0  | SL  | ULRO | UT | ST | PRIV |
 * +-----+----+-----+----+-----+----+-----+------+----+----+------+
 *  24:22  21  20:18  17  16:11  10   9:4    3     2    1     0
 *
 * ULRO and OVRO bits should be on upon accessing pcr unless
 * those fields need to be updated.
 * Turn off these bits when updating SU/SL or OVF field
 * (during initialization, etc.).
 *
 *
 * Performance Instrumentation Counter (PIC)
 * Four PICs are implemented in SPARC64 VI and VII,
 * each PIC is accessed using PCR.SC as a select field.
 *
 * +------------------------+--------------------------+
 * |          PICU          |           PICL           |
 * +------------------------+--------------------------+
 *  63                    32 31                       0
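To make the bit layout concrete, here is a small sketch that pulls individual fields out of a PCR value using shell arithmetic. The bit positions come from the comment above; the helper names pcr_field and decode_pcr are made up for illustration and do not exist in the kernel source or in cpustat.

```shell
#!/bin/sh
# pcr_field VALUE HIGH LOW: print bits HIGH..LOW of VALUE as a decimal number.
pcr_field() {
    echo $(( ($1 >> $3) & ((1 << ($2 - $3 + 1)) - 1) ))
}

# decode_pcr VALUE: print selected PCR fields, using the bit positions from
# the layout comment: SC 20:18, SU 16:11, SL 9:4, UT 2, ST 1, PRIV 0.
decode_pcr() {
    echo "SC=$(pcr_field "$1" 20 18) SU=$(pcr_field "$1" 16 11) SL=$(pcr_field "$1" 9 4) UT=$(pcr_field "$1" 2 2) ST=$(pcr_field "$1" 1 1) PRIV=$(pcr_field "$1" 0 0)"
}

# Example: a value with SC=1, SU=5, SL=3 and UT set.
# decode_pcr $(( (1 << 18) | (5 << 11) | (3 << 4) | (1 << 2) ))
```

The same shift-and-mask approach splits a PIC value into PICU (bits 63:32) and PICL (bits 31:0), as long as the value fits the shell's signed integer range.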
Sample script to monitor the events.
#!/bin/ksh
#cpustat -c pic0=cycle_counts,cycle_counts,cycle_counts,cycle_counts,cycle_counts,cycle_counts,cycle_counts,cycle_counts 5 5
# event specification syntax:
# [picn=]
while :
do
    cpustat \
        -c pic0=cycle_counts,pic1=cycle_counts,pic7=cycle_counts \
        -c pic0=instruction_counts,pic1=instruction_counts,pic7=op_stv_wait \
        -c pic0=op_stv_wait,pic1=instruction_flow_counts,pic7=load_store_instructions \
        -c pic0=load_store_instructions,pic1=iwr_empty,pic7=branch_instructions \
        -c pic0=branch_instructions,pic1=op_stv_wait,pic7=floating_instructions \
        -c pic0=floating_instructions,pic1=load_store_instructions,pic7=impdep2_instructions \
        -c pic0=impdep2_instructions,pic1=branch_instructions,pic7=prefetch_instructions \
        -c pic0=prefetch_instructions,pic1=floating_instructions,pic7=regwin_intlk \
        -c pic0=flush_rs,pic1=impdep2_instructions,pic7=rs1 \
        -c pic0=2iid_use,pic1=prefetch_instructions,pic7=trap_IMMU_miss \
        -c pic0=trap_int_vector,pic1=rs1,pic7=jbus_odrbus2_busy \
        -c pic0=ts_by_sxmiss,pic1=1iid_use \
        -c pic0=active_cycle_count,pic1=trap_all,pic7=1endop \
        -c pic0=op_stv_wait_sxmiss,pic1=thread_switch_all,pic7=op_stv_wait_sxmiss_ex \
        -c pic1=active_cycle_count,pic7=if_wait_all \
        -c pic0=swpf_fail_all,pic1=act_thread_suspend,pic7=dvp_count_dm \
        -c pic0=sx_miss_wait_pf,pic1=cse_window_empty,pic7=sx_miss_count_dm_opsh \
        -c pic0=jbus_cpi_count,pic1=inh_cmit_gpr_2write,pic7=jbus_odrbus2_busy \
        -c pic0=jbus_reqbus1_busy,pic1=swpf_success_all,pic7=instruction_counts 1 10
    sleep 5
done
exit
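Once the counts are captured, you will usually want to derive something from them. The sketch below computes instructions per cycle from a single event set. It rests on assumptions: the input columns are taken to be "time cpu event pic0 pic1", with pic0=cycle_counts and pic1=instruction_counts, and the function name ipc_from_cpustat is made up. Check the column layout against the real output on your system before relying on it.

```shell
#!/bin/sh
# ipc_from_cpustat: read cpustat-style sample lines on stdin and print the
# CPU id and instructions-per-cycle for each sample.
# ASSUMED column layout: time cpu event pic0 pic1
# ASSUMED events: pic0=cycle_counts, pic1=instruction_counts
# Header lines and anything that is not a "tick" sample are ignored, as are
# samples with a zero cycle count.
ipc_from_cpustat() {
    awk '$3 == "tick" && $4 > 0 { printf "cpu%s %.3f\n", $2, $5 / $4 }'
}

# Hypothetical usage, one-second interval, ten samples:
# cpustat -c pic0=cycle_counts,pic1=instruction_counts 1 10 | ipc_from_cpustat
```

When several -c sets are given, as in the script above, each sample belongs to one set only, so a filter like this would also need to know which set a line came from.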
Events available on each counter, as listed by cpustat -h:
event1: cycle_counts instruction_counts instruction_flow_counts
iwr_empty op_stv_wait load_store_instructions
branch_instructions floating_instructions
impdep2_instructions prefetch_instructions rs1 1iid_use
trap_all thread_switch_all active_cycle_count
act_thread_suspend cse_window_empty inh_cmit_gpr_2write
swpf_success_all sx_miss_wait_dm jbus_bi_count
lost_softpf_pfp_full jbus_reqbus0_busy
event2: cycle_counts instruction_counts op_stv_wait
load_store_instructions branch_instructions
floating_instructions impdep2_instructions
prefetch_instructions 4iid_use flush_rs trap_spill
ts_by_timer active_cycle_count 0iid_use
op_stv_wait_nc_pend 0endop write_op_uTLB sx_miss_count_pf
jbus_cpd_count snres_64 jbus_reqbus3_busy
event3: cycle_counts instruction_counts op_stv_wait
load_store_instructions branch_instructions
floating_instructions impdep2_instructions
prefetch_instructions 3iid_use trap_int_level
ts_by_data_arrive active_cycle_count op_stv_wait_nc_pend
op_stv_wait_sxmiss_ex eu_comp_wait write_if_uTLB
sx_miss_count_dm jbus_cpb_count snres_256
lost_softpf_by_abort jbus_reqbus2_busy
event4: cycle_counts instruction_counts op_stv_wait
load_store_instructions branch_instructions
floating_instructions impdep2_instructions
prefetch_instructions sync_intlk trap_trap_inst ts_by_if
active_cycle_count cse_window_empty_sp_full fl_comp_wait
op_r_iu_req_mi_go sx_read_count_pf jbus_orderbus_busy
sx_miss_count_dm_if jbus_odrbus1_busy
event5: cycle_counts instruction_counts instruction_flow_counts
iwr_empty op_stv_wait load_store_instructions
branch_instructions floating_instructions
impdep2_instructions prefetch_instructions trap_fill
ts_by_intr active_cycle_count flush_rs
cse_window_empty_sp_full op_stv_wait_ex 3endop
if_r_iu_req_mi_go swpf_lbs_hit sx_read_count_dm
jbus_reqbus_busy sx_btc_count jbus_odrbus0_busy
event6: cycle_counts instruction_counts op_stv_wait
load_store_instructions branch_instructions
floating_instructions impdep2_instructions
prefetch_instructions trap_DMMU_miss ts_by_suspend
ts_by_other active_cycle_count decall_intlk
cse_window_empty_sp_full 2endop op_stv_wait_sxmiss
op_wait_all dvp_count_pf sx_miss_count_dm_opex
jbus_odrbus3_busy
event7: cycle_counts instruction_counts op_stv_wait
load_store_instructions branch_instructions
floating_instructions impdep2_instructions
prefetch_instructions regwin_intlk rs1 trap_IMMU_miss
ts_by_spinloop active_cycle_count cse_window_empty_sp_full
1endop op_stv_wait_sxmiss_ex if_wait_all dvp_count_dm
sx_miss_count_dm_opsh jbus_odrbus2_busy
attributes: nouser sys
See the "SPARC64 VI extensions" for descriptions of these events.