.\" Copyright (c) 2014, Jan Chaloupka <jchaloup@redhat.com>
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.TH "IB_ATOMIC_BW" 1 2014 "Open Fabrics Enterprise Distribution"
.\" IB_ATOMIC_BW
.SH NAME
ib_atomic_bw, ib_atomic_lat, ib_read_bw, ib_read_lat, ib_send_bw,
ib_send_lat, ib_write_bw, ib_write_lat
\- Collection of tests written over uverbs intended for use as a
performance micro-benchmark
.SH SYNOPSIS
.sp
.B ib_atomic_bw [<host>] [options]
.sp
.B ib_atomic_lat [<host>] [options]
.sp
.B ib_read_bw [<host>] [options]
.sp
.B ib_read_lat [<host>] [options]
.sp
.B ib_send_bw [<host>] [options]
.sp
.B ib_send_lat [<host>] [options]
.sp
.B ib_write_bw [<host>] [options]
.sp
.B ib_write_lat [<host>] [options]
.SH DESCRIPTION
This is a collection of tests written over uverbs intended for use as a
performance micro-benchmark. For example, the tests can be used for
hardware or software tuning and for functional testing.
.sp
The collection contains a set of bandwidth (BW) and latency benchmarks:
.sp
* Read   - ib_read_bw and ib_read_lat.
.sp
* Write  - ib_write_bw and ib_write_lat.
.sp
* Send   - ib_send_bw and ib_send_lat.
.sp
* Atomic - ib_atomic_bw and ib_atomic_lat.
.sp
* Raw Ethernet (when working with MOFED2) - raw_ethernet_bw and raw_ethernet_lat.
.sp
The benchmarks use the CPU cycle counter to get time stamps without a context
switch.  Some CPU architectures (e.g., Intel's 80486 or older PPC) do NOT
have such a capability.
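.sp
As a rough illustration of how cycle-counter time stamps turn into times,
the following Python sketch converts a cycle delta to microseconds.
The function name and the 2400 MHz clock are hypothetical and are not
taken from the perftest sources.
.nf
```python
def cycles_to_usec(cycle_delta, cpu_mhz):
    """Convert a cycle-counter difference to microseconds.

    A clock of cpu_mhz MHz ticks cpu_mhz times per microsecond.
    """
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    return cycle_delta / cpu_mhz


# Hypothetical example: 4800 cycles on an assumed 2400 MHz CPU.
print(cycles_to_usec(4800, 2400.0))
```
.fi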
.sp
The latency benchmarks measure round-trip time but report half of that as
one-way latency.
This means that the result may not be sufficiently accurate for asymmetrical
configurations.
.sp
In the BW benchmarks, the BW is calculated on the send side only, because the
sender calculates the BW after collecting completions from the receive side.
When the bidirectional flag is used, BW is calculated on both sides.
In ib_send_bw, the server side also calculates the received throughput.
.sp
Min/Median/Max results are reported in latency tests.
The median (vs. average) is less sensitive to extreme scores.
Typically, the "Max" value is the first value measured.
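.sp
The reporting described above can be sketched in Python as follows.
This is only an illustration: the sample values and the function name
are made up, and the real tests work on cycle-counter deltas rather
than on a list of microsecond values.
.nf
```python
import statistics


def one_way_stats(rtt_usec):
    """Halve round-trip samples and return (min, median, max)."""
    one_way = [rtt / 2.0 for rtt in rtt_usec]
    return min(one_way), statistics.median(one_way), max(one_way)


# Hypothetical round-trip samples in microseconds; the first one is a
# warm-up outlier, which is why it ends up as the maximum.
samples = [40.0, 4.0, 4.2, 3.8, 4.0]
low, med, high = one_way_stats(samples)
mean = statistics.mean([rtt / 2.0 for rtt in samples])
print(low, med, high, mean)
```
.fi
With these samples the single outlier pulls the mean well above the
median, which is the reason the median is the reported figure.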
.sp
Larger samples help only marginally. The default (1000) is usually sufficient.
Note that an array of cycles_t (typically unsigned long) is allocated
once to collect samples and again to store the difference between them.
Really big sample sizes (e.g., 1 million) might expose other problems
with the program. In this case you can use the -N flag (No Peak) to instruct
the test to sample only 2 times (beginning and end).
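.sp
To get a feel for the memory involved, the Python sketch below estimates
the size of the two sample arrays. It assumes an 8-byte cycles_t (an
unsigned long on typical 64-bit Linux systems) and ignores any other
allocations, so treat the result as a rough lower bound.
.nf
```python
def sample_buffer_bytes(iters, cycles_t_size=8):
    """Estimate the memory used by the two per-sample arrays.

    One array holds the raw time stamps and a second one holds the
    differences between them. cycles_t_size=8 is an assumption.
    """
    return 2 * iters * cycles_t_size


for iters in (1000, 1000000):
    print(iters, sample_buffer_bytes(iters))
```
.fi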
.sp
All throughput tests also have a duration feature (-D <seconds to run>),
which instructs the test to run for <seconds to run>.
Another feature is --run_infinitely, which instructs the test to run
indefinitely and print the throughput every 5 seconds.
.sp
The "-H" option (latency) will dump the histogram for additional statistical
analysis.
See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
statistical math programs.
.sp
Architectures tested: i686, x86_64, ia64
.SH OPTIONS
The SAME OPTIONS must be passed to both server and client.
.sp
If
.I <host>
is not given, the command starts a server and waits for a connection.
If it is given, the command connects to the server at
.IR <host> .
.sp
.B Common Options:
.RS 4
.TP
\fB\-h\fR, \fB\-\-help\fR
Display this help message screen.
.TP
\fB\-p\fR, \fB\-\-port\fR=\fI<port>\fR
Listen on/connect to port <port> (default: 18515) when exchanging data.
.TP
\fB\-R\fR, \fB\-\-rdma_cm\fR
Connect QPs with rdma_cm and run test on those QPs.
.TP
\fB\-z\fR, \fB\-\-com_rdma_cm\fR
Communicate with rdma_cm module to exchange data \- use regular QPs.
.TP
\fB\-m\fR, \fB\-\-mtu\fR=\fI<mtu>\fR
QP MTU size (default: active_mtu from ibv_devinfo).
.TP
\fB\-c\fR, \fB\-\-connection\fR=\fI<RC/UC/UD>\fR
Connection type RC/UC/UD (default: RC).
.TP
\fB\-d\fR, \fB\-\-ib\-dev\fR=\fI<dev>\fR
Use IB device <dev> (default: first device found).
.TP
\fB\-i\fR, \fB\-\-ib\-port\fR=\fI<port>\fR
Use port <port> of IB device (default: 1).
.TP
\fB\-s\fR, \fB\-\-size\fR=\fI<size>\fR
Size of message to exchange (default: 1).
.TP
\fB\-a\fR, \fB\-\-all\fR
Run sizes from 2 to 2^23.
.TP
\fB\-n\fR, \fB\-\-iters\fR=\fI<iters>\fR
Number of exchanges (at least 100, default: 1000).
.TP
\fB\-x\fR, \fB\-\-gid\-index\fR=\fI<index>\fR
Test uses GID with GID index taken from command.
.TP
\fB\-V\fR, \fB\-\-version\fR
Display version number.
.TP
\fB\-e\fR, \fB\-\-events\fR
Sleep on CQ events (default: poll).
.TP
\fB\-F\fR, \fB\-\-CPU\-freq\fR
Do not fail even if the cpufreq_ondemand module is loaded.
.TP
\fB\-I\fR, \fB\-\-inline_size\fR=\fI<size>\fR
Max size of message to be sent in inline mode.
.TP
\fB\-u\fR, \fB\-\-qp\-timeout\fR=\fI<timeout>\fR
QP timeout; the timeout value is 4 usec * 2^timeout (default: 14).
.TP
\fB\-S\fR, \fB\-\-sl\fR=\fI<sl>\fR
SL \- Service Level (default: 0).
.TP
\fB\-r\fR, \fB\-\-rx\-depth\fR=\fI<dep>\fR
Make rx queue bigger than tx (default: 600).
.RE
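.sp
Two of the common options involve simple arithmetic, sketched here in
Python. The timeout helper applies the formula given for the QP timeout
option with the 4 usec base stated there. The size helper assumes that
the all-sizes run steps through powers of two, which the option text
does not spell out. Both function names are hypothetical.
.nf
```python
def qp_timeout_usec(exponent, base_usec=4.0):
    """Timeout in microseconds for a given qp-timeout exponent.

    Uses the formula from the option text: base * 2^exponent.
    """
    return base_usec * (2 ** exponent)


def all_sizes(max_exponent=23):
    """Message sizes 2, 4, ..., 2^max_exponent.

    Stepping by powers of two is an assumption of this sketch.
    """
    return [2 ** k for k in range(1, max_exponent + 1)]


# Default exponent of 14 from the option description.
print(qp_timeout_usec(14))
sizes = all_sizes()
print(len(sizes), sizes[0], sizes[-1])
```
.fi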
.sp
.B Latency tests options:
.RS 4
.TP
\fB\-C\fR, \fB\-\-report\-cycles\fR
Report times in CPU cycle units.
.TP
\fB\-H\fR, \fB\-\-report\-histogram\fR
Print out all results (default: summary only).
.TP
\fB\-U\fR, \fB\-\-report\-unsorted\fR
Print out unsorted results (default: sorted).
.RE
.sp
.B BW tests options:
.RS 4
.TP
\fB\-b\fR, \fB\-\-bidirectional\fR
Measure bidirectional bandwidth (default: unidirectional).
.TP
\fB\-N\fR, \fB\-\-no peak\-bw\fR
Cancel peak\-bw calculation (default: with peak\-bw).
.TP
\fB\-Q\fR, \fB\-\-cq\-mod\fR
Generate CQE only after <cq\-mod> completions.
.TP
\fB\-t\fR, \fB\-\-tx\-depth=<dep>\fR
Size of tx queue (default: 128).
.TP
\fB\-O\fR, \fB\-\-dualport\fR
Run test in dual\-port mode (2 QPs). Both ports must be active (default: OFF).
.TP
\fB\-D\fR, \fB\-\-duration=<sec>\fR
Run test for a period of <sec> seconds.
.TP
\fB\-f\fR, \fB\-\-margin=<sec>\fR
When in duration mode, measure results within margins (default: 2).
.TP
\fB\-l\fR, \fB\-\-post_list=<list_size>\fR
Post list of WQEs of <list_size> size (instead of single post).
.TP
\fB\-q\fR, \fB\-\-qp=<num_of_qps>\fR
Number of QPs running in the process (default: 1).
.TP
\fB\-\-run_infinitely\fR
Run test forever, print results every 5 seconds.
.RE
.sp
.B SEND tests options:
.RS 4
.TP
\fB\-r\fR, \fB\-\-rx\-depth=<dep>\fR
Size of RX queue (default: 512 in BW test).
.TP
\fB\-g\fR, \fB\-\-mcg=<num_of_qps>\fR
Send messages to multicast group with <num_of_qps> QPs attached to it.
.TP
\fB\-M\fR, \fB\-\-MGID=<multicast_gid>\fR
In multicast, use <multicast_gid> as the group MGID.
.RE
.sp
.B ATOMIC tests options:
.RS 4
.TP
\fB\-A\fR, \fB\-\-atomic_type=<type>\fR
Type of atomic operation from {CMP_AND_SWAP,FETCH_AND_ADD}.
.TP
\fB\-o\fR, \fB\-\-outs=<num>\fR
Number of outstanding read/atomic requests \- also on READ tests.
.RE
.sp
.B Raw Ethernet BW test options:
.RS 4
.TP
\fB\-B\fR, \fB\-\-source_mac\fR
Source MAC address in the format XX:XX:XX:XX:XX:XX (default: take the MAC address from the GID).
.TP
\fB\-E\fR, \fB\-\-dest_mac\fR
Destination MAC address in the format XX:XX:XX:XX:XX:XX. \fBMUST\fR be entered.
.TP
\fB\-J\fR, \fB\-\-server_ip\fR
Server IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-j\fR, \fB\-\-client_ip\fR
Client IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-K\fR, \fB\-\-server_port\fR
Server UDP port number (used to send packets with a UDP header).
.TP
\fB\-k\fR, \fB\-\-client_port\fR
Client UDP port number (used to send packets with a UDP header).
.TP
\fB\-Z\fR, \fB\-\-server\fR
Choose server side for the current machine (\-\-server/\-\-client must be selected).
.TP
\fB\-P\fR, \fB\-\-client\fR
Choose client side for the current machine (\-\-server/\-\-client must be selected).
.RE
.SH ENVIRONMENT
.B Prerequisites:
.RS
kernel 2.6
.RE
.RS
(kernel module) matches libibverbs
.RE
.RS
(kernel module) matches librdmacm
.RE
.RS
(kernel module) matches libibumad
.RE
.RS
(kernel module) matches libmath (lm).
.RE
.SH NOTES
If you are on an IB fabric, a Subnet Manager must be running on the switch or on one of the nodes in your fabric.
.SH BUGS
1. The multicast feature in ib_send_lat and ib_send_bw still has many problems.
Support will be improved and bugs fixed this quarter, but for now the tests may get stuck
and could produce undefined behaviour.
.sp
2. Bidirectional feature in the ib_send_bw test, when running in UD or UC mode.
The algorithm used for the bidirectional measurement is designed for the RC connection type.
When running with the UC or UD connection types, there is a small probability that the test will get stuck.
.sp
3. The RDMA_CM feature in read tests still doesn't work.
.sp
4. Dual-port support currently works only with ib_write_bw.
.sp
5. Compatibility issues may occur between different versions of perftest.
Please make sure you work with the same version on both sides to ensure
consistency of the test.
.SH AUTHORS
Please post results/observations to the openib-general mailing list.
See "Contact Us" at http://openib.org/mailman/listinfo/openib-general and
http://www.openib.org.