.\" Copyright (c) 2014, Jan Chaloupka <jchaloup@redhat.com>
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.TH "IB_ATOMIC_BW" 1 2014 "Open Fabrics Enterprise Distribution"
.\" IB_ATOMIC_BW
.SH NAME
ib_atomic_bw, ib_atomic_lat, ib_read_bw, ib_read_lat, ib_send_bw,
ib_send_lat, ib_write_bw, ib_write_lat
\- collection of tests written over uverbs intended for use as
performance micro-benchmarks
.SH SYNOPSIS
.sp
.B ib_atomic_bw [<host>] [options]
.sp
.B ib_atomic_lat [<host>] [options]
.sp
.B ib_read_bw [<host>] [options]
.sp
.B ib_read_lat [<host>] [options]
.sp
.B ib_send_bw [<host>] [options]
.sp
.B ib_send_lat [<host>] [options]
.sp
.B ib_write_bw [<host>] [options]
.sp
.B ib_write_lat [<host>] [options]
.SH DESCRIPTION
This is a collection of tests written over uverbs intended for use as a
performance micro-benchmark. As an example, the tests can be used for
HW or SW tuning and/or functional testing.
.sp
The collection contains a set of bandwidth (BW) and latency benchmarks:
.sp
* Read - ib_read_bw and ib_read_lat.
.sp
* Write - ib_write_bw and ib_write_lat.
.sp
* Send - ib_send_bw and ib_send_lat.
.sp
* Atomic - ib_atomic_bw and ib_atomic_lat.
.sp
* Raw Ethernet (when working with MOFED2) - raw_ethernet_bw and raw_ethernet_lat.
.sp
The benchmarks use the CPU cycle counter to get time stamps without a context
switch. Some CPU architectures (e.g., Intel's 80486 or older PPC) do NOT
have such a capability.
.sp
The latency benchmarks measure round-trip time but report half of that as
one-way latency.
This means that the reported value may not be sufficiently accurate for asymmetrical
configurations.
.sp
In the BW benchmarks, the BW is calculated on the send side only, because the
sender calculates it after collecting completions from the receive side.
When the bidirectional flag (\fB\-b\fR) is used, the BW is calculated on both sides.
In ib_send_bw, the server side also calculates the received throughput.
.sp
Min, median and max results are reported in the latency tests.
The median is less sensitive to extreme values than the average.
Typically, the "Max" value is the first value measured.
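.sp
As an illustration of the reporting described above, and not the tools' actual
code, the following Python sketch halves a list of made-up round-trip samples
to obtain one-way estimates and then compares the median with the mean when
the first sample is an outlier. The function name one_way_summary and the
sample values are hypothetical.
.sp
.nf
```python
from statistics import mean, median


def one_way_summary(rtt_usec):
    """Halve round-trip samples and summarise them as min/median/max/mean."""
    one_way = [rtt / 2.0 for rtt in rtt_usec]
    return {
        "min": min(one_way),
        "median": median(one_way),
        "max": max(one_way),
        "mean": mean(one_way),
    }


# Hypothetical round-trip samples in usec; the first one is a warm-up outlier.
samples = [40.0, 2.0, 2.2, 2.0, 2.4, 2.2, 2.0]
summary = one_way_summary(samples)
print(summary)
```
.fi
.sp
With these samples the single slow first measurement sets the max and pulls
the mean well above the median, which stays close to the typical value.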
.sp
Larger sample counts help only marginally; the default (1000) is usually sufficient.
Note that an array of cycles_t (typically unsigned long) is allocated
once to collect samples and again to store the differences between them.
Really big sample sizes (e.g., 1 million) might expose other problems
with the program. In this case you can use the \fB\-N\fR flag (No Peak) to instruct
the test to sample only 2 times (beginning and end).
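.sp
A rough estimate of the memory used by the two sample arrays can be sketched
as follows. The 8-byte element size is an assumption (cycles_t as an unsigned
long on a typical 64-bit Linux system), and the helper name
sample_buffer_bytes is hypothetical.
.sp
.nf
```python
def sample_buffer_bytes(iters, cycles_t_size=8):
    """Bytes needed for the two per-sample arrays described above.

    cycles_t_size=8 assumes an 8-byte unsigned long; this is an
    assumption, not a value read from the tools.
    """
    arrays = 2  # one array of time stamps, one of differences
    return arrays * iters * cycles_t_size


for iters in (1000, 1_000_000):
    print(iters, "samples:", sample_buffer_bytes(iters), "bytes")
```
.fi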
.sp
All throughput tests also have a duration feature (\fB\-D\fR <seconds to run>),
which instructs the test to run for <seconds to run>.
The \fB\-\-run_infinitely\fR option instructs the test to run
indefinitely and print the throughput every 5 seconds.
.sp
The \fB\-H\fR option (latency tests) dumps the histogram for additional statistical
analysis.
See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
statistical math programs.
.sp
Architectures tested: i686, x86_64, ia64
.SH OPTIONS
The same options must be passed to both the server and the client.
.sp
If
.I <host>
is not given, the command starts a server and waits for a connection.
If it is given, the command connects to the server at
.IR <host> .
.sp
.B Common Options:
.RS 4
.TP
\fB\-h\fR, \fB\-\-help\fR
Display this help message screen.
.TP
\fB\-p\fR, \fB\-\-port\fR=\fI<port>\fR
Listen on/connect to port <port> (default: 18515) when exchanging data.
.TP
\fB\-R\fR, \fB\-\-rdma_cm\fR
Connect QPs with rdma_cm and run the test on those QPs.
.TP
\fB\-z\fR, \fB\-\-com_rdma_cm\fR
Communicate with the rdma_cm module to exchange data \- use regular QPs.
.TP
\fB\-m\fR, \fB\-\-mtu\fR=\fI<mtu>\fR
QP MTU size (default: active_mtu from ibv_devinfo).
.TP
\fB\-c\fR, \fB\-\-connection\fR=\fI<RC/UC/UD>\fR
Connection type RC/UC/UD (default: RC).
.TP
\fB\-d\fR, \fB\-\-ib\-dev\fR=\fI<dev>\fR
Use IB device <dev> (default: first device found).
.TP
\fB\-i\fR, \fB\-\-ib\-port\fR=\fI<port>\fR
Use port <port> of IB device (default: 1).
.TP
\fB\-s\fR, \fB\-\-size\fR=\fI<size>\fR
Size of message to exchange (default: 1).
.TP
\fB\-a\fR, \fB\-\-all\fR
Run sizes from 2 to 2^23.
.TP
\fB\-n\fR, \fB\-\-iters\fR=\fI<iters>\fR
Number of exchanges (at least 100, default: 1000).
.TP
\fB\-x\fR, \fB\-\-gid\-index\fR=\fI<index>\fR
Use the GID with the GID index <index> given on the command line.
.TP
\fB\-V\fR, \fB\-\-version\fR
Display version number.
.TP
\fB\-e\fR, \fB\-\-events\fR
Sleep on CQ events (default: poll).
.TP
\fB\-F\fR, \fB\-\-CPU\-freq\fR
Do not fail even if the cpufreq_ondemand module is loaded.
.TP
\fB\-I\fR, \fB\-\-inline_size\fR=\fI<size>\fR
Max size of message to be sent in inline mode.
.TP
\fB\-u\fR, \fB\-\-qp\-timeout\fR=\fI<timeout>\fR
QP timeout; the timeout value is 4 usec * 2^<timeout> (default: 14).
.TP
\fB\-S\fR, \fB\-\-sl\fR=\fI<sl>\fR
SL \- Service Level (default: 0).
.TP
\fB\-r\fR, \fB\-\-rx\-depth\fR=\fI<dep>\fR
Make rx queue bigger than tx (default: 600).
.RE
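.sp
Two of the common options encode their values as powers of two. The Python
sketch below works them out under stated assumptions: the timeout base is
taken as 4.096 usec (the InfiniBand local ACK timeout unit, which the option
text rounds to 4 usec), and \fB\-a\fR is assumed to step through powers of
two only. The helper names are hypothetical and are not part of the tools.
.sp
.nf
```python
def qp_timeout_usec(exponent, base_usec=4.096):
    """Timeout in microseconds for a given -u/--qp-timeout exponent.

    base_usec=4.096 is an assumption taken from the InfiniBand
    definition; pass base_usec=4.0 to follow the rounded option text.
    """
    return base_usec * (2 ** exponent)


def all_sizes(first_exp=1, last_exp=23):
    """Message sizes assumed to be swept by -a: powers of two, 2 to 2^23."""
    return [2 ** e for e in range(first_exp, last_exp + 1)]


# Default exponent 14 gives a timeout of a few tens of milliseconds.
print(round(qp_timeout_usec(14) / 1000.0, 1), "ms")

sizes = all_sizes()
print(len(sizes), "sizes from", sizes[0], "to", sizes[-1])
```
.fi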
.sp
.B Latency tests options:
.RS 4
.TP
\fB\-C\fR, \fB\-\-report\-cycles\fR
Report times in CPU cycle units.
.TP
\fB\-H\fR, \fB\-\-report\-histogram\fR
Print out all results (default: summary only).
.TP
\fB\-U\fR, \fB\-\-report\-unsorted\fR
Print out unsorted results (default: sorted).
.RE
.sp
.B BW tests options:
.RS 4
.TP
\fB\-b\fR, \fB\-\-bidirectional\fR
Measure bidirectional bandwidth (default: unidirectional).
.TP
\fB\-N\fR, \fB\-\-no peak\-bw\fR
Cancel peak\-bw calculation (default: with peak\-bw).
.TP
\fB\-Q\fR, \fB\-\-cq\-mod\fR
Generate a CQE only after <cq\-mod> completions.
.TP
\fB\-t\fR, \fB\-\-tx\-depth\fR=\fI<dep>\fR
Size of tx queue (default: 128).
.TP
\fB\-O\fR, \fB\-\-dualport\fR
Run test in dual\-port mode (2 QPs). Both ports must be active (default: OFF).
.TP
\fB\-D\fR, \fB\-\-duration\fR=\fI<sec>\fR
Run the test for <sec> seconds.
.TP
\fB\-f\fR, \fB\-\-margin\fR=\fI<sec>\fR
In duration mode, measure results within margins of <sec> seconds (default: 2).
.TP
\fB\-l\fR, \fB\-\-post_list\fR=\fI<list_size>\fR
Post a list of <list_size> WQEs (instead of a single post).
.TP
\fB\-q\fR, \fB\-\-qp\fR=\fI<num_of_qps>\fR
Number of QPs running in the process (default: 1).
.TP
\fB\-\-run_infinitely\fR
Run the test forever and print results every 5 seconds.
.RE
.sp
.B SEND tests options:
.RS 4
.TP
\fB\-r\fR, \fB\-\-rx\-depth\fR=\fI<dep>\fR
Size of RX queue (default: 512 in BW test).
.TP
\fB\-g\fR, \fB\-\-mcg\fR=\fI<num_of_qps>\fR
Send messages to a multicast group with <num_of_qps> QPs attached to it.
.TP
\fB\-M\fR, \fB\-\-MGID\fR=\fI<multicast_gid>\fR
In multicast, use <multicast_gid> as the group MGID.
.RE
.sp
.B ATOMIC tests options:
.RS 4
.TP
\fB\-A\fR, \fB\-\-atomic_type\fR=\fI<type>\fR
Type of atomic operation, one of {CMP_AND_SWAP,FETCH_AND_ADD}.
.TP
\fB\-o\fR, \fB\-\-outs\fR=\fI<num>\fR
Number of outstanding read/atomic requests \- also on READ tests.
.RE
.sp
.B Raw Ethernet BW test options:
.RS 4
.TP
\fB\-B\fR, \fB\-\-source_mac\fR
Source MAC address in the format XX:XX:XX:XX:XX:XX (default: take the MAC address from the GID).
.TP
\fB\-E\fR, \fB\-\-dest_mac\fR
Destination MAC address in the format XX:XX:XX:XX:XX:XX. This option MUST be given.
.TP
\fB\-J\fR, \fB\-\-server_ip\fR
Server IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-j\fR, \fB\-\-client_ip\fR
Client IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-K\fR, \fB\-\-server_port\fR
Server UDP port number (used to send packets with a UDP header).
.TP
\fB\-k\fR, \fB\-\-client_port\fR
Client UDP port number (used to send packets with a UDP header).
.TP
\fB\-Z\fR, \fB\-\-server\fR
Choose the server side for the current machine (\-\-server or \-\-client must be selected).
.TP
\fB\-P\fR, \fB\-\-client\fR
Choose the client side for the current machine (\-\-server or \-\-client must be selected).
.RE
.SH ENVIRONMENT
.B Prerequisites:
.RS
kernel 2.6
.RE
.RS
(kernel module) matches libibverbs
.RE
.RS
(kernel module) matches librdmacm
.RE
.RS
(kernel module) matches libibumad
.RE
.RS
(kernel module) matches libmath (lm).
.RE
.SH NOTES
On an IB fabric, a Subnet Manager must be running on the switch or on one of the nodes in your fabric.
.SH BUGS
1. The multicast feature in ib_send_lat and in ib_send_bw still has many problems.
Further support and bug fixes are planned, but for now the tests may hang
and could produce undefined behaviour.
.sp
2. Bidirectional feature in the ib_send_bw test, when running in UD or UC mode.
The algorithm used for the bidirectional measurement is designed for the RC connection type.
When running with the UC or UD connection types, there is a small probability that the test will hang.
.sp
3. The RDMA_CM feature in the read tests still doesn't work.
.sp
4. Dual-port support currently works only with ib_write_bw.
.sp
5. Compatibility issues may occur between different versions of perftest.
Please make sure you work with the same version on both sides to ensure
consistency of the test.
.SH AUTHORS
Please post results/observations to the openib-general mailing list.
See "Contact Us" at http://openib.org/mailman/listinfo/openib-general and
http://www.openib.org.