.\" Copyright (c) 2014, Jan Chaloupka <jchaloup@redhat.com>
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.TH "IB_ATOMIC_BW" 1 2014 "Open Fabrics Enterprise Distribution"
.\" IB_ATOMIC_BW
.SH NAME
ib_atomic_bw, ib_atomic_lat, ib_read_bw, ib_read_lat, ib_send_bw,
ib_send_lat, ib_write_bw, ib_write_lat
\- Collection of tests written over uverbs intended for use as a
performance micro-benchmark
.SH SYNOPSIS
.sp
.B ib_atomic_bw [<host>] [options]
.sp
.B ib_atomic_lat [<host>] [options]
.sp
.B ib_read_bw [<host>] [options]
.sp
.B ib_read_lat [<host>] [options]
.sp
.B ib_send_bw [<host>] [options]
.sp
.B ib_send_lat [<host>] [options]
.sp
.B ib_write_bw [<host>] [options]
.sp
.B ib_write_lat [<host>] [options]
.SH DESCRIPTION
This is a collection of tests written over uverbs intended for use as a
performance micro-benchmark. As an example, the tests can be used for
HW or SW tuning and/or functional testing.
.sp
The collection contains a set of BW and latency benchmarks such as:
.sp
* Read   - ib_read_bw and ib_read_lat.
.sp
* Write  - ib_write_bw and ib_write_lat.
.sp
* Send   - ib_send_bw and ib_send_lat.
.sp
* Atomic - ib_atomic_bw and ib_atomic_lat.
.sp
* Raw Ethernet (when working with MOFED2) - raw_ethernet_bw, raw_ethernet_lat.
.sp
The benchmarks use the CPU cycle counter to get time stamps without a context
switch.  Some CPU architectures (e.g., Intel's 80486 or older PPC) do NOT
have such a capability.
.sp
The latency benchmarks measure round-trip time but report half of that as
one-way latency.
This means that the result may not be sufficiently accurate for asymmetrical
configurations.
.sp
In BW benchmarks, the BW is calculated on the send side only, since the sender
calculates the BW after collecting completions from the receive side.
When the bidirectional flag is used, BW is calculated on both sides.
In ib_send_bw, the server side also calculates the received throughput.
.sp
Min/Median/Max results are reported in latency tests.
The median (vs. average) is less sensitive to extreme scores.
Typically, the "Max" value is the first value measured.
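.sp
The latency reporting described above can be illustrated with a short Python
sketch. It is not taken from the perftest sources: the function name
half_round_trip_stats and the sample values are hypothetical, and the
timestamps are assumed to be cycle-counter readings taken once before the
first exchange and once after each completed round trip.
.nf

```python
from statistics import median


def half_round_trip_stats(timestamps):
    """Return (min, median, max) one-way latency estimates in cycles.

    timestamps: cycle-counter readings, one taken before the first
    exchange and one after each completed round trip (assumed layout).
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # The difference between consecutive readings is one round trip.
    round_trips = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Half of each round trip is reported as the one-way latency.
    one_way = [rt / 2 for rt in round_trips]
    return min(one_way), median(one_way), max(one_way)


# Hypothetical readings: the first (warm-up) round trip is much slower.
samples = [0, 9000, 11000, 13200, 15200, 17200]
lo, med, hi = half_round_trip_stats(samples)
# Average one-way latency over the same samples, for comparison.
mean = (samples[-1] - samples[0]) / 2 / (len(samples) - 1)
print(lo, med, hi, mean)
```

.fi
In this made-up data set the single slow first sample pulls the average well
above the median, which is why the median is the more robust summary.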
.sp
Larger sample sizes help only marginally. The default (1000) is usually sufficient.
Note that an array of cycles_t (typically unsigned long) is allocated
once to collect samples and again to store the difference between them.
Really big sample sizes (e.g., 1 million) might expose other problems
with the program. In this case you can use the \-N flag (No Peak) to instruct
the test to sample only 2 times (beginning and end).
.sp
All throughput tests also have a duration feature (\-D <seconds to run>)
that instructs the test to run for <seconds to run>.
Another feature is \-\-run_infinitely, which instructs the test to run
indefinitely and print the throughput every 5 seconds.
.sp
The "\-H" option (latency) will dump the histogram for additional statistical
analysis.
See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
statistical math programs.
.sp
Architectures tested: i686, x86_64, ia64
.SH OPTIONS
The SAME OPTIONS must be passed to both server and client.
.sp
If
.I <host>
is not given, the command starts a server and waits for a connection.
If it is given, the command connects to the server at
.IR <host> .
.sp
.B Common Options:
.RS 4
.TP
\fB\-h\fR, \fB\-\-help\fR
Display this help message screen.
.TP
\fB\-p\fR, \fB\-\-port\fR=\fI<port>\fR
Listen on/connect to port <port> (default: 18515) when exchanging data.
.TP
\fB\-R\fR, \fB\-\-rdma_cm\fR
Connect QPs with rdma_cm and run test on those QPs.
.TP
\fB\-z\fR, \fB\-\-com_rdma_cm\fR
Communicate with rdma_cm module to exchange data \- use regular QPs.
.TP
\fB\-m\fR, \fB\-\-mtu\fR=\fI<mtu>\fR
QP MTU size (default: active_mtu from ibv_devinfo).
.TP
\fB\-c\fR, \fB\-\-connection\fR=\fI<RC/UC/UD>\fR
Connection type RC/UC/UD (default: RC).
.TP
\fB\-d\fR, \fB\-\-ib\-dev\fR=\fI<dev>\fR
Use IB device <dev> (default: first device found).
.TP
\fB\-i\fR, \fB\-\-ib\-port\fR=\fI<port>\fR
Use port <port> of IB device (default: 1).
.TP
\fB\-s\fR, \fB\-\-size\fR=\fI<size>\fR
Size of message to exchange (default: 1).
.TP
\fB\-a\fR, \fB\-\-all\fR
Run sizes from 2 to 2^23.
.TP
\fB\-n\fR, \fB\-\-iters\fR=\fI<iters>\fR
Number of exchanges (at least 100, default: 1000).
.TP
\fB\-x\fR, \fB\-\-gid\-index\fR=\fI<index>\fR
Test uses GID with GID index taken from the command line.
.TP
\fB\-V\fR, \fB\-\-version\fR
Display version number.
.TP
\fB\-e\fR, \fB\-\-events\fR
Sleep on CQ events (default: poll).
.TP
\fB\-F\fR, \fB\-\-CPU\-freq\fR
Do not fail even if the cpufreq_ondemand module is loaded.
.TP
\fB\-I\fR, \fB\-\-inline_size\fR=\fI<size>\fR
Max size of message to be sent in inline mode.
.TP
\fB\-u\fR, \fB\-\-qp\-timeout\fR=\fI<timeout>\fR
QP timeout; the timeout value is 4 usec * 2^timeout (default: 14).
.TP
\fB\-S\fR, \fB\-\-sl\fR=\fI<sl>\fR
SL \- Service Level (default: 0).
.TP
\fB\-r\fR, \fB\-\-rx\-depth\fR=\fI<dep>\fR
Make rx queue bigger than tx (default: 600).
.RE
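.sp
As a worked example of the \-u encoding listed above (4 usec * 2^timeout),
the following Python sketch tabulates a few values. The helper name
qp_timeout_usec is hypothetical, and the 4 usec base is taken from the option
description above rather than from the perftest sources.
.nf

```python
def qp_timeout_usec(timeout):
    """QP timeout in microseconds for a -u exponent: 4 usec * 2^timeout."""
    if timeout < 0:
        raise ValueError("timeout exponent must be non-negative")
    return 4 * 2 ** timeout


# Print the timeout for a few exponents, including the default (14).
for exponent in (0, 10, 14, 20):
    usec = qp_timeout_usec(exponent)
    print(f"-u {exponent:2d}: {usec} usec ({usec / 1000:.3f} msec)")
```

.fi
With the default of 14 this gives 65536 usec, i.e. roughly 65.5 msec, and
each increment of the exponent doubles the timeout.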
.sp
.B Latency tests options:
.RS 4
.TP
\fB\-C\fR, \fB\-\-report\-cycles\fR
Report times in CPU cycle units.
.TP
\fB\-H\fR, \fB\-\-report\-histogram\fR
Print out all results (default: summary only).
.TP
\fB\-U\fR, \fB\-\-report\-unsorted\fR
Print out unsorted results (default: sorted).
.RE
.sp
.B BW tests options:
.RS 4
.TP
\fB\-b\fR, \fB\-\-bidirectional\fR
Measure bidirectional bandwidth (default: unidirectional).
.TP
\fB\-N\fR, \fB\-\-no peak\-bw\fR
Cancel peak\-bw calculation (default: with peak\-bw).
.TP
\fB\-Q\fR, \fB\-\-cq\-mod\fR
Generate a CQE only after <cq\-mod> completions.
.TP
\fB\-t\fR, \fB\-\-tx\-depth=<dep>\fR
Size of tx queue (default: 128).
.TP
\fB\-O\fR, \fB\-\-dualport\fR
Run test in dual\-port mode (2 QPs). Both ports must be active (default: OFF).
.TP
\fB\-D\fR, \fB\-\-duration=<sec>\fR
Run the test for <sec> seconds.
.TP
\fB\-f\fR, \fB\-\-margin=<sec>\fR
In duration mode, measure results within margins (default: 2).
.TP
\fB\-l\fR, \fB\-\-post_list=<list_size>\fR
Post list of WQEs of <list_size> size (instead of single post).
.TP
\fB\-q\fR, \fB\-\-qp=<num_of_qps>\fR
Num of QPs running in the process (default: 1).
.TP
\fB\-\-run_infinitely\fR
Run the test forever, printing results every 5 seconds.
.RE
.sp
.B SEND tests options:
.RS 4
.TP
\fB\-r\fR, \fB\-\-rx\-depth=<dep>\fR
Size of RX queue (default: 512 in BW test).
.TP
\fB\-g\fR, \fB\-\-mcg=<num_of_qps>\fR
Send messages to multicast group with <num_of_qps> qps attached to it.
.TP
\fB\-M\fR, \fB\-\-MGID=<multicast_gid>\fR
In multicast, uses <multicast_gid> as the group MGID.
.RE
.sp
.B ATOMIC and READ tests options:
.RS 4
.TP
\fB\-A\fR, \fB\-\-atomic_type=<type>\fR
Type of atomic operation from {CMP_AND_SWAP,FETCH_AND_ADD}.
.TP
\fB\-o\fR, \fB\-\-outs=<num>\fR
Number of outstanding read/atomic requests \- also on READ tests.
.RE
.sp
.B Raw Ethernet BW test options:
.RS 4
.TP
\fB\-B\fR, \fB\-\-source_mac\fR
Source MAC address in the format XX:XX:XX:XX:XX:XX (default: take the MAC address from the GID).
.TP
\fB\-E\fR, \fB\-\-dest_mac\fR
Destination MAC address in the format XX:XX:XX:XX:XX:XX; \fBMUST\fR be entered.
.TP
\fB\-J\fR, \fB\-\-server_ip\fR
Server IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-j\fR, \fB\-\-client_ip\fR
Client IP address in the format X.X.X.X (used to send packets with an IP header).
.TP
\fB\-K\fR, \fB\-\-server_port\fR
Server UDP port number (used to send packets with a UDP header).
.TP
\fB\-k\fR, \fB\-\-client_port\fR
Client UDP port number (used to send packets with a UDP header).
.TP
\fB\-Z\fR, \fB\-\-server\fR
Choose the server side for the current machine (\-\-server/\-\-client must be selected).
.TP
\fB\-P\fR, \fB\-\-client\fR
Choose the client side for the current machine (\-\-server/\-\-client must be selected).
.RE
.SH ENVIRONMENT
.B Prerequisites:
.RS
kernel 2.6
.RE
.RS
(kernel module) matches libibverbs
.RE
.RS
(kernel module) matches librdmacm
.RE
.RS
(kernel module) matches libibumad
.RE
.RS
(kernel module) matches libmath (lm).
.RE
.SH NOTES
When running on an IB fabric, a Subnet Manager must be running on the switch or on one of the nodes in the fabric.
.SH BUGS
1. The multicast feature in ib_send_lat and in ib_send_bw still has many problems.
Improved support and bug fixes are planned, but for now the tests may get stuck
and could produce undefined behaviour.
.sp
2. Bidirectional feature in the ib_send_bw test, when running in UD or UC mode:
the algorithm used for the bidirectional measurement is designed for the RC connection type.
When running with UC or UD connection types, there is a small probability that the test will get stuck.
.sp
3. The RDMA_CM feature in read tests still doesn't work.
.sp
4. Dual-port support currently works only with ib_write_bw.
.sp
5. Compatibility issues may occur between different versions of perftest.
Please make sure you work with the same version on both sides to ensure
consistency of the test.
.SH AUTHORS
Please post results/observations to the openib-general mailing list.
See "Contact Us" at http://openib.org/mailman/listinfo/openib-general and
http://www.openib.org.