From ad7e94b35988c8cd03866d47aa6fb21841cfae7c Mon Sep 17 00:00:00 2001
From: Nikola Forró
Date: Tue, 28 Mar 2017 15:04:36 +0200
Subject: [PATCH 6/6] packet.7: add missing socket options

---
 man-pages/man7/packet.7 | 218 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 209 insertions(+), 9 deletions(-)

diff --git a/man-pages/man7/packet.7 b/man-pages/man7/packet.7
index f5d990b..b217e5e 100644
--- a/man-pages/man7/packet.7
+++ b/man-pages/man7/packet.7
@@ -177,19 +177,24 @@ and
 .I sll_ifindex
 are used.
 .SS Socket options
+Packet socket options are configured by calling
+.BR setsockopt (2)
+with level
+.BR SOL_PACKET .
+.TP
+.BR PACKET_ADD_MEMBERSHIP
+.PD 0
+.TP
+.BR PACKET_DROP_MEMBERSHIP
+.PD
 Packet sockets can be used to configure physical layer multicasting
 and promiscuous mode.
-It works by calling
-.BR setsockopt (2)
-on a packet socket for
-.B SOL_PACKET
-and one of the options
 .B PACKET_ADD_MEMBERSHIP
-to add a binding or
+adds a binding and
 .B PACKET_DROP_MEMBERSHIP
-to drop it.
+drops it.
 They both expect a
-.B packet_mreq
+.I packet_mreq
 structure as argument:
 
 .in +4n
@@ -222,11 +227,206 @@ and
 sets the socket up to receive all multicast packets arriving at
 the interface.
 
-In addition the traditional ioctls
+In addition, the traditional ioctls
 .BR SIOCSIFFLAGS ,
 .BR SIOCADDMULTI ,
 .B SIOCDELMULTI
 can be used for the same purpose.
+.TP
+.BR PACKET_AUXDATA " (since Linux 2.6.21)"
+.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
+If this binary option is enabled, the packet socket passes a metadata
+structure along with each packet in the
+.BR recvmsg (2)
+control field.
+The structure can be read with
+.BR cmsg (3).
+It is defined as
+
+.in +4n
+.nf
+struct tpacket_auxdata {
+    __u32 tp_status;
+    __u32 tp_len;      /* packet length */
+    __u32 tp_snaplen;  /* captured length */
+    __u16 tp_mac;
+    __u16 tp_net;
+    __u16 tp_vlan_tci;
+    __u16 tp_vlan_tpid; /* since Linux 3.14; formerly padding */
+};
+.fi
+.in
+.TP
+.BR PACKET_FANOUT " (since Linux 3.1)"
+.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
+To scale processing across threads, packet sockets can form a fanout
+group.
+In this mode, each matching packet is enqueued onto only one
+socket in the group.
+A socket joins a fanout group by calling
+.BR setsockopt (2)
+with level
+.B SOL_PACKET
+and option
+.BR PACKET_FANOUT .
+Each network namespace can have up to 65536 independent groups.
+A socket selects a group by encoding the ID in the low-order 16 bits of
+the integer option value.
+The first packet socket to join a group implicitly creates it.
+To successfully join an existing group, subsequent packet sockets
+must have the same protocol, device settings, fanout mode and
+flags (see below).
+Packet sockets can leave a fanout group only by closing the socket.
+The group is deleted when the last socket is closed.
+
+Fanout supports multiple algorithms to spread traffic between sockets.
+The default mode,
+.BR PACKET_FANOUT_HASH ,
+sends packets from the same flow to the same socket to maintain
+per-flow ordering.
+For each packet, it chooses a socket by taking the packet flow hash
+modulo the number of sockets in the group, where a flow hash is a hash
+over network-layer address and optional transport-layer port fields.
+The load-balance mode
+.BR PACKET_FANOUT_LB
+implements a round-robin algorithm.
+.BR PACKET_FANOUT_CPU
+selects the socket based on the CPU that the packet arrived on.
+.BR PACKET_FANOUT_ROLLOVER
+processes all data on a single socket, moving to the next socket when
+the current one becomes backlogged.
+.BR PACKET_FANOUT_RND
+selects the socket using a pseudo-random number generator.
+
+Fanout modes can take additional options.
+IP fragmentation causes packets from the same flow to have different
+flow hashes.
+The flag
+.BR PACKET_FANOUT_FLAG_DEFRAG ,
+if set, causes packets to be defragmented before fanout is applied, to
+preserve order even in this case.
+Fanout mode and options are communicated in the high-order 16 bits of the
+integer option value.
+The flag
+.BR PACKET_FANOUT_FLAG_ROLLOVER
+enables the rollover mechanism as a backup strategy: if the
+original fanout algorithm selects a backlogged socket, the packet
+rolls over to the next available one.
+.TP
+.BR PACKET_LOSS " (with " PACKET_TX_RING )
+If set, do not silently drop a packet on transmission error, but
+return it with status set to
+.BR TP_STATUS_WRONG_FORMAT .
+.TP
+.BR PACKET_RESERVE " (with " PACKET_RX_RING )
+By default, a packet receive ring writes packets immediately following the
+metadata structure and alignment padding.
+This integer option reserves additional headroom.
+.TP
+.BR PACKET_RX_RING
+Create a memory-mapped ring buffer for asynchronous packet reception.
+The packet socket reserves a contiguous region of application address
+space, lays it out into an array of packet slots and copies packets
+(up to
+.IR tp_snaplen )
+into subsequent slots.
+Each packet is preceded by a metadata structure similar to
+.IR tpacket_auxdata .
+The protocol fields encode the offset to the data
+from the start of the metadata header.
+.I tp_net
+stores the offset to the network layer.
+If the packet socket is of type
+.BR SOCK_DGRAM ,
+then
+.I tp_mac
+is the same.
+If it is of type
+.BR SOCK_RAW ,
+then that field stores the offset to the link-layer frame.
+The packet socket and the application communicate the head and tail of the ring
+through the
+.I tp_status
+field.
+The packet socket owns all slots with status
+.BR TP_STATUS_KERNEL .
+After filling a slot, it changes the status of the slot to transfer
+ownership to the application.
+During normal operation, the new status is
+.BR TP_STATUS_USER ,
+to signal that a correctly received packet has been stored.
+When the application has finished processing a packet, it transfers
+ownership of the slot back to the socket by setting the status to
+.BR TP_STATUS_KERNEL .
+Packet sockets implement multiple variants of the packet ring.
+The implementation details are described in
+.IR Documentation/networking/packet_mmap.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_STATISTICS
+Retrieve packet socket statistics in the form of a structure
+
+.in +4n
+.nf
+struct tpacket_stats {
+    unsigned int tp_packets;  /* Total packet count */
+    unsigned int tp_drops;    /* Dropped packet count */
+};
+.fi
+.in
+
+Retrieving the statistics resets the internal counters.
+The statistics structure differs when using a ring of variant
+.BR TPACKET_V3 .
+.TP
+.BR PACKET_TIMESTAMP " (with " PACKET_RX_RING "; since Linux 2.6.36)"
+.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
+The packet receive ring always stores a timestamp in the metadata header.
+By default, this is a software timestamp generated when the
+packet is copied into the ring.
+This integer option selects the type of timestamp.
+Besides the default, it supports the two hardware formats described in
+.IR Documentation/networking/timestamping.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_TX_RING " (since Linux 2.6.31)"
+.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
+Create a memory-mapped ring buffer for packet transmission.
+This option is similar to
+.BR PACKET_RX_RING
+and takes the same arguments.
+The application writes packets into slots with status
+.BR TP_STATUS_AVAILABLE
+and schedules them for transmission by changing the status to
+.BR TP_STATUS_SEND_REQUEST .
+When packets are ready to be transmitted, the application calls
+.BR send (2)
+or a variant thereof.
+The
+.I buf
+and
+.I len
+arguments of this call are ignored.
+If an address is passed using
+.BR sendto (2)
+or
+.BR sendmsg (2),
+then that address overrides the socket default.
+On successful transmission, the socket resets the slot to
+.BR TP_STATUS_AVAILABLE .
+It discards packets silently on error unless
+.BR PACKET_LOSS
+is set.
+.TP
+.BR PACKET_VERSION " (with " PACKET_RX_RING "; since Linux 2.6.27)"
+.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
+By default,
+.BR PACKET_RX_RING
+creates a packet receive ring of variant
+.BR TPACKET_V1 .
+To create another variant, set this integer option to the desired
+variant before creating the ring.
+
 .SS Ioctls
 .B SIOCGSTAMP
 can be used to receive the timestamp of the last received packet.
-- 
2.7.4
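Example for PACKET_AUXDATA. This is a minimal sketch in C. It assumes a Linux system with glibc and the kernel UAPI header <linux/if_packet.h>. The first helper enables the option on an already-open packet socket. The second copies struct tpacket_auxdata out of the control data that recvmsg(2) fills in. The helper names enable_auxdata() and get_auxdata() are hypothetical and are not part of any library. Opening the packet socket itself is not shown and typically requires CAP_NET_RAW.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: turn on PACKET_AUXDATA for packet socket fd.
 * Returns 0 on success, -1 on error (see errno). */
int enable_auxdata(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
}

/* Hypothetical helper: scan the control messages of a message returned
 * by recvmsg(2) and copy out the struct tpacket_auxdata, if present.
 * Returns 0 if found, -1 if the message carries no auxiliary data. */
int get_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_PACKET &&
            cmsg->cmsg_type == PACKET_AUXDATA &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy rather than cast: CMSG_DATA() need not be aligned
             * for the structure. */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

After recvmsg(2) returns, get_auxdata() can be called on the same struct msghdr. The tp_len and tp_snaplen fields then tell whether the packet was truncated to the buffer the application supplied.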
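Example for PACKET_FANOUT. This sketch makes the same Linux, glibc and <linux/if_packet.h> assumptions. It shows how a worker might open a packet socket and join a fanout group. The option value packs the group ID into the low-order 16 bits and the mode plus flags into the high-order 16 bits. fanout_arg() and join_fanout_group() are hypothetical names. The protocol ETH_P_ALL is chosen only for illustration.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* */

/* Hypothetical helper: build the PACKET_FANOUT option value with the
 * group ID in bits 0-15 and (mode | flags) in bits 16-31. */
uint32_t fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Hypothetical helper: open a raw packet socket and join fanout group
 * group_id. Every member must use the same protocol, mode and flags.
 * Returns the socket, or -1 on error (for example, without CAP_NET_RAW). */
int join_fanout_group(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* The kernel reads a 4-byte integer; uint32_t has the same size. */
    uint32_t arg = fanout_arg(group_id, mode, flags);
    if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each worker thread or process would call join_fanout_group() with identical arguments, for example join_fanout_group(42, PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_DEFRAG). It would then read only from its own descriptor.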
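Example for PACKET_RX_RING. This is a sketch of a TPACKET_V1 receive ring, assuming GCC or Clang for the __atomic builtins used to order accesses to tp_status. It uses three functions:

- ring_req_init() fills struct tpacket_req.
- ring_setup() creates the ring and maps it with mmap(2).
- ring_consume() hands each slot marked TP_STATUS_USER to a callback and then returns it as TP_STATUS_KERNEL.

All three function names are hypothetical. Requiring the frame size to divide the block size is an extra simplification. It lets slot i be found at offset i * frame_size.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: validate the geometry (block size a multiple of
 * the page size, frame size a multiple of TPACKET_ALIGNMENT and larger
 * than TPACKET_HDRLEN) and fill *req. Returns 0 or -1. */
int ring_req_init(struct tpacket_req *req, unsigned block_size,
                  unsigned block_nr, unsigned frame_size, unsigned page_size)
{
    if (page_size == 0 || block_size == 0 || block_nr == 0 ||
        block_size % page_size != 0)
        return -1;
    if (frame_size % TPACKET_ALIGNMENT != 0 || frame_size <= TPACKET_HDRLEN)
        return -1;
    if (block_size % frame_size != 0)   /* simplification, see above */
        return -1;
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    return 0;
}

/* Hypothetical helper: create the ring on fd and map it. Returns the
 * start of the ring, or NULL on error. */
uint8_t *ring_setup(int fd, const struct tpacket_req *req)
{
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return NULL;
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Hypothetical helper: starting at slot *cursor, pass every slot owned
 * by the application to handle (if non-NULL), then give it back to the
 * kernel. Returns the number of slots consumed. */
unsigned ring_consume(uint8_t *ring, unsigned frame_nr, unsigned frame_size,
                      unsigned *cursor,
                      void (*handle)(const uint8_t *frame, unsigned len))
{
    unsigned done = 0;

    for (;;) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)*cursor * frame_size);
        /* Acquire: read the packet only after observing TP_STATUS_USER. */
        if (!(__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) &
              TP_STATUS_USER))
            break;
        if (handle != NULL)
            handle((const uint8_t *)hdr + hdr->tp_mac, hdr->tp_snaplen);
        /* Release: return ownership of the slot to the kernel. */
        __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
        *cursor = (*cursor + 1) % frame_nr;
        done++;
    }
    return done;
}
```

In a real receiver, ring_consume() would typically be called after poll(2) reports the socket readable. The loop stops at the first slot still owned by the kernel, so it never overtakes the producer.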
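Example for PACKET_STATISTICS. This is a minimal sketch of reading the counters with getsockopt(2). It covers a socket without a ring or with a TPACKET_V1 or TPACKET_V2 ring. read_packet_stats() is a hypothetical helper. With a TPACKET_V3 ring, the kernel instead fills the larger struct tpacket_stats_v3, which this sketch does not handle.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: fetch (and thereby reset) the packet and drop
 * counters of packet socket fd. Returns 0 on success, -1 on error. */
int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);
    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}
```

Because each call resets the counters, a monitoring loop would add up tp_packets and tp_drops across calls rather than read them as running totals.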
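Example for PACKET_TX_RING. This sketches the producer side under the following assumptions:

- The ring is a TPACKET_V1 ring created with PACKET_TX_RING, using the same struct tpacket_req arguments as PACKET_RX_RING, and mapped with mmap(2).
- The compiler is GCC or Clang, for the __atomic builtins.
- The data offset TPACKET_HDRLEN - sizeof(struct sockaddr_ll) is the kernel's default TPACKET_V1 transmit offset.

tx_ring_queue() and tx_ring_flush() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: if slot *cursor is TP_STATUS_AVAILABLE, copy the
 * frame into it and mark it TP_STATUS_SEND_REQUEST. Returns 0 if queued,
 * -1 if the slot is busy or the frame does not fit. */
int tx_ring_queue(uint8_t *ring, unsigned frame_nr, unsigned frame_size,
                  unsigned *cursor, const void *pkt, unsigned len)
{
    struct tpacket_hdr *hdr =
        (struct tpacket_hdr *)(ring + (size_t)*cursor * frame_size);
    /* Assumed default data offset for TPACKET_V1 transmission. */
    size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) !=
        TP_STATUS_AVAILABLE)
        return -1;
    if (off + len > frame_size)
        return -1;
    memcpy((uint8_t *)hdr + off, pkt, len);
    hdr->tp_len = len;
    /* Release: publish the frame before handing the slot to the kernel. */
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST,
                     __ATOMIC_RELEASE);
    *cursor = (*cursor + 1) % frame_nr;
    return 0;
}

/* Hypothetical helper: ask the kernel to transmit all slots marked
 * TP_STATUS_SEND_REQUEST; buf and len are ignored for a TX ring. */
int tx_ring_flush(int fd)
{
    return send(fd, NULL, 0, 0) < 0 ? -1 : 0;
}
```

Queuing several frames before a single tx_ring_flush() call amortizes the system call over a batch. That is the main reason to use a transmit ring instead of one send(2) per packet.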