7–RoCE Configuration
Configuring DCQCN
AH0054602-00 J
DSCP-PFC is a feature that allows a receiver to interpret the priority of an
incoming packet for PFC purposes according to the DSCP field in the IPv4
header, rather than according to the vLAN priority. You may use an indirection
table to map a specified DSCP value to a vLAN priority value.
DSCP-PFC can work across L2 networks because it is an L3 (IPv4) feature.
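As a rough illustration of the indirection table mentioned above, the following Python sketch extracts the DSCP value from the IPv4 ToS byte (DSCP occupies the upper six bits; the lower two carry ECN) and looks it up in a table. The table contents, the fallback priority, and the function names are hypothetical and are not taken from any adapter configuration.

```python
# Hypothetical DSCP-to-priority indirection table (illustrative values only).
DSCP_TO_PRIORITY = {26: 5, 48: 6}
DEFAULT_PRIORITY = 0  # assumed fallback for DSCP values not in the table


def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP value from the 8-bit IPv4 ToS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must fit in 8 bits")
    return tos_byte >> 2  # drop the two ECN bits


def pfc_priority(tos_byte: int) -> int:
    """Map an incoming packet's ToS byte to a priority for PFC purposes."""
    return DSCP_TO_PRIORITY.get(dscp_from_tos(tos_byte), DEFAULT_PRIORITY)
```

Because only the upper six bits are used, two packets that differ only in their ECN bits resolve to the same priority.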
Traffic classes, also known as priority groups, are groups of vLAN priorities
(or DSCP values if DSCP-PFC is used) that can have properties such as
being lossy or lossless. Generally, 0 is used for the default common lossy
traffic group, 3 is used for the FCoE traffic group, and 4 is used for the
iSCSI-TLV traffic group. You may encounter DCB mismatch issues if you
attempt to reuse these numbers on networks that also support FCoE or
iSCSI-TLV traffic. Cavium recommends that you use numbers 1–2 or 5–7 for
RoCE-related traffic groups.
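The numbering guidance above can be expressed as a small check. This is a minimal sketch: the dictionary simply restates the conventional assignments listed in the text, and the function name is hypothetical.

```python
from typing import Optional

# Conventional assignments as described in the text above.
CONVENTIONAL_USE = {0: "default lossy", 3: "FCoE", 4: "iSCSI-TLV"}

# Numbers 0-7 that remain free for RoCE-related traffic groups.
RECOMMENDED_FOR_ROCE = sorted(set(range(8)) - set(CONVENTIONAL_USE))


def conflicting_use(number: int) -> Optional[str]:
    """Return the conventional owner of a number, or None if it is free."""
    if not 0 <= number <= 7:
        raise ValueError("number must be in the range 0-7")
    return CONVENTIONAL_USE.get(number)
```

For example, `conflicting_use(3)` reports the FCoE convention, while `conflicting_use(5)` returns `None`.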
ETS (enhanced transmission selection) is an allocation of maximum bandwidth
per traffic class.
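To make the ETS allocation concrete, the sketch below converts per-class shares into bandwidth figures for a given link speed. It assumes that shares are expressed as whole percentages that sum to 100, which this section does not state; the function name and the example values are hypothetical.

```python
def ets_bandwidth(link_gbps: float, shares_pct: dict) -> dict:
    """Convert per-traffic-class percentage shares into Gb/s allocations."""
    # Assumption: the shares cover the whole link, so they must total 100.
    if sum(shares_pct.values()) != 100:
        raise ValueError("ETS shares are assumed to sum to 100 percent")
    return {tc: link_gbps * pct / 100 for tc, pct in shares_pct.items()}
```

With a 25 Gb/s link and an even split between traffic classes 0 and 5, each class is allocated 12.5 Gb/s.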
DCQCN Overview
Some networking protocols (RoCE, for example) require droplessness. PFC is a
mechanism for achieving droplessness in an L2 network, and DSCP-PFC is a
mechanism for achieving it across distinct L2 networks. However, PFC is deficient
in the following regards:
When activated, PFC completely halts the traffic of the specified priority on
the port, as opposed to reducing transmission rate.
All traffic of the specified priority is affected, even if there is a subset of
specific connections that are causing the congestion.
PFC is a single-hop mechanism. That is, if a receiver experiences
congestion and indicates the congestion through a PFC packet, only the
nearest neighbor will react. When the neighbor experiences congestion
(likely because it can no longer transmit), it also generates its own PFC. This
generation is known as pause propagation. Pause propagation may cause
inferior route utilization, because all buffers must congest before the
transmitter is made aware of the problem.
DCQCN addresses all of these disadvantages. The ECN delivers congestion
indication to the notification point (the receiver). The notification point
sends a CNP to the transmitter (the reaction point), which reacts by reducing
its transmission rate and avoiding the
congestion. DCQCN also specifies how the transmitter attempts to increase its
transmission rate and use bandwidth effectively after congestion ceases. DCQCN
is described in the 2015 SIGCOMM paper, Congestion Control for Large-Scale
RDMA Deployments, located here:
http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p523.pdf
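The transmitter's behavior can be sketched in Python as follows. This is a simplified model based on the update rules in the paper cited above, not a description of any adapter's firmware: it covers only the rate cut on receiving a CNP, the decay of the congestion estimate alpha, and the fast-recovery averaging step. The additive and hyper increase phases, timers, and byte counters are omitted. The class and method names are hypothetical, and g = 1/256 is the value suggested in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DcqcnSender:
    """Simplified per-flow DCQCN transmitter state (illustrative only)."""
    line_rate_gbps: float
    g: float = 1 / 256          # gain for the alpha estimate (paper's value)
    alpha: float = 1.0          # congestion estimate, starts at 1
    current_rate: float = field(init=False)
    target_rate: float = field(init=False)

    def __post_init__(self) -> None:
        # A flow starts at line rate.
        self.current_rate = self.line_rate_gbps
        self.target_rate = self.line_rate_gbps

    def on_cnp(self) -> None:
        """Cut the rate when a CNP arrives and raise the congestion estimate."""
        self.target_rate = self.current_rate
        self.current_rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """Decay alpha when no CNP arrived during the update interval."""
        self.alpha = (1 - self.g) * self.alpha

    def fast_recovery_step(self) -> None:
        """Move the current rate halfway back toward the target rate."""
        self.current_rate = (self.target_rate + self.current_rate) / 2
```

Unlike a PFC pause, which stops the affected priority outright, the first CNP in this model halves the rate of the single flow, and each recovery step closes half of the remaining gap to the rate that was in use before the cut.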