USB 3.0 transfers, bursts and short packets

Published: 19 July 2019

Introduction

This page discusses USB SuperSpeed Transfers, Bursts and Short Packets, in an attempt to shed light on the relationships between these, and higher level terms, such as the xHCI Transfer Descriptor, Microsoft Windows’ IRP and Linux’ URB.

This is not an attempt to cover all aspects, so the respective standards should be referred to for the exact details (in particular the USB 3.0 and xHCI specifications).

SuperSpeed Streams are discussed on another page.

Transfers

The USB data flow is not based upon a continuous stream (like general file I/O and TCP/IP), not even regarding bulk endpoints: Chunks of data are transmitted in either direction in what is called “transfers”. Those who have written software for a USB device (a kernel driver or e.g. a libusb based user-space program) are familiar with this term: It's some kind of work item, that is executed by data being sent to or from the USB device.

Curiously, the USB 3.0 spec doesn't define this term accurately. Rather, it refers to the USB 2.0 spec (see citations at the bottom of this page), indicating that the same semantics are retained. The USB 2.0 spec relies on a Microsoft Windows term, IRP, in relation to transfers. The IRP is a Windows kernel data structure that is created for system calls requesting I/O operations. Among other things, it contains the address of the data buffer and the requested data length. In Linux, there is a parallel data structure for USB requests, called a URB.

The USB 2.0 spec makes a 1:1 connection between a transfer and an IRP in the operating system, in the context of defining the response to short packets arriving from the device. As just mentioned, the USB 3.0 spec merely refers to this relationship.

A more appealing definition for a transfer relates to the “Transfer Descriptor” (TD), defined in the xHCI specification as the data structure used by the low-level software to request USB operations from the USB controller. These TDs are queued by the software (the xHCI driver) for each endpoint, each TD containing, among other things, the number of bytes required for transmission (“TD Transfer Size”). The fulfillment of such a TD is hence the execution of a USB transfer.

The length of a transfer

For an OUT (host-to-device) transfer, the number of bytes actually transferred is controlled by the host, and is hence always equal to the number required (unless some kind of error occurs). However for an IN (device-to-host) transfer, the exact amount of data is controlled by the device: It may signal the end of transfer by issuing a short packet (i.e. with less than 1024 bytes for USB 3.0). This is not considered an error by either the USB 3.0 spec or the xHCI spec (see for example xHCI 1.2 spec section 4.10.1.1.1), but just an early termination of the transfer with less data than expected.
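The way a short packet ends an IN transfer can be sketched as a small host-side model. This is only an illustration and not driver code: the function name in_transfer_length and the representation of the device's data packets as a list of payload lengths are assumptions made for this example. It also assumes that the requested length is a multiple of 1024, so no overrun is modeled.

```python
MAX_PACKET_SIZE = 1024  # SuperSpeed bulk maximum packet size, in bytes


def in_transfer_length(requested_bytes, packet_lengths):
    """Return the number of bytes the host counts for one IN transfer.

    packet_lengths is a hypothetical list of data packet payload lengths,
    in the order the device sends them.
    """
    received = 0
    for length in packet_lengths:
        received += length
        if length < MAX_PACKET_SIZE:
            # A short packet ends the transfer early. This is not an error.
            break
        if received >= requested_bytes:
            # The requested amount has arrived in full packets.
            break
    return received
```

For example, a request for 4096 bytes that is answered with two full packets and a 100-byte packet completes with 2148 bytes, and the remaining buffer space is simply left unused.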

If the number of bytes expected in an IN transfer isn’t a multiple of 1024, the last packet must be less than full (i.e. a short packet is inevitable). The host’s USB 3.0 controller has no way to inform the device how many bytes it expects; it can only limit the number of packets the device is allowed to send at any given moment. Therefore, the device may very well fill the last packet beyond what was expected by the host, causing a “Babble Error” per xHCI spec 4.10.2.4. It’s therefore generally wise to set the length in a TD to a multiple of 1024, even if less data is actually required, unless the device is guaranteed to produce exactly the correct amount of data.
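Rounding the requested length up to a whole number of packets is simple arithmetic. A minimal sketch follows; the function name padded_td_length is hypothetical, and how the resulting length reaches the TD depends on the driver or library in use.

```python
MAX_PACKET_SIZE = 1024  # SuperSpeed bulk maximum packet size, in bytes


def padded_td_length(needed_bytes, max_packet_size=MAX_PACKET_SIZE):
    """Round needed_bytes up to the next multiple of max_packet_size."""
    if needed_bytes <= 0:
        raise ValueError("needed_bytes must be positive")
    packets = -(-needed_bytes // max_packet_size)  # ceiling division
    return packets * max_packet_size
```

With this, a request for 3000 bytes becomes a 3072-byte buffer, so a device that fills its last packet completely still fits within the TD.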

SuperSpeed bursts

The major difference between USB 3.0 and earlier USB revisions is that there are separate physical wires for data in each direction, so packets can be transmitted in both directions simultaneously. In particular, acknowledgment packets can be transmitted at the same time as data continues to be transmitted in the opposite direction.

In order to utilize the physical bandwidth efficiently, bursts of packets are allowed in USB 3.0. This allows more than one data packet for an endpoint to be transmitted before it’s acknowledged. The mechanism is implemented with the NumP field in ERDY and ACK packets, and with the EOB flag in Data Header Packets.

The essence of the NumP field is a permission to transmit data packets without receiving an ACK for the previous ones. It indicates the number of packets that the receiver of data packets is ready to immediately accept at the time that the ERDY or ACK packet that carries it is dispatched. NumP can be zero to temporarily stop the packet flow.

In more detail:

The ACK packet’s NumP field is the number of data packets, for the relevant endpoint, that the receiver is ready to receive after the data packet that the said ACK packet acknowledges.
ERDY is sent only by devices to indicate a readiness to receive data packets, as part of the flow control (see below). The meaning of its NumP field is the same as for an ACK packet, but ERDY doesn’t acknowledge anything.

Note that the NumP fields mentioned here are carried in ACK / ERDY packets sent by the side receiving the data packets. The data packet sender’s say on bursts is embodied in the EOB flag in data packet headers. When this flag is set, the sender indicates it has no more data packets to send at that moment, and also humbly asks the host not to request more data from that endpoint until further notice (i.e. with an ERDY packet, see below).

Also note that when interpreting an ACK packet, its receiver must take the data packets it has sent since that packet into account, as the NumP field in the ACK packet relates to the data packet it acknowledges, and not the situation at the transmitting side.
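This bookkeeping can be written down as a one-line calculation. The sketch below is a simplified model rather than anything taken from the spec: the function name and the idea of counting "packets sent after the acknowledged one" are assumptions for the sake of illustration.

```python
def packets_still_allowed(nump, packets_sent_after_acked):
    """Packets the sender may still transmit when an ACK arrives.

    nump is the NumP value carried by the ACK. It counts from the data
    packet that this ACK acknowledges, so any packets the sender has
    already transmitted after that one must be subtracted.
    """
    return max(nump - packets_sent_after_acked, 0)
```

So if an ACK with NumP = 4 arrives after the sender has already transmitted three more packets, only one additional packet may be sent before the next ACK.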

It’s crucial to note that there’s no necessary connection between bursts and transfers. The termination of a burst (i.e. a flow control condition) doesn’t indicate an end of transfer, and an end of transfer doesn’t necessarily end a burst. The only point where they are related is when the device sends a short packet, after which it’s not allowed to continue sending data packets until receiving an ACK packet for that last short packet with a non-zero NumP. The rationale is that the device can’t know if there’s an outstanding TD for that endpoint after the transfer it has just terminated, hence it needs to get permission from the host’s USB controller to continue.

But even though a short packet is de facto the last packet in a burst, it doesn’t put the endpoint in a flow control condition, unless that last packet had the EOB flag set. In particular, a burst of an OUT endpoint may continue immediately after a short packet if the host has another TD queued.

How long can a burst last?

... or: Can a bulk endpoint hijack the bus forever?

Even though NumP can’t possibly exceed 16 (and is often limited even further), a burst may be significantly longer. In fact, there might be an everlasting flow of data packets, if certain conditions are met.

This might appear to contradict the definition of bMaxBurst (in the SuperSpeed Endpoint Companion Descriptor), given in section 9.6.7, Table 9-22 of the USB 3.0 specification: 'The maximum number of packets the endpoint can send or receive as part of a burst', where this quantity is set to bMaxBurst plus one.

But the same spec also says in section 4.4.1: "Each endpoint on a SuperSpeed device indicates the number of packets that it can send/receive (called the maximum data burst size) before it has to wait for an explicit handshake". Section 8.10.2 also adds "The number of packets an endpoint on a device can send or receive at a time without an intermediate acknowledgement packet is reported by the device in the endpoint companion descriptor (refer to Section 9.6.7) for that endpoint."

So does bMaxBurst restrict the total length of the burst, or just the number of unacknowledged packets? The xHCI specification adds to this confusion in section 4.14.4.1 of its revision 1.2 ("Enhanced SuperSpeed Burst Transactions"), saying:

"The USB3 Specification, section 8.10.2 defines bMaxBurst as 'The number of packets an endpoint on a device can send or receive at a time without an intermediate acknowledgement packet'.

For an Enhanced SuperSpeed bulk endpoint, the xHC shall use Max Burst Size (which is set to bMaxBurst, refer to section 6.2.3.4) to determine the maximum number of outstanding acknowledgement packets that are allowed for an endpoint. It may also use Max Burst Size to identify the number of packets the endpoint should send or receive in a Service Opportunity. If more than one async endpoint has data to move, the xHC should advance to the next endpoint when Max Burst Size packets have been moved for an endpoint. However if there is only one endpoint with data to move in the async Pipe Schedule, then the xHC may exceed Max Burst Size packets to an endpoint and stream packets to/from the endpoint until either the Transfer Ring is exhausted or the device terminates the burst by asserting NumP = 0 (OUT pipe ACK TP) or EOB = ‘1’ (IN pipe DP), or flow controls the pipe by returning an NRDY TP."

A few clarifications on this clause:

The short summary: If there's no other candidate for the bus bandwidth, it's OK, but not required, to go ahead with the same burst until the data is exhausted.
If there is another endpoint qualifying for bus bandwidth, the recommended practice is to limit the burst to the Max Burst Size. Note that the word "should" is used, and not "shall". "Should" equals "is recommended that" in this spec, according to its own section 1.8.6.
"Enhanced SuperSpeed" is a collective term for USB 3.x. Even though the term doesn't appear in the USB 3.0 spec, this term is coined in later specs, and includes USB 3.0.
A "Service Opportunity" is a block of time that the xHC allocates for moving packets on USB, for a specific endpoint.
The bold emphasis above is not present in the original document.
The citation of section 8.10.2 is not a verbatim quote, and probably refers to a draft of the USB specification. But it's correct in its spirit.

So far, it seems like a burst can go on forever if both software and the endpoint keep feeding and consuming data, even if there are other bulk endpoints waiting for their turn. And it also seems that given the recommendations cited above, odds are that the hardware USB controller will limit the burst length, at least when there are other endpoints waiting. But this can't be relied upon.

However there's another limitation, which is defined in the xHCI spec's section 4.14.4:

"If there is more than one endpoint in the async schedule the xHC shall limit the number of packets transferred during a Service Opportunity (SO) to MSOPC. However, if only one endpoint is in the async schedule, the xHC may exceed the default MSOPC and continuously stream packets to an endpoint."

MSOPC is defined in section 4.14.1:

"The Max Service Opportunity Packet Count (MSOPC) is the maximum number of DPs that the xHC shall schedule during one Service Opportunity (SO). The MSOPC value for an endpoint is set by the number of packets defined by the Endpoint Context fields; (Max Burst Size + 1) * (Mult + 1)."

"Mult" brings us back to USB 3.0 spec's table 9-22, where this parameter is defined along with bMaxBurst, which also states that the "maximum number of packets within a service interval that this endpoint supports" is:

Maximum number of packets = (bMaxBurst + 1) x (Mult + 1)

But that doesn't apply to bulk endpoints, for which Mult isn't defined at all. So it's more likely that the MSOPC for a bulk endpoint is based upon the Mult given in the Endpoint Context, as defined in the xHCI spec, section 6.2.3. Among other things, it says that Mult = 0 for any endpoint except isochronous (Table 6-8).
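As a worked example of the MSOPC formula quoted above, here is a minimal sketch. The function name is made up for this page; the two parameters stand for the Max Burst Size and Mult fields of the Endpoint Context.

```python
def max_service_opportunity_packet_count(max_burst_size, mult=0):
    """MSOPC = (Max Burst Size + 1) * (Mult + 1).

    mult defaults to 0, which is the value for any endpoint that
    isn't isochronous, bulk endpoints included.
    """
    return (max_burst_size + 1) * (mult + 1)
```

For a bulk endpoint with bMaxBurst = 15, this gives 16 packets per Service Opportunity, i.e. the same figure as the maximum burst size itself.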

With this latter limitation, the conclusion is that the Maximum Data Burst Size limits the actual number of packets in a burst after all, and not the number of unacknowledged packets. But if this is the case, what did the xHCI specification mean with its section 4.14.4.1?

Bottom line: Based on the citations above and other indications in the xHCI spec, the intended meaning is that the number of packets in a burst may exceed bMaxBurst + 1 only if there are no other bulk (or status) endpoints requesting bandwidth. So a proper xHCI controller may issue long bursts, but not allow an endpoint to hijack the bus. Section 4.14.4.1 seems to be slightly inaccurate with respect to the original intention.
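This reading of the spec can be illustrated with a toy scheduler. It is a sketch of the interpretation given here, not of any actual xHCI controller: the round-robin policy, the function name and the dictionaries describing the endpoints are all assumptions made for this example.

```python
from collections import deque


def schedule_service_opportunities(pending, msopc):
    """Return a list of (endpoint, packets_moved) Service Opportunities.

    pending maps a hypothetical endpoint name to its number of queued
    packets, and msopc maps the same names to their MSOPC.
    """
    queue = deque(name for name, count in pending.items() if count > 0)
    remaining = dict(pending)
    schedule = []
    while queue:
        name = queue.popleft()
        if queue:
            # Other endpoints are waiting: stop at MSOPC packets.
            moved = min(remaining[name], msopc[name])
        else:
            # Only one endpoint has data: it may keep streaming.
            moved = remaining[name]
        remaining[name] -= moved
        schedule.append((name, moved))
        if remaining[name] > 0:
            queue.append(name)
    return schedule
```

With 40 packets queued for endpoint A, 10 for endpoint B and an MSOPC of 16 for both, the result is 16 packets for A, 10 for B and then the remaining 24 for A in one go, since nobody else is waiting at that point.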

However, as there seems to be some confusion in the xHCI spec itself regarding the meaning of the Maximum Data Burst Size, that confusion might influence implementations of xHCI controllers. It's therefore wise to assume the less restrictive definition when implementing USB hardware.

Bulk endpoint’s flow control

The USB 3.0 spec, like all the others before it, lets the host control all data flow. This goes along with the TD queuing mechanism defined in the xHCI spec, so there’s data exchange with an endpoint only if there are TDs queued. In other words, USB traffic takes place only when there are outstanding requests by the software for it (and specifically allocated data buffers for its data).

Unlike USB 2.0, there is no polling of endpoints, but instead the host sends data at will to the device (OUT endpoint), or requests data by sending an ACK packet with a nonzero value in the NumP field (IN endpoint). Such an ACK packet can initiate a burst or be an intermediate response in an ongoing one. The device may refuse to communicate by responding with an NRDY packet (in both directions).

To avoid wasting bandwidth on refused attempts by the host, a flow control mechanism requires the device to inform the host of its readiness to send or receive data (see section 8.10.1 in the spec): Both IN and OUT endpoints enter the flow control condition by sending an NRDY packet, and return to active state (i.e. exit the flow control condition) by sending an ERDY packet.

Endpoints may also enter the flow control condition efficiently by terminating an ongoing burst gracefully: An IN endpoint may set the EOB bit in a Data Packet Header on the last packet it transmits, thereby signaling it’s the last packet it has. An OUT endpoint responds with an ACK with the NumP field set to zero. In both cases, the flow control condition is invoked without any extra handshake traffic. An ERDY packet from the device is required to get into active state again.

Note that while the meaning of the NumP field in an ERDY packet on an OUT endpoint is the same as for an ACK packet (the number of packets the device is ready to receive for that endpoint), the significance of NumP in an ERDY packet for an IN endpoint is somewhat less clear. According to Table 8-14 in the USB 3.0 spec, it should contain the number of data packets that the device has to send. It further says it’s possibly for informative purposes only, and yet it’s somewhat unclear why this is required; after all, the host’s USB controller will know when the data is exhausted by virtue of the EOB field. Possibly it helps with bandwidth efficiency or even preventing the deadlock situation discussed below, by preferring endpoints with a larger NumP.

Also note that the host has the prerogative of forgetting the flow control state, and attempting communication at any time. The device simply responds with an NRDY in that case. Inefficient, but legit. On the other hand, the device is not allowed to send ERDY packets if it isn’t in a flow control condition (USB 3.0 spec section 8.10.1). So the host is allowed to forget its flow control condition and try its way by sending packets that may turn out to be a waste of bandwidth, but the device doesn’t have this freedom.
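The device's side of these rules boils down to a two-state machine, sketched below. The state names and the event strings ("NRDY", "DP_EOB", "ACK_NUMP_0", "ERDY") are labels invented for this illustration and don't appear in the spec in this form. Anything the host does is deliberately left out of the model.

```python
ACTIVE = "active"
FLOW_CONTROL = "flow control"

# Packets sent by the device that put the endpoint in flow control:
# NRDY (either direction), a data packet with EOB set (IN endpoint),
# or an ACK with NumP = 0 (OUT endpoint).
ENTER_FLOW_CONTROL = ("NRDY", "DP_EOB", "ACK_NUMP_0")


def next_state(state, device_event):
    """Return the endpoint's state after the device sends a packet."""
    if device_event in ENTER_FLOW_CONTROL:
        return FLOW_CONTROL
    if device_event == "ERDY":
        if state != FLOW_CONTROL:
            # The device may send ERDY only in a flow control condition.
            raise ValueError("ERDY sent while not in flow control")
        return ACTIVE
    return state  # Any other packet leaves the state unchanged.
```

Note that an NRDY sent while already in the flow control condition is accepted by this model: that is what happens when the host retries on its own initiative.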

Looking at the overall picture, the device sort-of controls when data is transmitted by virtue of flow control, but the host has the final say: No data is transmitted anyhow if there’s no queued TD, and the host is allowed to ignore the flow control state (even though this is probably uncommon).

Short packets

A short packet marks the end of transfer, but not necessarily a flow control condition: Per section 8.10.3, a short packet is always the last in a burst, and indicates the end of transfer. However by itself, it doesn’t request a flow control condition, so the host may issue another ACK (i.e. DATA IN request) packet.

After sending a short packet, the device must stop sending packets for that endpoint until it has received an ACK for that packet. A nonzero NumP of that ACK packet indicates that the host is ready to receive more packets (i.e. that there is another TD queued for the same endpoint). If the NumP is zero, there is no more TD queued, in which case no packets should be sent.

An EOB requests a flow control condition. Per section 8.10.1, to issue a flow control condition, a DP with EOB set to one or an NRDY packet is sent by the device. The host is not likely to issue an ACK packet for that endpoint until an ERDY packet arrives from the device, but it may.

Interleaving between bulk endpoints transactions

Per section 8.1 in the USB 3.0 spec: The host may interleave packets of BULK OUT transactions, but not BULK IN. The said section clearly states that an IN transaction can’t be initiated until the previous one has been terminated: “… host shall not initiate another IN bus transaction to any endpoint until the host receives all DPs or an NRDY or a STALL TP or the transaction times out for the current ACK TP sent”.

This opens up a deadlock situation, if a BULK IN endpoint responds to a request from the host with less data than required to complete a transfer, but terminates the burst with an EOB rather than by virtue of a Short Packet. This prevents the host from using the data channel for starting BULK IN transfers from other endpoints, and the current endpoint is in a flow control condition. This deadlock is easily avoided by proper design of the device / endpoint relations, in particular if the endpoint always sends a Short Packet when it runs out of data (this may even be a zero length packet if the last data packet was 1024 bytes long).
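The "always end with a Short Packet" policy can be sketched as follows. This is a simplified model of what the device's logic might do, with a made-up function name; it only calculates the payload lengths of the data packets for a given amount of data.

```python
MAX_PACKET_SIZE = 1024  # SuperSpeed bulk maximum packet size, in bytes


def packet_lengths_for(data_length, max_packet_size=MAX_PACKET_SIZE):
    """Split data_length bytes into packet payload lengths.

    The last element is always shorter than max_packet_size, so the
    sequence always ends with a Short Packet. If the data fills the
    full packets exactly, that last packet has zero length.
    """
    full_packets, rest = divmod(data_length, max_packet_size)
    return [max_packet_size] * full_packets + [rest]
```

For 2500 bytes, the packets are 1024, 1024 and 452 bytes long. For 2048 bytes, a zero length packet follows the two full ones.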

Alternatively, the USB controller could make use of its prerogative to attempt communication even if the device has signaled a flow control condition. Accordingly, it may attempt to send another ACK packet to the device, for the sake of provoking it to respond with NRDY, which allows it to go on with other endpoints per the citation above.

This knot is untied in later USB spec revisions, but only for devices of these later USB revisions among themselves: Simultaneous IN Transactions are allowed in the USB 3.2 spec, but not to a SuperSpeed (USB 3.0) bus instance. In USB 3.2’s section 8.1.1, “receives a DP with EOB flag set” is listed as a condition, not appearing in USB 3.0, for starting a BULK IN transaction to another endpoint. In other words, USB 3.0 requires the end or absence of a BULK IN transaction for moving on to another, but on USB 3.2 it’s allowed between bursts.

It appears as if someone forgot to mention the EOB flag in the list of excuses to switch to another endpoint in the USB 3.0 spec.

As for the BULK OUT endpoints, the xHCI specification defines the scheduling of data transmissions in terms of Service Opportunities (see sections 4.14.1 and 4.14.4), hence allocating each endpoint a separate, dedicated time slot. Interleaving is therefore not taken advantage of, except that when switching from one endpoint to another, the first packets of the new burst may start before the acknowledgments for the last packet(s) of the previous burst have arrived.

References in the USB 3.0 / 2.0 specs on Short Packets and transfers

These are the places in the two specifications from which the definition of a transfer can be derived, through its relation with Short Packets.

USB 3.0, section 8.10.3 (“Short Packets”) says that SuperSpeed retains the semantics of short packet behavior that USB 2.0 supports, but it doesn’t say exactly what a “transfer” means, possibly avoiding Microsoft Windows terms (IRPs in particular):

SuperSpeed retains the semantics of short packet behavior that USB 2.0 supports. When the host or a device receives a DP with the Data Length field shorter than the maximum packet size for that endpoint it shall deem that that transfer is complete.

USB 2.0, section 5.3.2 (“Pipes”) clarifies this:

If there are no IRPs pending or in progress for a pipe, the pipe is idle and the Host Controller will take no action with regard to the pipe; i.e., the endpoint for such a pipe will not see any bus transactions directed to it. The only time bus activity is present for a pipe is when IRPs are pending for that pipe.

and further down:

An IRP may require multiple data payloads to move the client data over the bus. The data payloads for such a multiple data payload IRP are expected to be of the maximum packet size until the last data payload that contains the remainder of the overall IRP. See the description of each transfer type for more details. For such an IRP, short packets (i.e., less than maximum-sized data payloads) on input that do not completely fill an IRP data buffer can have one of two possible meanings, depending upon the expectations of a client:

A client can expect a variable-sized amount of data in an IRP. In this case, a short packet that does not fill an IRP data buffer can be used simply as an in-band delimiter to indicate “end of unit of data.” The IRP should be retired without error and the Host Controller should advance to the next IRP.
A client can expect a specific-sized amount of data. In this case, a short packet that does not fill an IRP data buffer is an indication of an error. The IRP should be retired, the pipe should be stalled, and any pending IRPs associated with the pipe should also be retired.

As suggested above, the term “IRP” would probably have been best replaced with an xHCI Transfer Descriptor (“TD”).