What are Streams?
At the gory detail level, each SuperSpeed data packet belonging to a bulk endpoint has a 16-bit Stream ID field in its Data Packet Header. For regular, non-Stream endpoints, this field is always zero. When non-zero, this field marks the packet as belonging to the stream with that Stream ID.
However, Streams are NOT an expansion of an endpoint into 65536 sub-endpoints, each with a life of its own. It’s not about adding 16 bits to the endpoint number.
To understand the benefit of Streams, consider a traditional, non-Stream, BULK OUT endpoint: First, the host queues chunks of data for transmission towards the device (i.e. it queues Transfer Descriptors on the xHCI interface). The xHCI controller then transmits these chunks of data towards the device in the order they were queued.
With Streams, the host’s software assigns a Stream ID to each chunk of data it queues for transmission. At any given time, the host’s USB controller and the device agree on which Stream ID to service, and only data chunks for this ID are transmitted. This allows the device to select the Stream ID, so it can essentially pull data from the host in the needed order.
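The difference between the two queuing models can be sketched in a few lines of Python. This is only a toy model of the ordering behavior, not of the xHCI data structures: the class and method names are made up for illustration, and a "chunk" stands in for a Transfer Descriptor.

```python
from collections import deque


class PlainBulkOut:
    """Non-Stream endpoint: chunks leave strictly in queuing order."""

    def __init__(self):
        self._fifo = deque()

    def queue(self, data):
        self._fifo.append(data)

    def next_chunk(self):
        # The device has no say: it gets whatever was queued first.
        return self._fifo.popleft() if self._fifo else None


class StreamBulkOut:
    """Stream endpoint: each chunk is queued under a Stream ID."""

    def __init__(self):
        self._per_stream = {}

    def queue(self, stream_id, data):
        self._per_stream.setdefault(stream_id, deque()).append(data)

    def next_chunk(self, current_stream_id):
        # Only chunks for the agreed Current Stream ID are eligible.
        pending = self._per_stream.get(current_stream_id)
        return pending.popleft() if pending else None
```

With the plain endpoint, queuing A, B, C always delivers A, B, C. With the Stream endpoint, the same three chunks queued under Stream IDs 1, 2 and 3 can be delivered as C, A, B if the Current Stream ID is set to 3, then 1, then 2.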
The same mechanism works with BULK IN endpoints, of course, but its advantage is less obvious.
A USB device may use Streams on some (bulk) endpoints and not on others. Streams are not possible on non-bulk endpoints.
Refer to this page for some background on SuperSpeed USB transfers and bursts (regardless of Streams).
The Current Stream ID
In relation to Streams, the USB 3.0 spec describes rather tangled state machines for the device and host for endpoints in both directions, with the purpose of ensuring that both sides have the same view on the Current Stream ID (and handling momentary disagreements due to race conditions). It seems like the main idea is to give the device the initiative of selecting which Stream ID to serve, but this is a USB protocol after all, so the host has the last word.
In broad strokes, the device can suggest the Current Stream ID on an endpoint by sending an ERDY packet with that Stream ID. For an IN endpoint, the host may respond with an ACK packet with the same ID to initiate a transfer for that ID, or an ACK with another ID, in which case the latter ID overrules, or do nothing at all (in case there’s no queued TD with that Stream ID).
For an OUT endpoint, the host may send a DATA OUT packet with that Stream ID, or another Stream ID, or do nothing. Once again, it’s the Stream ID selected by the host that determines the Current Stream ID until the termination of the transfer. The host can also send a DATA OUT packet with a new Stream ID out of the blue (but not in the middle of a transaction), in which case that stream ID becomes the Current Stream ID.
The USB 3.0 spec allows the host to initiate a transfer on any Stream ID, regardless of suggestions made by the device, if any. This is referred to as “Host Initiated Data Move” in the xHCI spec, and can be disabled according to its section 4.12.1.1 by setting the Host Initiate Disable (HID) flag to ‘1’ for the relevant endpoint.
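The host's side of this negotiation boils down to a small decision, sketched below. This is a simplification under stated assumptions rather than the spec's state machine: the function name, its parameters and the priority given to the host's own choice are all hypothetical, chosen only to mirror the three outcomes described above (accept the suggestion, overrule it, or do nothing).

```python
def host_pick_stream(suggested_id, queued_ids,
                     host_initiated_disabled=False, host_preference=None):
    """Return the Stream ID the host starts a transfer on, or None to stay idle.

    suggested_id            -- Stream ID from the device's ERDY, or None
    queued_ids              -- set of Stream IDs that have a TD queued
    host_initiated_disabled -- models the HID flag of the endpoint
    host_preference         -- Stream ID the host would move on its own, or None
    """
    # Host initiated data move: the host's choice overrules the device's
    # suggestion, but only if HID is clear and a TD exists for that ID.
    if not host_initiated_disabled and host_preference in queued_ids:
        return host_preference
    # Otherwise accept the device's suggestion if there is a TD for it.
    if suggested_id is not None and suggested_id in queued_ids:
        return suggested_id
    # No matching TD: do nothing at all.
    return None
```

For example, with TDs queued for Stream IDs 3 and 5, a device suggestion of 3 is accepted unless the host prefers 5 and HID is clear, in which case 5 becomes the Current Stream ID.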
There is no interleaving of Stream IDs within data transactions: At any moment, the host and device agree on a Current Stream ID (“CStream”), and limit the data transfers to that ID only. The selection of a Current Stream ID is in principle made by either side, by sending a data packet with that ID, but changing the ID is allowed only between transactions (Section 8.12.1.4.3.7 in the USB 3.0 spec: “Note: The Stream ID value shall be CStream for all packets exchanged in the Move Data state. If a Stream ID value other than CStream is detected while in the DOMDSM the device should halt the endpoint.”).
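The rule quoted from the spec can be illustrated with a minimal device-side check. The class below is a hypothetical sketch, not a rendition of the DOMDSM state machine: it only tracks CStream and halts when a packet with a different Stream ID shows up while data is being moved.

```python
class StreamEndpoint:
    """Toy model of a device's OUT Stream endpoint."""

    def __init__(self):
        self.cstream = None   # Current Stream ID; None while idle
        self.halted = False
        self.received = []

    def start_transfer(self, stream_id):
        # Selecting CStream is only done between transactions.
        self.cstream = stream_id

    def on_data_packet(self, stream_id, payload):
        """Return True if the packet was accepted."""
        if self.halted:
            return False
        if stream_id != self.cstream:
            # A Stream ID other than CStream while moving data:
            # the device should halt the endpoint.
            self.halted = True
            return False
        self.received.append(payload)
        return True

    def end_transfer(self):
        self.cstream = None
```

After start_transfer(4), packets carrying Stream ID 4 are accepted, while a packet carrying Stream ID 7 halts the endpoint and everything after it is refused.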
The USB 3.0 spec says nothing about transactions in the part describing the Stream protocol, so one could get the false impression that the Current Stream can be changed in the middle of a transaction. The xHCI spec says in section 4.12.1 that “The Stream Protocol allows a device to switch Streams on packet boundaries”. But section 8.1 forbids initiating a BULK IN transaction before the previous one is completed or rejected, with no special regard to Streams. So Streams don’t allow interleaving packets of different transactions just because they carry different Stream IDs.
Bandwidth efficiency
The Stream ID suggestion mechanism creates longer time gaps with no data transmission, in particular for IN endpoints: With non-Stream endpoints, the host knows which endpoints are ready for transfers by virtue of flow control, so it may issue a DATA IN request with an ACK for another endpoint as soon as the previous transfer is done. With Streams, the host needs to wait for an ERDY with the Stream ID suggestion before requesting data from the device, and that suggestion can only be sent by the device after it has received the acknowledgment for the data of the last packet of the previous transfer. So there’s a gap of at least the bus’ data loop time between consecutive Stream IN transactions, typically a few microseconds.
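To get a feeling for what such a gap costs, here is a back-of-the-envelope calculation. The function name and the numbers are assumptions made for illustration only: a 2 μs gap and an effective payload rate of 400 bytes/μs (about 400 MB/s) are plausible round figures, not measurements.

```python
def stream_in_efficiency(bytes_between_gaps, gap_us, rate_bytes_per_us):
    """Fraction of bus time spent moving data when every burst of
    bytes_between_gaps bytes is followed by an idle gap of gap_us."""
    data_time_us = bytes_between_gaps / rate_bytes_per_us
    return data_time_us / (data_time_us + gap_us)


# Assumed figures: 2 us gap, 400 bytes/us effective payload rate.
small = stream_in_efficiency(1024, 2.0, 400.0)    # one 1024-byte packet
large = stream_in_efficiency(16384, 2.0, 400.0)   # 16 KiB between gaps
```

Under these assumptions, moving a single 1024-byte packet between gaps keeps the bus busy roughly 56% of the time, while moving 16 KiB between gaps brings this to roughly 95%. The gap matters mostly when the amount of data moved per Stream selection is small.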
When the Streams feature is used as intended, there will always be a TD ready on the host for the Stream ID that the device suggests, so effectively the host will always accept those suggestions. Yet, the USB 3.0 spec defines the behavior of a variety of other scenarios.
The Prime Pipe
The Stream ID 0xFFFE is always reserved for the Prime Pipe, which is not intended to carry any data. Rather, it’s a Stream ID that is selected by the host (and never by the device) to inform the device that new TDs have been queued. The device always responds with an NRDY to the packet from the host, as there is never any data to transmit on this Stream ID. Note that the Prime Pipe notification carries no information beyond the fact that something has changed. In the xHCI spec, a transition to the Prime Pipe corresponds to software ringing the endpoint’s doorbell.
To understand the rationale behind this, consider the case of a disk drive as a USB device. The host may send a command to write a list of data blocks to the disk, each having a TD with a different Stream ID, but it’s not known in advance in which order the disk needs to consume the data. The Prime Pipe informs the device that all data is ready, and the disk may consequently fetch the blocks of data in the required order by selecting Stream IDs.
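The disk scenario can be played out as a short simulation. This is a sketch under simplifying assumptions: the function name and the log format are invented for illustration, one TD is queued per Stream ID, and the host always goes along with the device's suggestion when a matching TD exists.

```python
PRIME_STREAM_ID = 0xFFFE  # reserved for the Prime Pipe, never carries data


def run_write_command(host_tds, device_order):
    """host_tds: dict mapping Stream ID -> data block queued by the host.
    device_order: Stream IDs in the order the disk wants the blocks.
    Returns (blocks in the order the device consumed them, packet log)."""
    pending = dict(host_tds)  # copy, so the caller's dict is left untouched
    # The host primes the pipe; the device answers NRDY, as always.
    log = [("host", "PRIME", PRIME_STREAM_ID),
           ("device", "NRDY", PRIME_STREAM_ID)]
    consumed = []
    for stream_id in device_order:
        log.append(("device", "ERDY", stream_id))
        if stream_id not in pending:
            continue  # no TD queued for this ID: the host does nothing
        log.append(("host", "DATA", stream_id))
        consumed.append(pending.pop(stream_id))
    return consumed, log
```

If the host queues blocks under Stream IDs 1, 2 and 3 and the disk asks for 3, 1, 2, the blocks arrive in exactly that order, even though the host never knew the order in advance.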
Are Streams useful?
For IN endpoints, Streams are not likely to be helpful. Rather than setting up TDs with different Stream IDs, the host can queue a pool of TDs on a regular endpoint. The device may add the muxing information in the transmitted data, for example as the first two bytes of a transfer (for a Stream ID substitute). Assuming that the device and its driver work properly together, there is no effective difference between this and using Streams, except that the regular endpoint has less protocol overhead.
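The in-band alternative is easy to sketch. In the snippet below, the two-byte tag at the start of each transfer plays the role of the Stream ID; the function names and the little-endian byte order are arbitrary choices made for this example, as the device and its driver are free to agree on any format.

```python
import struct

TAG_FORMAT = "<H"  # 16-bit little-endian tag (byte order is an assumption)
TAG_SIZE = struct.calcsize(TAG_FORMAT)


def mux(tag, payload):
    """Device side: prepend the tag to the data of one transfer."""
    return struct.pack(TAG_FORMAT, tag) + payload


def demux(transfer):
    """Host side: split a received transfer into (tag, payload)."""
    if len(transfer) < TAG_SIZE:
        raise ValueError("transfer too short to carry a tag")
    (tag,) = struct.unpack_from(TAG_FORMAT, transfer)
    return tag, transfer[TAG_SIZE:]
```

The driver then routes each payload by its tag, much as it would have routed completed TDs by their Stream ID.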
For OUT endpoints, Streams are more appealing, as they allow something not possible with plain endpoints: Let the device select the order in which it receives the data. As host initiated data moves can be disabled, this gives the device the ability to fetch data from the host as required. However, for a small number of Streams, a separate OUT endpoint for each is more efficient because of the time gap between transactions for switching the Current Stream.