Xillybus FPGA designer’s guide

2 General guidelines

2.1 Clocking

All signals from and to the Xillybus IP core must be synchronous with the rising edge of bus_clk. This clock is supplied by the IP core.

For Xillybus IP cores that are based on PCIe, this clock is generated by the PCIe block. The clock’s frequency depends on the platform: For the baseline IP core (revision A), the frequency of bus_clk is 62.5 MHz, 125 MHz or 250 MHz, depending on whether the maximal bandwidth (as advertised) is 200 MB/s, 400 MB/s or 800 MB/s, respectively.

With later revisions (B, XL and XXL) bus_clk has a frequency of 250 MHz.

Zynq-based platforms typically have a bus_clk of 100 MHz. XillyUSB works with a bus_clk of 125 MHz.

It is often possible to change the clock’s frequency to one of a limited list of choices. This is done by configuring the PCIe block or the processor core that generates the clock.

If the timing constraints for the PCIe block are set correctly (as in the demo bundles), the application logic that relies on bus_clk is covered by proper timing constraints as well: The tools automatically derive timing constraints for bus_clk from the timing constraints of the PCIe block. The same applies to Zynq-based platforms as well as XillyUSB.

This does not mean that the application logic needs to be synchronous with bus_clk. Likewise, neither the source of the data nor its destination is required to be synchronous with bus_clk. When a different clock is involved, a dual-clock FIFO is often used together with the IP core: One side of the FIFO is connected to the Xillybus IP core. This side is therefore synchronous with bus_clk. The application logic is connected to the FIFO’s other side. This side is synchronous with the application logic’s clock. Hence the FIFO is used not only as short-term temporary storage, but also for clock domain crossing.

2.2 Data width

With baseline Xillybus IP cores (revision A), each FIFO or memory interface works with a data width of 8, 16 or 32 bits. Later revisions, as well as XillyUSB, support wider data interfaces.

Wider data allows higher bandwidth performance and is also more convenient in applications where the natural transmission word is wider than 8 bits. On the other hand, the inherent data width on the host side remains 8 bits (a byte), because read() and write() function calls define their length in bytes.
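As an illustration of the byte-oriented view on the host side, the following minimal sketch converts between 32-bit words and the byte strings that read() and write() work with. The function names are hypothetical, and both the 32-bit stream width and the little-endian byte order are assumptions made for this example only. No device file is opened here.

```python
import struct

WORD_BYTES = 4  # assumption: the stream was defined as 32 bits wide


def words_to_bytes(words):
    """Pack unsigned 32-bit words into bytes (little-endian is assumed)."""
    return struct.pack("<%dI" % len(words), *words)


def bytes_to_words(buf):
    """Unpack bytes into unsigned 32-bit words (little-endian is assumed)."""
    if len(buf) % WORD_BYTES:
        raise ValueError("length is not a whole number of 32-bit words")
    return list(struct.unpack("<%dI" % (len(buf) // WORD_BYTES), buf))


# Two 32-bit words occupy eight bytes in a write() call
payload = words_to_bytes([0x11223344, 0xDEADBEEF])
```

The length that is passed to write() for this payload is 8, not 2, because the length is counted in bytes regardless of the stream's data width.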

The considerations for choosing the data width are discussed briefly in The guide to defining a custom Xillybus IP core.

2.3 Interfacing through a FIFO

The demo bundle demonstrates how a FIFO should be connected: It has a FIFO with both sides connected to the IP core. This implements a loopback on two streams.

The FIFOs in the demo bundle are configured for a common clock on both sides. This is not suitable when the FIFO is used for clock domain crossing. In this case, a dual-clock FIFO (often called “asynchronous FIFO”) should be used.

When a FIFO is used for a stream from the host to the FPGA, this FIFO’s “full” signal should be connected to the Xillybus IP core. The IP core uses this signal to determine whether a burst of data transfer can be initiated.

The same principle applies to a stream from the FPGA to the host: The “empty” signal should be connected to the Xillybus IP core for the same purpose. The IP core expects the behavior of a regular FIFO (as opposed to FWFT, First Word Fall Through).

Once a burst has started, the Xillybus IP core continues to rely on these signals (“empty” and “full”): These signals prevent the IP core from reading from an empty FIFO or writing to a full FIFO.
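The role of “empty” and “full” as guards can be illustrated with a small cycle-level behavioral model. This is only a sketch written for this explanation, not the IP core's logic and not any vendor's FIFO: the class name, the tick() method and the reader loop are hypothetical. It models a regular (non-FWFT) FIFO, where the data word appears on the output on the clock cycle after read enable is asserted.

```python
from collections import deque


class StandardFifo:
    """Behavioral model of a regular (non-FWFT) single-clock FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = deque()
        self.dout = None  # updated by the clock edge that accepts a read

    @property
    def empty(self):
        return len(self.mem) == 0

    @property
    def full(self):
        return len(self.mem) >= self.depth

    def tick(self, wr_en=False, din=None, rd_en=False):
        """Advance one clock edge; the flags are sampled before the edge."""
        do_read = rd_en and not self.empty
        do_write = wr_en and not self.full
        if do_read:
            self.dout = self.mem.popleft()
        if do_write:
            self.mem.append(din)


fifo = StandardFifo(depth=4)
for word in (10, 20, 30):
    fifo.tick(wr_en=not fifo.full, din=word)  # write only when not full

received = []
for _ in range(5):
    rd_en = not fifo.empty  # read only when not empty
    fifo.tick(rd_en=rd_en)
    if rd_en:
        received.append(fifo.dout)
```

In this run, the reader stops asserting read enable once “empty” is high, so the last two iterations transfer nothing.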

However, even if a FIFO indicates that it is ready for a burst of data, the Xillybus IP core may not start a burst immediately. The IP core may also stop a data burst in the middle, even if the FIFO allows continuing the burst. It is normal for the pattern of the data flow to be apparently random.

The general rule is that the Xillybus IP core attempts to serve all FIFOs that are connected to it equally. The IP core grants longer bursts to FIFOs that tend to get filled faster, as these FIFOs don’t activate their “empty” or “full” signals as often.

This simple arbitration method ensures efficient communication with FIFOs that tend to get filled rapidly, while also achieving low latency on FIFOs that receive data at a lower rate.

As for the depth of the FIFO, the Xillybus IP core works with any depth, in principle. However, this attribute should be chosen to cope with the expected data flow. A FIFO with a depth of 2 kBytes is almost always the correct choice for an asynchronous stream, even for high data rates. But this is sometimes a matter of trial and error.

A depth of 2 kBytes is usually enough, because the Xillybus core is not likely to neglect a FIFO of this size for a time period that is long enough to cause an overflow or underflow. This is of course true as long as the user application software that runs on the host consumes or supplies data rapidly enough. If this is not the case, the solution might be to make the DMA buffers larger. Attempting to solve this with a larger FIFO is unreasonable, as there is much less memory on the FPGA.
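To get a feeling for the time scales involved, the following sketch calculates how long a FIFO can absorb data at a constant rate when nothing is drained from it. The function name and the example data rates are chosen for illustration only, and 1 MB is taken as 10^6 bytes. The result says nothing about how long the IP core actually leaves a FIFO unattended.

```python
def fill_time_us(depth_bytes, rate_mb_per_s):
    """Time in microseconds until an undrained FIFO of the given depth
    is filled at a constant data rate (1 MB = 10**6 bytes assumed)."""
    # bytes / (10**6 bytes per second) = seconds * 10**-6 = microseconds
    return depth_bytes / rate_mb_per_s


DEPTH = 2048  # 2 kBytes, as suggested above

slow = fill_time_us(DEPTH, 100)  # about 20.48 microseconds
fast = fill_time_us(DEPTH, 800)  # about 2.56 microseconds
```

The same arithmetic applies in the opposite direction: it is also the time it takes to empty a full FIFO that is read at this rate and not refilled.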

2.4 Behavior of “empty” and “full” signals

In a normally operating FIFO, the “empty” signal can change from low to high only one clock cycle after the read enable was high. Likewise, the “full” signal can change from low to high only one clock cycle after the write enable was high.

These two signals can change to low at any moment, of course.

The Xillybus IP core relies on this behavior: When a FIFO indicates that it is ready for a data transfer (with a low “empty” or “full”, as applicable), a state machine in the IP core may start a chain of events. This will lead to the transfer of at least one data element. Hence if the “empty” signal changes to high before the IP core fetches any data from the FIFO, it’s possible that the IP core will ignore the “empty” signal during one clock cycle. Such an event is harmless regarding the IP core’s own integrity, but may lead to an unexpected and unpredictable data flow.

The same applies to the “full” signal: If this signal changes from low to high before the IP core writes a data word to the FIFO, the IP core may ignore the “full” signal during one clock cycle. Once again, this is harmless to the IP core itself, but may result in the loss of one data word.

A properly designed FIFO can create this faulty condition only if it is reset while it is ready for communication with the Xillybus IP core. This situation should normally be avoided anyhow.

If application logic is connected directly with the IP core (without a FIFO), it’s important to imitate the behavior of a standard FIFO regarding “empty” or “full”.
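When writing such logic, it can be helpful to check signal traces from a simulation against the rule stated above. The following is a minimal sketch of such a check. The function name is hypothetical, and the traces are assumed to be plain per-cycle lists of 0 and 1, sampled on the rising edge of bus_clk.

```python
def find_flag_violations(flag, enable):
    """Return the cycle indices at which `flag` changes from low to high
    although `enable` was not high on the previous clock cycle.

    `flag` is the trace of "empty" (or "full"), and `enable` is the
    matching trace of the read enable (or write enable).
    """
    violations = []
    for n in range(1, min(len(flag), len(enable))):
        rose = flag[n] and not flag[n - 1]
        if rose and not enable[n - 1]:
            violations.append(n)
    return violations


# "empty" rises on cycle 3, right after a read on cycle 2: allowed
ok = find_flag_violations([0, 0, 0, 1, 1], [0, 1, 1, 0, 0])

# "empty" rises on cycle 2 with no read on cycle 1: a violation
bad = find_flag_violations([0, 0, 1, 1], [0, 0, 0, 0])
```

A change of the flag from high to low is never reported, since these signals are allowed to change to low at any moment.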