Setting up a Xillyp2p IP core at the IP Core Factory

Introduction

The IP Core Factory is an on-line application on Xillybus' website that allows configuring the parameters and data streams of an IP Core in a way customized to your application. Once the configuration is complete, the IP Core is submitted for generation by clicking on the "generate core" link on the web application. A short while after this (usually a few minutes), the IP Core's netlist and accompanying files are ready for download as a ZIP file.

This ZIP file also includes a README file, containing the parameters of the IP core and its streams, as generated. These parameters may deviate slightly from those requested.

The IP Cores that are downloaded are fully functional for use within an FPGA project, without any technical limitations. From a legal point of view, the IP core is allowed for use for evaluation purposes only, or for certain academic purposes, as explained in this guide.

Subscribing to this service requires merely providing a valid e-mail address.

Generally, the IP Core Factory generates IP cores of three types: The mainstream Xillybus IP Core, for communication between a computer host and an FPGA through a PCIe / AXI bus; XillyUSB, which performs the same with USB 3.0; and Xillyp2p, which connects two FPGAs through an MGT (Multi-Gigabit Transceiver, often called GTX, GTH, GTY etc.), another type of SERDES, or via parallel wires between the FPGAs.

Refer to the main documentation page for more information about Xillybus and XillyUSB IP cores.

The definition of a Xillyp2p IP core consists of two stages:

Choosing the target FPGA family and setting up the parameters of the physical data link.
Setting up the streams for transmitting application data.

The web application is quite self-explanatory and includes help buttons with relevant information. Nevertheless, this guide walks through a few topics as a supplement for these help buttons.

Core A and Core B terminology

Unlike the other types of IP cores produced by the IP Core Factory, the IP core for Xillyp2p consists of two separate logic blocks, referred to as core A and core B. This is a natural consequence of the fact that a link between two FPGAs is established, where the relation between the two sides in this link isn't necessarily symmetric.

The IP Core Factory supplies these two logic blocks as two separate logic netlists, each with its own black-box file (if needed) and instantiation templates. To connect two FPGAs, the logic design of the first FPGA should include core A, and the second should include core B.

The expression "Xillyp2p IP core" refers to core A and B together, usually in the context of their joint configuration. This should not be confused with the "core A / core B" terminology, which refers to a specific logic netlist included in one of the two FPGA projects involved.

See below for guidelines on creating a symmetric IP core that allows an FPGA to connect to another FPGA with the same block (i.e. core A to core A).

For unidirectional Xillyp2p IP cores, data flows from core A to core B.

The IP core's basic parameters

First, some basic parameters are configured:

The IP core's name. This name has no practical significance. It's used only on the website and in the name of the ZIP file that is generated at the end of the process.
The IP core's FPGA target or targets: Core A and Core B may be intended for the same FPGA family or two different families.
Whether the IP core is bidirectional or unidirectional. When possible, bidirectional is recommended, even if the application is intended to transport data only in one direction. In such a case, the physical link in the opposite direction is used by Xillyp2p's protocol to acknowledge properly received data, request retransmissions, or provide flow control for individual application data streams. Naturally, the opposite physical link may have a significantly lower data rate.
Type of application. This choice may have a minor impact on the core's internal parameters, with slight performance differences.
Link delay: The text box for configuring the link's delay appears only for certain application types, for example for a fiber optic channel, which may have a significant link delay. When the text box is visible, it should be filled with the time it takes data on the physical link to travel from core A to core B or vice versa. If the delay isn't the same in both directions, the average delay should be specified. This parameter influences the allocation of RAM buffers, and affects bandwidth performance if set incorrectly.
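As a rough illustration of the link delay parameter, the sketch below estimates the one-way propagation delay of a fiber optic link and averages two unequal delays. The group index of 1.468 is an assumed typical value for single-mode fiber, not a figure from this guide, and the function names are hypothetical. Only propagation through the fiber is modeled here.

```python
# Rough estimate of a fiber link's one-way propagation delay.
# The group index is an assumed typical value, not taken from Xillybus' documentation.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_GROUP_INDEX = 1.468  # assumption: typical single-mode fiber


def fiber_delay_ns(length_m, group_index=FIBER_GROUP_INDEX):
    """One-way propagation delay through length_m meters of fiber, in ns."""
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


def average_delay_ns(a_to_b_ns, b_to_a_ns):
    """Average of the two directions, for links with unequal delays."""
    return (a_to_b_ns + b_to_a_ns) / 2


# 1 km of fiber comes out at a little under 5 microseconds one way.
print(f"1 km of fiber: {fiber_delay_ns(1000):.0f} ns")
print(f"Average of 4000 ns and 6000 ns: {average_delay_ns(4000, 6000):.0f} ns")
```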

Clock frequency and tolerance

When the physical link is bidirectional, the user application logic supplies the Xillyp2p IP core with two clocks, which are usually generated by or related to the physical link's logic.

The transmission clock, tx_clk, which is used to clock the parallel word for transmission. This clock is generated by an oscillator on the transmitting side's circuitry.
The reception clock, rx_clk, which is often derived from the arriving stream of bits.

Since these two clocks are used by the IP core's logic, their frequencies should be reasonable for the target FPGA, so that the timing constraints can be achieved. A clock frequency of 250 MHz works well on most targets; however, when the frequency can be traded off with the parallel word's width (more on this below), it's best to aim for the 120-200 MHz range. That said, even 300-350 MHz may work on some faster FPGAs.

There is no minimal clock frequency (as long as the FPGA itself supports it).

For a unidirectional physical link, there is only one clock used on each side. In this case, only this clock's frequency is specified in the IP Core Factory, and the rest of this section is irrelevant.

When the physical link is bidirectional, the frequencies of tx_clk and rx_clk must be approximately the same, with an allowed tolerance of 125,000 ppm (parts per million), i.e. 12.5% frequency tolerance. If different data rates are required in each direction, this is achieved by selecting different widths for the physical link's parallel data word.

When configuring an IP core in the IP Core Factory, the frequency of these two clocks, as well as the maximum frequency tolerance, are given as parameters. The tolerance is defined as the deviation of each of the clocks from its nominal frequency. This accounts for all potential sources of deviation: The clock oscillator's own tolerances (e.g. aging and temperature) as well as SSC (Spread Spectrum Clocking), if applied. All such tolerances should be summed to obtain the final number.

As both clocks may have this deviation, the maximum possible frequency difference between them is approximately twice the per-side tolerance.
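The tolerance arithmetic can be sketched as follows. The individual contributions (oscillator, aging, SSC) are hypothetical example values, and the function names are made up for this illustration. The second function shows why the worst-case difference between the two clocks is approximately, rather than exactly, twice the per-side tolerance.

```python
def total_tolerance_ppm(*sources_ppm):
    """Sum all deviation sources of one clock into a single per-side figure."""
    return sum(sources_ppm)


def worst_case_difference_ppm(per_side_ppm):
    """Relative frequency difference when one clock is at its upper limit
    and the other at its lower limit: (1 + t) / (1 - t) - 1."""
    t = per_side_ppm * 1e-6
    return ((1 + t) / (1 - t) - 1) * 1e6


# Hypothetical contributions: 50 ppm oscillator, 10 ppm aging, 5000 ppm SSC.
per_side = total_tolerance_ppm(50, 10, 5000)
print(f"Per-side tolerance: {per_side} ppm")
print(f"Worst-case difference: {worst_case_difference_ppm(per_side):.1f} ppm")
```

The per-side sum is the kind of figure the Factory asks for as the maximum frequency tolerance.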

The Xillyp2p IP core handles frequency tolerance by inserting idle segments into the physical link's data stream, which limits the amount of valid data. Without this limitation, the receiving side is at risk of an overflow of data, as it may not be able to deliver data to the application quickly enough.

Hence, a high clock frequency tolerance reduces the available data bandwidth. That said, a typical clock tolerance of 100 ppm, for example, has a negligible effect.

Requesting out_ready and in_valid ports

Next to the clock frequency text box, there's a checkbox labeled "A parallel word is transmitted / received on every clock cycle". For most applications, this checkbox should remain checked, in particular when the Xillyp2p IP core is connected directly to an MGT.

When this checkbox isn't checked, two input ports are added to Xillyp2p's IP core for interfacing with the physical link: "out_ready" on the transmitting side and "in_valid" on the receiving side. For a bidirectional physical link, both ports are added on both sides. These input ports act as a clock enable for the parallel word.

An alternative way to view these ports is through AMBA / AXI terminology: out_ready corresponds to the "ready" signal in a valid/ready pair for out_data, but the "valid" signal is not provided, because out_data is always valid. Likewise, in_valid corresponds to the "valid" signal of a valid/ready pair for in_data, but "ready" is not provided because the IP core is always ready to accept a parallel data word.

In other words, when out_ready is high, the Xillyp2p IP core assumes that the content of out_data will be transmitted on the physical link. Otherwise, out_data retains its value on the next clock cycle. By the same token, when in_valid is low, the IP core ignores the content of in_data.

out_ready and in_valid are intended for usage scenarios in which the user application logic manipulates the physical link's data. If this manipulation causes an irregular flow of parallel words, these two ports allow such a flow to be interfaced with the IP core. For example, a gearbox implemented in application logic may not produce a parallel word every clock cycle. It's also possible to insert a FIFO in the flow of parallel words (not to be confused with the application data FIFOs).

For example, if a FIFO is used for transmitting data on the physical link, both out_ready and the FIFO's wr_en input may be connected to the negation of this FIFO's own "full" output. This arrangement allows the physical link's transmitter to fetch data from the other side of this FIFO at any desired pace.

There is no requirement on how often out_ready and in_valid are high. However, when these two ports are opted in at the IP Core Factory, an estimation of the percentage of the time they are high is required in the section where the parallel word's width is defined. This percentage is referred to as "utilization ratio". This estimate allows the IP Core Factory to allocate the RAM buffer resources that are adequate for a proper data rate performance.
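The behavior of out_ready and the meaning of the utilization ratio can be illustrated with a small cycle-by-cycle model. This is only a behavioral sketch in Python, not the IP core's logic. The function names and the example pattern (out_ready high on 32 out of every 33 cycles) are hypothetical.

```python
def utilization_ratio_percent(ready_pattern):
    """Percentage of clock cycles on which out_ready (or in_valid) is high."""
    pattern = list(ready_pattern)
    return 100.0 * sum(bool(r) for r in pattern) / len(pattern)


def transmit_trace(words, ready_pattern):
    """Model out_data per clock cycle: a word is consumed only on cycles
    where out_ready is high; otherwise out_data holds its value."""
    trace = []  # (out_data, out_ready) for each clock cycle
    sent = []   # words actually taken by the physical link
    index = 0
    for ready in ready_pattern:
        if index >= len(words):
            break
        trace.append((words[index], bool(ready)))
        if ready:
            sent.append(words[index])
            index += 1
    return trace, sent


trace, sent = transmit_trace(["w0", "w1", "w2"], [1, 0, 1, 1])
for out_data, out_ready in trace:
    print(out_data, "taken" if out_ready else "held")

# Hypothetical pattern: out_ready is low on one cycle out of 33.
print(f"{utilization_ratio_percent([1] * 32 + [0]):.2f}%")
```

In the trace, "w1" appears on two consecutive cycles because out_ready was low on the first of them.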

Parallel word's width

The last part in the first stage of the core's configuration is choosing the width of the parallel word. Xillyp2p allows any parallel word width between 1 and 128 bits, inclusive. The word width may be different for transmission and reception. However, the parallel word width that core A uses for transmission must be the same as the width that core B uses for reception, and vice versa. For example, an MGT may interface with core A using an 80 bits wide parallel word for transmission and 20 bits for reception. In this case, the MGT connected to core B must use 20 bits for transmission and 80 bits for reception.

Recall from above that the clock frequencies used with the parallel words for transmission and reception must be the same (within the tolerance discussed in the clock frequency section). Hence if the physical link is required to have different data rates for transmission and reception, this is achieved by choosing different word widths.
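As a worked example of the relation between word width and data rate, the sketch below computes the raw rate of the parallel interface in each direction and mirrors core A's widths onto core B. The 125 MHz clock is a hypothetical choice, and the function names are made up for this illustration. The figures are raw parallel-word rates, not the bandwidth available to application data.

```python
def raw_rate_gbps(clock_mhz, word_width_bits):
    """Raw rate of the parallel interface: one word per clock cycle."""
    return clock_mhz * 1e6 * word_width_bits / 1e9


def core_b_widths(core_a_tx_bits, core_a_rx_bits):
    """Core B must receive what core A transmits, and vice versa."""
    return {"tx": core_a_rx_bits, "rx": core_a_tx_bits}


clock_mhz = 125  # hypothetical; the same nominal frequency in both directions
print(f"A to B: {raw_rate_gbps(clock_mhz, 80)} Gb/s")
print(f"B to A: {raw_rate_gbps(clock_mhz, 20)} Gb/s")
print("Core B widths:", core_b_widths(80, 20))
```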

When using an MGT or SERDES, there are often different combinations of clock frequency and parallel word width to choose from, given a required data rate. A wider parallel word allows reducing the clock frequency, which can make the timing constraints easier to achieve in some situations; however, Xillyp2p's logic consumption might increase as a result. In particular, there are two threshold points to be aware of: 32 bits and 64 bits. Choosing 33 bits instead of 32, or 65 bits instead of 64, results in significantly more logic consumption. This is related to the width of Xillyp2p IP Core's internal data word. For bidirectional physical links, the parallel word width affects logic consumption independently for each direction.

For example, MGTs often allow a choice between 32 bits and 40 bits. If a small FPGA is targeted, and logic consumption is hence a concern, it may be preferable to choose 32 bits. When targeting large and mostly unused FPGAs, choosing a wider word is preferred if that helps reach a lower clock frequency, so that timing constraints are achieved more easily and quickly. Pushing the clock frequency below 120 MHz is, however, most likely pointless.
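The trade-off between word width and clock frequency can be tabulated with a short script. This is a sketch under assumptions: the 5 Gb/s raw rate and the candidate widths are hypothetical, and the "tier" number merely labels which side of the 32-bit and 64-bit thresholds a width falls on. It is not a measure of actual logic usage.

```python
def logic_tier(width_bits):
    """0 for up to 32 bits, 1 for 33-64 bits, 2 for 65-128 bits."""
    if not 1 <= width_bits <= 128:
        raise ValueError("parallel word width must be between 1 and 128 bits")
    if width_bits <= 32:
        return 0
    return 1 if width_bits <= 64 else 2


def candidate_configs(raw_rate_gbps, widths):
    """For each width, return (width, clock in MHz, tier, clock within 120-200 MHz)."""
    rows = []
    for width in widths:
        clock_mhz = raw_rate_gbps * 1e3 / width
        rows.append((width, clock_mhz, logic_tier(width), 120 <= clock_mhz <= 200))
    return rows


# Hypothetical 5 Gb/s raw rate on the parallel interface.
for width, clock_mhz, tier, in_range in candidate_configs(5.0, [16, 20, 32, 40]):
    print(f"{width:3d} bits  {clock_mhz:7.2f} MHz  tier {tier}  in 120-200 MHz: {in_range}")
```

With these numbers, both 32 and 40 bits land in the 120-200 MHz range, but only 32 bits stays below the first threshold.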

Setting up streams

After the IP core's general parameters have been configured, the application data streams are set up.

Any number of streams can be defined. Each such stream is independent, and consumes bandwidth from the physical link in an efficient manner, only as necessary to transport data to the other side. In particular, if no data is transmitted through a stream at a given time, it doesn't consume any resources from the physical link.

For each stream, the following parameters are configured:

The stream's name. This name is included in the names of the Xillyp2p IP core's ports related to the stream, on both core A and core B. This naming convention should be used to ensure that the application data reaches the correct logic on the other side. The stream's name is not included in the implemented logic, and therefore has no significance outside the FPGA design process.
The data width: This is the width of the data word on the port connected to the application logic's FIFO. Hence it's also the width of that FIFO's data word (at least on the side facing the IP core). The possibilities are 8, 16, 32, 64, 128 or 256 bits. This width has nothing to do with the physical link's width, and neither does this width affect performance. It's therefore recommended to choose the width that is most natural for the application.
Required bandwidth: This is the maximal data rate that this stream should be able to support. It's recommended to specify the data rate actually needed by the application. Configuring a stream for a higher rate than necessary may waste FPGA resources. Note that if multiple streams are active concurrently, they compete for the physical link's bandwidth, and their combined data rate cannot exceed this bandwidth. The application logic should not rely on Xillyp2p to limit the data rate to the requested figure, even if this may happen.
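A planned set of streams can be sanity-checked before it is entered into the web application. The sketch below is an illustration only: the Stream class, the MB/s unit and the link capacity figure are assumptions made for this example, and the capacity must come from your own estimate of the bandwidth the physical link offers in the relevant direction.

```python
from dataclasses import dataclass

ALLOWED_DATA_WIDTHS = (8, 16, 32, 64, 128, 256)


@dataclass
class Stream:
    name: str
    data_width: int               # bits, on the port facing the application FIFO
    required_mbytes_per_s: float  # assumed unit for this example


def check_streams(streams, link_capacity_mbytes_per_s):
    """Validate data widths and compare the combined requested rate of
    streams in one direction against an estimated link capacity."""
    for stream in streams:
        if stream.data_width not in ALLOWED_DATA_WIDTHS:
            raise ValueError(f"{stream.name}: unsupported data width {stream.data_width}")
    total = sum(stream.required_mbytes_per_s for stream in streams)
    return total, total <= link_capacity_mbytes_per_s


# Hypothetical streams and capacity.
streams = [Stream("video", 32, 400.0), Stream("control", 8, 1.0)]
total, fits = check_streams(streams, 500.0)
print(f"Combined request: {total} MB/s, fits: {fits}")
```

If the combined figure doesn't fit, the streams can still be defined, but they cannot all run at their requested rates at the same time.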

For bidirectional physical links, the following two parameters are also configured:

Direction. A stream can be configured for data flow from A to B, from B to A, or bidirectional. A bidirectional stream consists of two streams, one in each direction, having the same name. This is more likely to cause confusion than provide benefit. Hence bidirectional streams aren't recommended, unless a symmetric IP core is required (more on that below).
Enable flow control. When the checkbox is checked, a "full" input port is added to the IP core's interface with the application logic's FIFO. Using this input, the Xillyp2p IP core is guaranteed not to cause an overflow on this FIFO, by virtue of a flow control mechanism that is specific and independent to each stream. This allows the application logic on the receiving side to effectively control the flow of data all the way back to the application logic on the transmitting side, producing a practical illusion of a FIFO split between the two FPGAs.

Connecting core A to itself (another core A)

It's possible to configure a Xillyp2p IP core so it can connect to itself. In other words, so that core A is identical with core B. This allows an FPGA programmed with a design containing core A to connect to another FPGA design containing the same core A. In particular, this enables connecting two FPGAs that are programmed with the exact same bitstream.

As mentioned in a different guide, an IP core usually rejects a link partner unless it's the A/B-counterpart from the same Xillyp2p IP core. Such a rejection is indicated by the status_link_partner_mismatch output port, which is held high when this happens.

To allow core A to connect to itself, a Xillyp2p IP core must be symmetric. For this to occur, the following requirements must be met (and together they are both necessary and sufficient):

The physical link is bidirectional.
The physical link's parallel word width is the same in both directions.
All streams are bidirectional.
The "A parallel word is transmitted / received on every clock cycle" checkbox is checked; or if it's not, the utilization ratio percentage is configured with the same value in both directions.
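The four requirements above translate directly into a checklist. The sketch below expresses them as a Python predicate over a hypothetical description of the configuration. The class and its field names are made up for this illustration and don't correspond to any file format or API of the IP Core Factory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class P2pConfig:
    bidirectional_link: bool
    width_a_to_b: int
    width_b_to_a: int
    stream_directions: List[str]  # "A_to_B", "B_to_A" or "bidirectional"
    word_every_cycle: bool = True
    utilization_a_to_b: Optional[float] = None  # percent, if the checkbox is unchecked
    utilization_b_to_a: Optional[float] = None


def is_symmetric(cfg):
    """True when all four requirements for a symmetric core are met."""
    return (
        cfg.bidirectional_link
        and cfg.width_a_to_b == cfg.width_b_to_a
        and all(d == "bidirectional" for d in cfg.stream_directions)
        and (cfg.word_every_cycle
             or cfg.utilization_a_to_b == cfg.utilization_b_to_a)
    )


symmetric = P2pConfig(True, 32, 32, ["bidirectional", "bidirectional"])
asymmetric = P2pConfig(True, 80, 20, ["A_to_B"])
print(is_symmetric(symmetric), is_symmetric(asymmetric))
```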

When an IP core is symmetric and can be connected to itself, a note appears in the header of the README file. For example,

Configuration ID: 0xbc5aed42, symmetric (core can connect to itself)

The configuration ID depends on the core's settings, and the fact that the core is symmetric is indicated by the text "symmetric (core can connect to itself)" in the line above.
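If this check needs to be automated, for example in a build script, the README line can be examined with a regular expression. The sketch below relies only on the single example line shown above. How the line looks for a non-symmetric core is an assumption (the same line without the "symmetric" note), and the function name is hypothetical.

```python
import re

# Pattern derived from the one example line in this guide; other README
# layouts are not known here and may require adjustments.
HEADER_RE = re.compile(r"Configuration ID:\s*(0x[0-9a-fA-F]+)(?:,\s*(symmetric)\b)?")


def parse_config_line(line):
    """Return (configuration ID, is_symmetric), or None if the line doesn't match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2) is not None


line = "Configuration ID: 0xbc5aed42, symmetric (core can connect to itself)"
config_id, symmetric = parse_config_line(line)
print(hex(config_id), symmetric)
```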