Xillybus FPGA IP Core's signal API

Introduction

In its simplest form, the core's interface with application logic is designed for a direct connection with either a standard FIFO or a RAM. Examples are not given here, since the evaluation kit contains self-explanatory examples showing how to connect these signals properly.

Note that except for the clock frequency, this API is identical across the different platforms.

Important: All signals from and to the core must be clocked with the bus_clk supplied by the core itself. This clock is generated by Xilinx' Microblaze core, and has a frequency of 70 MHz on the Xillybus mini-distro kit. There is no need to add a timing constraint for this clock, as it's derived from the Microblaze's constraints.

In a typical application, one side of an asynchronous FIFO is connected to the Xillybus core, and the application logic to the other. One side of the FIFO is clocked by the bus_clk, and the second by the application's clock. Hence the FIFO is used for clock domain crossing as well as for short-term storage.

The easiest way to connect a FIFO correctly is replacing one of the two loopback FIFOs in the demo design file with two asynchronous FIFOs, following the connection scheme with the core. It's up to the application designer to choose the FIFO size, knowing that there is no other dedicated buffering space for data on the FPGA. The core works with any FIFO size.

Depending on the data direction, the FIFO's 'empty' or 'full' signal is connected to the core. Using these signals, the core assures data integrity: It won't attempt reading from an empty FIFO nor try to write to a full one. The host's kernel drivers throttle excessive data flow by putting the relevant process to sleep ("blocking") as expected in a Linux application. So the overall picture is an end-to-end data flow allowing both sides to make a natural use of their resources without risking data corruption.

When more advanced use of the core is desired, a deeper understanding of the signals is necessary. If a FIFO's or memory's interface is mimicked by logic, it's very important to know in what ways the core relies on the behavior of these standard components.

The core expects the behavior of a regular FIFO (as opposed to FWFT, First Word Fall Through).

Data width

Each FIFO or memory interface works with data in widths of 8 bits, 16 bits or 32 bits. The configuration takes place during the core formation, and cannot be done by end users.

Wider data allows higher bandwidth and is also more convenient in designs where the natural transmission word is wider than 8 bits. On the other hand, the inherent data width on the host side remains 8 bits (a byte's width), as read() and write() operations define their length in bytes.

A poor choice of data width may lead to undesired behavior. For example, if a host-to-FPGA link is 32 bits wide, writing 3 bytes of data at the host will make the driver wait, possibly forever, for the fourth byte before sending anything to the FPGA. This is an example of using a wide data interface in an unnatural way, leading to possibly unexpected behavior.
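
One way to avoid leaving a partial word behind is to let the host application send whole words only. The following C sketch assumes a stream whose word size in bytes (1, 2 or 4) is known to the application; write_whole_words() is a hypothetical helper written for this illustration and is not part of any Xillybus API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write only the largest whole number of words
 * found in buf, so that no partial word is left waiting in the driver.
 * word_bytes is 1, 2 or 4 for 8-, 16- or 32-bit streams.
 * Returns the number of bytes actually written, or -1 on error. */
ssize_t write_whole_words(int fd, const void *buf, size_t len,
                          size_t word_bytes)
{
    const char *p = buf;
    size_t todo, done = 0;

    if (word_bytes == 0)
        return -1;

    todo = len - (len % word_bytes); /* drop the trailing partial word */

    while (done < todo) {
        ssize_t rc = write(fd, p + done, todo - done);

        if (rc < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)rc; /* write() may accept fewer bytes than asked */
    }
    return (ssize_t)done;
}
```

The caller keeps the leftover bytes (len minus the return value) and sends them together with later data, once they complete a word.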

FPGA signal naming convention

Except for the two global signals, bus_clk and quiesce, all signals follow a simple convention. For example, a certain write enable signal may have the name user_w_write_32_wren. This name is broken into four components:

The "user" prefix marks all user interface signals.
The "w" flag indicates this signal belongs to a host-to-FPGA interface (host "write"). FPGA-to-host interfaces have an "r" instead. Address signals don't have this flag, since they apply to both directions.
The "write_32" string appears in the respective device file's name, /dev/xillybus_write_32.
The suffix signifies the signal's meaning.

In what follows, the device file name (component #3) is denoted {devfile} to avoid confusion.
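
The convention can be illustrated with a short C sketch that derives a signal name from a device file's path. The function signal_name() is hypothetical and exists only to demonstrate the four components; nothing like it is part of the Xillybus driver or core.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "user_{dir}_{devfile}_{suffix}" from a
 * device file path such as /dev/xillybus_write_32.
 * dir is 'w' (host to FPGA) or 'r' (FPGA to host).
 * Returns 0 on success, -1 if the path lacks the expected prefix or
 * the output buffer is too small. */
int signal_name(const char *dev_path, char dir, const char *suffix,
                char *out, size_t out_len)
{
    static const char prefix[] = "/dev/xillybus_";
    size_t plen = sizeof(prefix) - 1;
    int n;

    if (strncmp(dev_path, prefix, plen) != 0)
        return -1;

    /* dev_path + plen is the {devfile} component, e.g. "write_32" */
    n = snprintf(out, out_len, "user_%c_%s_%s", dir, dev_path + plen, suffix);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Calling signal_name() with "/dev/xillybus_write_32", 'w' and "wren" fills the buffer with user_w_write_32_wren.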

Signals for host to FPGA transmission

user_w_{devfile}_data -- This core output signal contains data during write cycles. As mentioned above, this signal's width can be 8, 16 or 32 bits, depending on the respective device's configuration.
user_w_{devfile}_wren -- This core output signal is a write enable signal to the FIFO: It's asserted ('1') in conjunction with valid data being present on the user_w_{devfile}_data signal, and tells the receiving party (e.g. FIFO) to sample this data.
user_w_{devfile}_full -- This core input signal informs the core that no more data can be accepted. When asserted properly, it temporarily assures no write cycles occur.
Important: The 'full' signal may transition from '0' to '1' only on the clock cycle following a write cycle. This is the way standard FIFOs behave, so this rule needs to be observed only if you want to interface the core with something else. The reason for this rule is that the Xillybus logic treats a non-asserted 'full' signal as a green light to start a data transaction with the host. Failing to observe this rule may cause sporadic writes overriding the 'full' condition.

A typical Verilog implementation of the 'full' signal should be something like this:
```
always @(posedge bus_clk)
  if (ready_to_get_more_data)
    user_w_mydevice_full <= 0; // Deassert any time
  else if (user_w_mydevice_wren && { ... some condition ... } )
    user_w_mydevice_full <= 1; // Assert only in conjunction with wren
```
The same in VHDL:
```
process (bus_clk)
begin
  if (bus_clk'event and bus_clk = '1') then
    if (ready_to_get_more_data = '1') then 
      user_w_mydevice_full <= '0'; -- Deassert any time
    elsif (user_w_mydevice_wren = '1' AND { ... some condition ... } ) then
      user_w_mydevice_full <= '1'; -- Assert only in conjunction with wren
    end if;
  end if;
end process;
```
user_w_{devfile}_open -- This core output signal is asserted ('1') when the respective device file in the host is open for write (a read-only open, when allowed, will not assert this signal). This signal can optionally be used to reset the FIFO or other logic between file opens. If a file is opened by multiple processes in the host (as a result of a fork() or when nonexclusive open is allowed), this signal remains asserted until all open instances are closed.

Signals for FPGA to host transmission

user_r_{devfile}_data -- This core input signal contains data during read cycles. As mentioned above, this signal's width can be 8, 16 or 32 bits, depending on the respective device's configuration. This signal must not change except as a result of a read enable signal, and then only on the clock following it (this is the normal behavior of a standard FIFO).
user_r_{devfile}_rden -- This core output signal is a read enable signal to the FIFO: When asserted, the core expects valid data to be present on user_r_{devfile}_data on the following clock.
user_r_{devfile}_empty -- This core input signal informs the core that no more data can be read. When asserted properly, it temporarily assures no read cycles occur.
Important: The 'empty' signal may transition from '0' to '1' only on the clock cycle following a read cycle. This is the way standard FIFOs behave, so this rule needs to be observed only if you want to interface the core with something else. The reason for this rule is that the Xillybus logic treats a non-asserted 'empty' signal as a green light to start a data transaction with the host. Failing to observe this rule may cause sporadic reads overriding the 'empty' condition.

A typical Verilog implementation of the 'empty' signal should be something like this:
```
always @(posedge bus_clk)
  if (ready_to_give_more_data)
    user_r_mydevice_empty <= 0; // Deassert any time
  else if (user_r_mydevice_rden && { ... some condition ... } )
    user_r_mydevice_empty <= 1; // Assert only in conjunction with rden
```
The same in VHDL:
```
process (bus_clk)
begin
  if (bus_clk'event and bus_clk = '1') then
    if (ready_to_give_more_data = '1') then 
      user_r_mydevice_empty <= '0'; -- Deassert any time
    elsif (user_r_mydevice_rden = '1' AND { ... some condition ... } ) then
      user_r_mydevice_empty <= '1'; -- Assert only in conjunction with rden
    end if;
  end if;
end process;
```
user_r_{devfile}_eof -- This core input signal tells the core to generate an end-of-file event. It's like an 'empty' signal, but once asserted, the core will not issue any more read cycles until the file is closed and reopened. On the host side, the application reading from the file descriptor will be informed that the file has reached EOF when all data has been consumed. Like the 'empty' signal, the 'eof' signal must be asserted only on a clock cycle following a read cycle. Once asserted, it can be deasserted on any following clock, or kept high -- the core latches the EOF request until the file is closed.
The 'empty' signal may or may not be asserted in conjunction with an 'eof' assertion.
There is no need to worry about synchronization between the data and the 'eof' signal: The 'eof' signal should be asserted in conjunction with the last piece of data, and the host driver will always deliver the EOF after this data has reached the user application.
user_r_{devfile}_open -- This core output signal is asserted ('1') when the respective device file in the host is open for read (a write-only open, when allowed, will not assert this signal). This signal can optionally be used to reset the FIFO or other logic between file opens. If a file is opened by multiple processes in the host (as a result of a fork() or when nonexclusive open is allowed), this signal remains asserted until all open instances are closed.
There is no direct connection between the 'eof' signal and the 'open' signal. The 'open' signal will deassert when the file is closed, not when the EOF is delivered.
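
On the host side, the EOF generated by the 'eof' signal appears in the usual POSIX way: read() returns zero once all data has been consumed. The following C sketch shows a read loop that relies on this. read_until_eof() is a hypothetical helper name, and the file descriptor is assumed to have been obtained with open() on the relevant device file.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF is reported or the
 * buffer is full, whichever comes first.
 * Returns the number of bytes collected, or -1 on error. */
ssize_t read_until_eof(int fd, void *buf, size_t cap)
{
    char *p = buf;
    size_t got = 0;

    while (got < cap) {
        ssize_t rc = read(fd, p + got, cap - got);

        if (rc == 0)
            break; /* EOF: all data before it has already been delivered */
        if (rc < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)rc; /* read() may return less than requested */
    }
    return (ssize_t)got;
}
```

Since the core issues no more read cycles after an EOF until the file is closed and reopened, the application should close() the file descriptor and open the device file again if it wants more data.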

Memory interface signals

A Xillybus interface can be configured to also have an address signal. The address is automatically incremented on read and write cycles, and can be set to an arbitrary value by the host, using the standard mechanism for seeking in files.

Along with some of the signals mentioned above, a standard RAM is easily interfaced with the core, making the RAM's memory array available to the host as a seekable file: Reads and writes to the file result in reads and writes to the memory array. The host may access single memory elements or segments, depending on the length of the read or write operations.

Also, by "faking" the memory array with registers, these registers become easily accessible to the host.
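
From the host's point of view, accessing such a memory or register array might look like the following C sketch, which uses only lseek(), read() and write(). The helper names mem_write() and mem_read() are hypothetical, and the sketch assumes that fd was opened for both reading and writing on a seekable device file. How file offsets map to addresses on interfaces wider than 8 bits is not covered by this sketch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek to addr, then write len bytes from data.
 * Returns 0 on success, -1 on failure or short write. */
int mem_write(int fd, off_t addr, const void *data, size_t len)
{
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, data, len) == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: seek to addr, then read len bytes into data.
 * Returns 0 on success, -1 on failure or short read. */
int mem_read(int fd, off_t addr, void *data, size_t len)
{
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, data, len) == (ssize_t)len ? 0 : -1;
}
```

Both helpers seek before every access, so they do not depend on the address left behind by a previous operation. Sequential accesses can skip the seek, since the address increments automatically.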

The 'empty' and 'full' signals can be used to slow down reads and writes to slow memories, or memories that require some setup before access.

These are the two signals involved:

user_{devfile}_addr -- This core output signal contains the current address. When either a read enable or a write enable is asserted, this is the address to be read from or written to. Connecting this signal directly to a RAM's address input will work as naturally expected. The width of this signal is configurable up to 32 bits. The address wraps to zero. Seek requests out of range will result in the address signal taking the value of the seek request's LSBs.
user_{devfile}_addr_update -- This core output signal is asserted as a result of a seek request from the host. The purpose of this signal is to give user logic a chance to indicate that it needs time to prepare data for reading, by asserting the respective 'empty' signal as a result of an address update. Despite what is said above, there is one exception to the rule that 'empty' must be asserted only a clock cycle after a read cycle: It can also be asserted one clock cycle after an update signal.
The following Verilog code is therefore correct:
```
always @(posedge bus_clk)
  if ( { ... memory is ready ... } )
    user_r_mydevice_empty <= 0;
  else if ((user_mydevice_addr_update) &&
           ( user_mydevice_addr > { ... some limit ...} ))
    user_r_mydevice_empty <= 1;
```
And the same in VHDL:
```
process (bus_clk)
begin
  if (bus_clk'event and bus_clk = '1') then
    if ( { ... memory is ready ... } ) then 
      user_r_mydevice_empty <= '0';
    elsif (user_mydevice_addr_update = '1'
           AND user_mydevice_addr > { ... some limit ...} ) then
      user_r_mydevice_empty <= '1';
    end if;
  end if;
end process;
```
This example also shows that the address is updated on the same clock cycle for which the update signal is asserted. Note that since 'empty' can be deasserted at any time, it makes sense, if this simplifies the design, to assert 'empty' as a result of every address update, regardless of the address, and then take the time to evaluate if 'empty' can be deasserted.
The 'full' signal can also be asserted in a similar manner, even though it's less clear why this should be useful.
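
The wrapping behavior described for the address signal can be modeled on the host side with a few lines of C, which may help when predicting which memory element an out-of-range seek will land on. This is only an arithmetic model of the statements above (LSBs of the seek request, wrap to zero); the function names and the addr_width parameter are assumptions made for this sketch.

```c
#include <stdint.h>

/* Model of an out-of-range seek: the address signal takes the
 * addr_width least significant bits of the requested position.
 * addr_width is the configured width of the address signal (up to 32). */
uint32_t wrapped_address(uint32_t seek_pos, unsigned addr_width)
{
    if (addr_width >= 32)
        return seek_pos; /* all 32 bits are kept */
    return seek_pos & ((UINT32_C(1) << addr_width) - 1);
}

/* Model of the automatic increment after a read or write cycle:
 * the address following the highest one is zero. */
uint32_t next_address(uint32_t addr, unsigned addr_width)
{
    return wrapped_address(addr + 1, addr_width);
}
```

For instance, with a 16-bit address signal, a seek request to 0x12345 is modeled as address 0x2345, and the address following 0xFFFF is zero.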

The quiesce signal

The quiesce signal is asserted ('1') when the host has not turned on the Xillybus interface (e.g. driver not loaded yet), or has turned it off. Its intention is to serve as a synchronous reset.

It's most likely not necessary, though: One of the side effects of being in quiescent mode is that all files are closed, so user logic could rely on the *_open signals alone as a reset signal. The 'quiesce' signal can be used as a more global form of reset.

Synchronization with host

One delicate issue which tends to arise in I/O is synchronization: If the user application software issues a write() request, has the data reached application logic by the time the function call returns? And also, if the user application reads from the device file, is it guaranteed that this data wasn't collected from user logic before the call took place? The short answer to both questions is yes, given that the host's C library doesn't cache any data.

This deserves some elaboration: There are two sets of functions available for accessing files: One is the set including functions such as open(), close(), read(), write() etc, and there's a second set including functions such as fopen(), fclose(), fread(), fwrite(), fprintf() and fscanf(). The latter group, with the "f" prefix, may cache data as part of the wrapper function's mechanism. So even though these functions are otherwise fine to use with Xillybus device files, they do not ensure synchronization as is. For example, returning from a fwrite() or fprintf() call says nothing about whether the data has indeed arrived at the user logic in the FPGA.

The recommended set of functions, when synchronization is an issue, is therefore the down-to-earth set without the "f" prefix: open(), close(), read(), write(), lseek() etc. These functions are also used in the demo applications.

When necessary, the "f" prefixed functions can be synchronized by using flushing. This is a C library issue, and is not covered here.

It's important in particular to ensure synchronized I/O when using memory-like interfaces. Since the address lines in hardware are set by a seek operation, predictable behavior requires that these seek operations take place between I/O in hardware. Xillybus' inherent synchronization mechanism assures that, but C library caching can mess it up.

Asynchronous I/O

The most significant drawback of full synchronization is bandwidth performance. Since no data can move unless there is an explicit user application read() or write() waiting, there are necessarily "holes" in time, during which no I/O can take place.

An advantage of asynchronous I/O is that the buffers allocated in the host's RAM effectively become an extension of the FIFO in the FPGA, since they begin to take data as soon as it arrives at the FIFO. So even though memory buffers in the host's RAM are suitable for extending the depth of the FPGA's FIFO, synchronous I/O can't allow this to happen.

Xillybus interfaces can be configured individually to work in asynchronous mode. This is not shown in the evaluation kit, since it may cause some confusion, but demanding applications should consider this option.