4 Implementing data acquisition

4.1 Introduction

The need to capture data from an FPGA to a computer often occurs, for example:

  • Frame grabbing from a source of a video signal.

  • Data from an analog to digital converter (ADC).

  • Receiving debug information from the FPGA.

The data rate can be high for applications like this. Nevertheless, the continuity of the data flow must be guaranteed: No loss of data is allowed.

A data acquisition application is easily implemented with Xillybus by writing the data to a FIFO. This section focuses on how to guarantee that the data that arrives at the host is contiguous.

In theory, it’s impossible to ensure a sustained data rate between a peripheral and a computer, since the operating system may deprive the application software of the CPU for as long as it wants.

There are nevertheless methods for maintaining a continuous stream of data. The first and obvious condition for achieving this goal is to use a Xillybus stream that is capable of transporting the data at the required rate. On top of that, certain host programming techniques should be used. This issue is discussed extensively in both programming guides.

In particular, pay attention to section 4 of these two guides, which discusses how to work with high data rates.

For high bandwidth applications, it’s also recommended to refer to section 5 of either of these two guides, which covers several topics to be aware of.

But even if the design is carried out perfectly, there is always a possibility that the continuity of the data stream is broken: By its nature, the operating system is allowed to deprive the application software of the CPU for a long period of time.

So the first goal is to make sure that the continuity of the data stream is practically never broken. The second goal is to ensure that if this happens despite all efforts, the event is noticed. Even more importantly, all data that arrives at the host must be guaranteed to be contiguous.

In order to accomplish this second goal, the application logic should stop the flow of the data at the point where the continuity is broken. An EOF is sent to the host after this point, in order to tell the host that something went wrong. This way, the application software can rely on the data that arrives being contiguous.

Ideally, this stopping mechanism should never become active. But when it does, it allows an awareness of the problem, as well as an opportunity to solve it.
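
On the host side, the EOF mentioned above looks like a regular end of file: A read from the device file returns zero bytes. The following Python sketch illustrates a read loop that relies on this. The function name read_until_eof, the consume callback and the chunk size are assumptions that were made for this illustration only. They are not part of Xillybus, and an in-memory stream stands in for the device file.

```python
import io

def read_until_eof(stream, consume, chunk_size=65536):
    """Pass data to consume() until the stream reports EOF.

    Returns the total number of bytes received. Everything that was
    passed to consume() before the EOF is contiguous, assuming that the
    FPGA logic stops the data flow as described in this section.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # b"" means end of file
            return total
        consume(chunk)
        total += len(chunk)

# Demonstration with an in-memory stream instead of a device file
collected = []
total = read_until_eof(io.BytesIO(b"\x01\x02\x03\x04\x05"),
                       collected.append, chunk_size=2)
```

With a real device, the stream would typically be obtained with something like open("/dev/xillybus_read_32", "rb", buffering=0). An EOF that arrives earlier than expected indicates that the continuity was broken.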

In what follows, it’s shown how Xillybus is used to capture data from a continuous source. The emphasis in this section is on ensuring that all data that arrives at the host is a reliable copy of the data source.

4.2 Example code

There are two ways to modify the design for the purpose of including the stopping mechanism:

  • Use a modified FIFO, which stops the data flow if the contiguity is broken. This modified FIFO is a wrapper for a standard FIFO. It also produces the EOF signal.

  • Make modifications to the logic that uses the FIFO.

The example code that is shown and explained below can be downloaded from this link:

https://xillybus.com/downloads/xillycapture.zip

The zip file consists of three files:

  • eof_fifo.v, written in Verilog

  • xillycapture.v, written in Verilog

  • xillycapture.vhd, written in VHDL

There are two ways to try out the example. Either way, you should edit xillydemo.v or xillydemo.vhd.

The first possibility is to use eof_fifo.v: Replace the instantiation of fifo_32x512 with an instantiation of eof_fifo. Then connect user_r_read_32_eof to this FIFO’s eof port.

In addition to that, it’s recommended to generate a source of data instead of the loopback. This can be a fake data source that is only intended for testing (for example, a counter).

The second possibility is to use xillycapture.v or xillycapture.vhd: Disconnect the signals that are related to read_32 in the demo bundle, and insert the example code in one of these files instead.

Note that in these two files, there’s a signal named “slowdown”. The purpose of this signal is to reduce the data rate of the fake data source. This signal should be removed when a real source of data is used.

In both possibilities, the example code performs an instantiation of a standard dual clock FIFO. The width of this FIFO is 32 bits. Before attempting to perform synthesis of the example code, generate this FIFO with the tools (e.g. Vivado or Quartus). The name of this FIFO should be async_fifo_32. A depth of 512 words is enough.

The rest of this section is based upon xillycapture.v and xillycapture.vhd. But the principles that are explained are relevant for understanding eof_fifo.v as well.

4.3 FIFO connections

Let’s assume that the data source is synchronous with capture_clk. Accordingly, the data is connected the regular way to a standard dual-clock FIFO. This FIFO connects between the data source and the Xillybus IP core.

In Verilog:

   async_fifo_32 fifo_32
     (
       .rst(!user_r_read_32_open),
       .wr_clk(capture_clk),
       .rd_clk(bus_clk),
       .din(capture_data),
       .wr_en(capture_en),
       .rd_en(user_r_read_32_rden),
       .dout(user_r_read_32_data),
       .full(capture_full),
       .empty(user_r_read_32_empty)
       );

And in VHDL:

  fifo_32 : async_fifo_32
    port map(
      rst        => reset_32,
      wr_clk     => capture_clk,
      rd_clk     => bus_clk,
      din        => capture_data,
      wr_en      => capture_en,
      rd_en      => user_r_read_32_rden,
      dout       => user_r_read_32_data,
      full       => capture_full,
      empty      => user_r_read_32_empty
      );

  reset_32 <= not user_r_read_32_open;

This is quite similar to the demo bundle: The FIFO is reset when the file is closed, and its user_r_read_32_* signals are connected as in the demo bundle.

4.4 Data acquisition control

The capture_en signal works as a write enable signal. There are three situations that prevent writing data to the FIFO:

  • When the file is closed

  • When the FIFO is full

  • When the FIFO has been full in the past, since the file was opened

So the condition for capture_en (in Verilog) boils down to:

assign capture_en = capture_open && !capture_full &&
                    !capture_has_been_full ;

And in VHDL:

  capture_en <= capture_open and not capture_full
                and not capture_has_been_full ;

The capture_open signal is a copy of user_r_read_32_open for the clock domain of capture_clk.

In a real-life application, there are often other conditions for writing to the FIFO. For example, waiting for the beginning of a video frame, or waiting for a specific error condition (when using data acquisition for debugging). Conditions of this kind can be added to this expression as required (by virtue of a logic AND).

The signal capture_has_been_full changes to high when the FIFO is full, and it returns to low only when the file is closed. So when the FIFO is full, the data acquisition stops and doesn’t start again as long as the file remains open.

IMPORTANT:
In the example code there is a different definition for capture_en, which helps to slow down the fake data source. For a real application, capture_en should be changed to the above.

Now to the code that implements capture_has_been_full in Verilog:

always @(posedge capture_clk)
  begin
    if (!capture_full)
      capture_has_been_nonfull <= 1;
    else if (!capture_open)
      capture_has_been_nonfull <= 0;

    if (capture_full && capture_has_been_nonfull)
      capture_has_been_full <= 1;
    else if (!capture_open)
      capture_has_been_full <= 0;
  end

And VHDL:

  process (capture_clk)
  begin
    if (capture_clk'event and capture_clk = '1') then
      if ( capture_full = '0' ) then
        capture_has_been_nonfull <= '1' ;
      elsif ( capture_open = '0' ) then
        capture_has_been_nonfull <= '0' ;
      end if;

      if (capture_full = '1' and capture_has_been_nonfull = '1') then
        capture_has_been_full <= '1' ;
      elsif ( capture_open = '0' ) then
        capture_has_been_full <= '0' ;
      end if;
    end if;
  end process;

When the FIFO’s capture_full goes high, capture_has_been_full goes high. When the file closes, capture_has_been_full goes low.

The other signal, capture_has_been_nonfull, solves a different issue: The FIFO’s “full” signal is high as long as the FIFO is reset. When the “full” signal is high because of this reason, capture_has_been_full should not be high. In other words, capture_has_been_full should be high only when capture_full has been low (meaning that the FIFO came out of reset) and then became high (meaning the FIFO was really full).

So this code is a bit complicated, but quite straightforward once the principle is understood.
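
To make the behavior of these two flags easier to follow, here is a small behavioral model in Python. It is only a sketch for experimenting on a computer: The class name CaptureFlags and its methods are made up for this illustration, and each call to clock() stands for one rising edge of capture_clk. The next values are computed from the current ones before they are stored, which mimics the nonblocking assignments of the Verilog code.

```python
class CaptureFlags:
    """Cycle-by-cycle model of capture_has_been_nonfull / capture_has_been_full."""

    def __init__(self):
        self.has_been_nonfull = False
        self.has_been_full = False

    def clock(self, capture_open, capture_full):
        # Next value of capture_has_been_nonfull
        nonfull_next = self.has_been_nonfull
        if not capture_full:
            nonfull_next = True
        elif not capture_open:
            nonfull_next = False

        # Next value of capture_has_been_full (uses the current nonfull flag)
        full_next = self.has_been_full
        if capture_full and self.has_been_nonfull:
            full_next = True
        elif not capture_open:
            full_next = False

        self.has_been_nonfull = nonfull_next
        self.has_been_full = full_next

    def capture_en(self, capture_open, capture_full):
        # Same expression as the assignment to capture_en in section 4.4
        return capture_open and not capture_full and not self.has_been_full

flags = CaptureFlags()
trace = []
for capture_open, capture_full in [
    (False, True),   # file closed, FIFO held in reset ("full" is high)
    (True, True),    # file just opened, FIFO not out of reset yet
    (True, False),   # FIFO is operational
    (True, True),    # FIFO is really full: the continuity is broken
    (True, False),   # FIFO has drained, but acquisition remains stopped
    (False, True),   # file closed
    (False, True),   # one more clock cycle: the flag is cleared
]:
    flags.clock(capture_open, capture_full)
    trace.append(flags.has_been_full)
```

In this model, the flag is not set while “full” is high merely because of the reset, and it remains set after a real overflow until the file is closed.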

4.5 Generating EOF

An end-of-file is generated when the two following conditions are met:

  • All data in the FIFO has been consumed (i.e. all data has been read by the IP core).

  • No more data will be written to the FIFO, because the FIFO has been full in the past.

In Verilog, this is written as:

assign user_r_read_32_eof = user_r_read_32_empty && has_been_full;

And in VHDL (note that this is a combinatorial function):

user_r_read_32_eof <= user_r_read_32_empty and has_been_full;

As can be seen in the example code, has_been_full copies the value of capture_has_been_full by virtue of a clock domain crossing to bus_clk.

Note that user_r_read_32_eof goes from low to high as allowed by the API. This is because there is a logical AND with user_r_read_32_empty, as suggested in section 3.3.

4.6 A test run

IMPORTANT:
This test run deliberately shows a bad example of an unsuitable configuration of the IP core. The purpose of this deliberate mistake is to demonstrate how the EOF comes into play. The IP core that was used for this test had small buffers with a synchronous stream. These are incorrect choices for a data acquisition application. A properly configured IP core will not perform as poorly as shown below.

In order to ensure repeatability of the transmitted data, the data source is chosen as a simple counter, which counts the number of sent words. The amount of data until EOF is random: The EOF happened when the computer became busy doing something else, and momentarily neglected the task of reading from the device file.

The test run is shown for Linux, but it can be run on Windows as well. More about running command line utilities can be found in the guides on getting started with the host.

This is what a test run can look like:

$ cat /dev/xillybus_read_32 > first
$ cat /dev/xillybus_read_32 > second
$ ls -l
total 77740
-rw-rw-r--. 1 liveuser liveuser 71727100 Jul 13 15:31 first
-rw-rw-r--. 1 liveuser liveuser 7874556 Jul 13 15:31 second

So about 72 MB were collected on the first attempt, but only about 8 MB on the second attempt. The amount of data in each run depends on how much data was received before the operating system neglected the reading process in order to do something else. Most likely, the read process was stopped briefly in order to write to the disk.

But even when all data is discarded by sending it to /dev/null, the data flow eventually stops (try “man dd” for more about the dd utility):

$ dd if=/dev/xillybus_read_32 of=/dev/null bs=1M
0+34365 records in
0+34365 records out
140756988 bytes (141 MB) copied, 18.0364 s, 7.8 MB/s
$ dd if=/dev/xillybus_read_32 of=/dev/null bs=1M
0+6027 records in
0+6027 records out
24684540 bytes (25 MB) copied, 3.16028 s, 7.8 MB/s

In both of these tests, moving the computer’s mouse stopped the data flow. This was enough to distract the operating system.

Once again it’s important to emphasize: These are really bad results, because a synchronous stream is used. With an asynchronous stream and the correct amount of DMA buffers, problems of this sort are not expected at all.

And finally, let’s look at what’s in one of the files:

$ hexdump -C -v first | head
00000000 f8 fb a2 01 f9 fb a2 01 fa fb a2 01 fb fb a2 01 |................|
00000010 fc fb a2 01 fd fb a2 01 fe fb a2 01 ff fb a2 01 |................|
00000020 00 fc a2 01 01 fc a2 01 02 fc a2 01 03 fc a2 01 |................|
00000030 04 fc a2 01 05 fc a2 01 06 fc a2 01 07 fc a2 01 |................|
00000040 08 fc a2 01 09 fc a2 01 0a fc a2 01 0b fc a2 01 |................|
00000050 0c fc a2 01 0d fc a2 01 0e fc a2 01 0f fc a2 01 |................|
00000060 10 fc a2 01 11 fc a2 01 12 fc a2 01 13 fc a2 01 |................|
00000070 14 fc a2 01 15 fc a2 01 16 fc a2 01 17 fc a2 01 |................|
00000080 18 fc a2 01 19 fc a2 01 1a fc a2 01 1b fc a2 01 |................|
00000090 1c fc a2 01 1d fc a2 01 1e fc a2 01 1f fc a2 01 |................|

As expected, the data contains a counting up sequence. The counter which is used for generating data is never reset, which is why the sequence doesn’t start at 0.
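
Checking the whole file by eye is not practical, so a short script can verify the sequence instead. The following Python sketch assumes that the data source is a 32-bit counter that increments by one for each word, and that the words are stored in little-endian byte order, as the hexdump above suggests. The function name first_discontinuity is an arbitrary choice for this illustration.

```python
import struct

def first_discontinuity(data):
    """Return the index of the first 32-bit word that isn't the previous
    word plus one (modulo 2**32), or None if the sequence is contiguous.
    Trailing bytes that don't form a complete word are ignored."""
    usable = len(data) - (len(data) % 4)
    previous = None
    for index, (word,) in enumerate(struct.iter_unpack("<I", data[:usable])):
        if previous is not None and word != (previous + 1) & 0xFFFFFFFF:
            return index
        previous = word
    return None

# The first 16 bytes of the hexdump above
sample = bytes.fromhex("f8fba201" "f9fba201" "fafba201" "fbfba201")
result = first_discontinuity(sample)
```

For a real check, the data would be read from the captured file, e.g. with open("first", "rb").read(). A return value of None means that no gap was found.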

4.7 Monitoring the amount of buffered data

It’s often desired to keep track of how much data is held in Xillybus’ buffers that belong to a specific stream. This can help in controlling latency, in preventing overflow or underflow, or in preventing the application software from sleeping during function calls to read() or write().

For example, regarding the data flow from the FPGA to the host: There may be an amount of data that is stored in the buffers, because the IP core has read this data from the FIFO in the FPGA, but the application software has not consumed this data yet. It’s often desirable to know how much data is waiting like this.

Likewise, in the opposite direction: There may be data that the application software has written to the stream, but it has not reached the FIFO in the FPGA yet. The direct reason is that the FIFO in the FPGA is full, so no more data can be accepted from the IP core. However, the real explanation is that the data is waiting to be consumed by the application logic.

Xillybus doesn’t provide a dedicated feature for estimating the amount of data in the buffers. However, there’s a simple way to implement this functionality by using Xillybus’ existing features, as shown next.

To explain the suggested solution, let’s say that one of the streams in the demo bundle (FPGA to host, 32 bits) is used for data acquisition.

The following counter is used to count the number of data words that were fetched from the FIFO (by the IP core) since the file was opened:

reg [31:0] count_data;

always @(posedge bus_clk)
  if (!user_r_read_32_open)
    count_data <= 0;
  else if (user_r_read_32_rden)
    count_data <= count_data + 1;

count_data can be a register in an array of registers, as suggested in section 3.4.

An alternative solution is to add another Xillybus stream (from the FPGA to the host) to the IP core. This stream is used to send the value of count_data to the host by connecting count_data directly to the data port of this additional stream (i.e. the port that is usually connected to a FIFO’s data output).

The “eof” port and “empty” port of this stream should be held constantly low. This stream should be configured as a synchronous stream, by setting the “use” parameter in the IP Core Factory to “Command and status”. As a result, the application software can read 4 bytes from this stream at any time, in order to get the updated value of count_data.

Note that count_data is synchronous with bus_clk, and can therefore be connected directly to the data port of the Xillybus IP core.
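
On the host, reading the counter through such an additional stream could look like the following Python sketch. It assumes that the 32-bit value arrives in little-endian byte order (as is the case in the hexdump of section 4.6). The function name read_count_data is invented for this example, and an in-memory stream is used in place of the device file.

```python
import io
import struct

def read_count_data(stream):
    """Read exactly 4 bytes from the stream and return them as an
    unsigned 32-bit integer (little-endian byte order is assumed)."""
    raw = b""
    while len(raw) < 4:
        part = stream.read(4 - len(raw))
        if not part:
            raise EOFError("the status stream ended unexpectedly")
        raw += part
    return struct.unpack("<I", raw)[0]

# Demonstration with an in-memory stream instead of a device file
count = read_count_data(io.BytesIO(bytes.fromhex("f8fba201")))
```

With a real device, the stream would be the additional stream’s device file, opened for reading without buffering. The name of this device file depends on how the stream is named in the IP Core Factory.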

The amount of data in the buffers can be calculated as the difference between count_data and the amount of data that the application software has read from its device file since it was opened (i.e. /dev/xillybus_read_32 in this example). Note that count_data counts 32-bit words, so it must be multiplied by 4 in order to compare it with a number of bytes. The software must keep track of the amount of data that it reads from this stream, of course.
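
As an illustration of this calculation, here is a Python sketch. It assumes that count_data counts 32-bit words, and that the counter may wrap around after 2**32 words during a long session, which is why the subtraction is carried out modulo 2**32 words. The function name is hypothetical. The result is an estimate, because count_data and the software’s own count are not sampled at exactly the same moment.

```python
WORD_BYTES = 4                      # Each count in count_data is a 32-bit word
WRAP_BYTES = WORD_BYTES << 32       # Span of the 32-bit counter, in bytes

def buffered_bytes_fpga_to_host(count_data, bytes_read):
    """Estimate the number of bytes that the IP core has fetched from the
    FIFO, but that the application software hasn't read yet.

    count_data: the counter's value, as obtained from the FPGA
    bytes_read: total bytes read from the device file since it was opened
    """
    return (count_data * WORD_BYTES - bytes_read) % WRAP_BYTES

# 1000 words were fetched by the IP core, 3000 bytes were read so far
pending = buffered_bytes_fpga_to_host(1000, 3000)
```

The result is correct as long as the real amount of buffered data is smaller than the span of the counter, which is always the case with buffers of a realistic size.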

In the opposite direction (from host to FPGA), a similar counter can be maintained in the FPGA with:

reg [31:0] count_data;

always @(posedge bus_clk)
  if (!user_w_write_32_open)
    count_data <= 0;
  else if (user_w_write_32_wren)
    count_data <= count_data + 1;

This works by the same principle: The application software keeps track of how much data it writes to the relevant device file. The application software reads count_data when there is a need to know how much data is stored in the buffers. This amount of data is calculated as the difference between how much data has been written (to the device file since it was opened) and the value of count_data, with both expressed in the same units (count_data counts 32-bit words).
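
For the direction from the host to the FPGA, the calculation could be sketched in Python as follows. As before, this is only an illustration under assumptions: count_data is taken to count 32-bit words and to wrap around after 2**32 words, and the function name is made up.

```python
WORD_BYTES = 4                      # Each count in count_data is a 32-bit word
WRAP_BYTES = WORD_BYTES << 32       # Span of the 32-bit counter, in bytes

def buffered_bytes_host_to_fpga(bytes_written, count_data):
    """Estimate the number of bytes that the application software has
    written, but that haven't been written to the FIFO in the FPGA yet.

    bytes_written: total bytes written to the device file since it was opened
    count_data:    the counter's value, as obtained from the FPGA
    """
    return (bytes_written - count_data * WORD_BYTES) % WRAP_BYTES

# 8192 bytes were written, 1024 words have reached the FIFO
in_flight = buffered_bytes_host_to_fpga(8192, 1024)
```

Since count_data is read some time before or after the software’s own count is updated, the result should be treated as an approximation.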

Note that in the discussion so far, the data in the FIFOs wasn’t included in the calculation: Only the data that Xillybus keeps in its buffers was taken into account. Sometimes it’s desired to get an end-to-end number, including the data that is stored in the FIFOs. For this purpose, the operations on the opposite side of the FIFOs should be counted. In other words, this is the number of elements that are written to the FIFO for a stream from the FPGA to the host. In the opposite direction, this is the number of elements that are read from the FIFO.

However, if the other side of the FIFO is synchronous with a different clock (e.g. capture_clk as presented previously), this might be harder to implement. That is because count_data needs to be synchronous with this other clock as well. As a result, a clock domain crossing is necessary to connect count_data’s value to the IP core. Hence there is a tradeoff between accuracy and simplicity when two different clocks are connected to the FIFO.