2 Usage
2.1 Sample design
A sample design, consisting of a connected IP core, a preinstalled Linux driver and a couple of simple demo user space programs, is included in Xillinux (versions 1.1 and later).
On the logic side, the xillydemo.v(hd) module source file contains an implementation of a 32x32 bit RAM, which is inferred from an array. This RAM is accessed in the host’s sample program. Its compilation and execution can be done directly on Xillinux as follows:
# make
gcc -g -Wall -I. -O3 -c -o uiotest.o uiotest.c
gcc -g -Wall -I. -O3 uiotest.o -o uiotest
gcc -g -Wall -I. -O3 -c -o intdemo.o intdemo.c
gcc -g -Wall -I. -O3 intdemo.o -o intdemo
# ./uiotest /dev/uio0 4096
0123
The C sources can be found in Xillinux’ file systems at /usr/src/xillinux/xillybus-lite/ (version 1.1 and up).
The “uiotest” program merely writes four values to the first four 32-bit elements in the register array, and then reads back and prints their values, but it’s easily changed into something more useful.
The “intdemo” program shows how interrupts are handled. Since the sample logic doesn’t trigger any interrupts, there’s no point running it as is. Nevertheless, it shows how interrupts are waited for.
2.2 Interface with Host application
Xillybus Lite is based upon Linux’ User I/O interface (UIO), which represents a peripheral as a device file which is primarily accessed by its memory mapping. To obtain access, the following code applies:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int fd;
void *map_addr;
int size = ...;
fd = open("/dev/uio0", O_RDWR);
if (fd < 0) {
perror("Failed to open devfile");
exit(1);
}
map_addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
fd, 0);
if (map_addr == MAP_FAILED) {
perror("Failed to mmap");
exit(1);
}
Except for error checking, this code snippet performs two operations:
- Calling the function open() to open the device file (obtaining a file handle).
- Calling the function mmap() to obtain an address for accessing the device. The second parameter (“size”) is the number of bytes that are mapped. It must not exceed the number of bytes allocated for the peripheral, according to the device tree (4096 on unmodified Xillinux).
map_addr is an address in the virtual memory space of the process, but for all purposes it can be treated as if it were the physical address to which the peripheral is mapped in a bare-metal environment (i.e. with no operating system).
The allowed access range goes from map_addr to map_addr + size - 1, where “size” is the second argument given to mmap(). Attempting to access memory beyond this range may cause a segmentation fault.
With the address at hand, writing and reading a 32-bit word to the register at the peripheral’s base address (offset zero) is just:
volatile unsigned int *pointer = map_addr;
*pointer = the_value_to_write;
the_value_read_from_register = *pointer;
On the specific memory region, memory caching is disabled by the Linux driver, and the pointer is flagged volatile. Hence each read and write operation in the program triggers a bus operation, and consequently an access cycle on the Xillybus Lite’s logic interface signals.
IMPORTANT:
The pointer must be flagged as volatile with the “volatile” keyword, as shown in the example above. The lack of this flag will allow the C compiler to reorder and possibly optimize out I/O operations.
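The access rules above (volatile pointer, accesses confined to the mapped range) can be collected into a pair of small helpers. This is a minimal sketch, not part of the Xillinux sources: the names reg_window, reg_write32 and reg_read32 are hypothetical, and the struct is assumed to be filled with the map_addr and size values that were used with mmap().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around the mapping returned by mmap(). */
struct reg_window {
    volatile uint32_t *base; /* map_addr, as a volatile 32-bit pointer */
    size_t size;             /* number of bytes given to mmap() */
};

/* Returns nonzero if a 32-bit access at this byte offset is allowed. */
static int reg_offset_ok(const struct reg_window *w, size_t offset)
{
    return (offset & 3) == 0 && w->size >= 4 && offset <= w->size - 4;
}

/* Write one 32-bit word. Returns 0 on success, -1 on a bad offset. */
static int reg_write32(const struct reg_window *w, size_t offset,
                       uint32_t value)
{
    if (!reg_offset_ok(w, offset))
        return -1;
    w->base[offset / 4] = value; /* volatile: one bus write cycle */
    return 0;
}

/* Read one 32-bit word. Returns 0 on success, -1 on a bad offset. */
static int reg_read32(const struct reg_window *w, size_t offset,
                      uint32_t *value)
{
    if (!reg_offset_ok(w, offset))
        return -1;
    *value = w->base[offset / 4]; /* volatile: one bus read cycle */
    return 0;
}
```

Rejecting out-of-range offsets in the helper turns a possible segmentation fault into an error code that the caller can handle.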
It is also fine to access the peripheral with an 8-bit volatile char pointer or a 16-bit volatile short int pointer, provided that the logic supports byte granularity access.
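A byte-wide access might look as follows. This is a sketch under the assumption that the logic handles each byte separately; the helper names reg_write8 and reg_read8 are hypothetical. On a little-endian processor, byte offset 1 within a word corresponds to bits [15:8] of that word.

```c
#include <stdint.h>

/* Write a single byte at the given byte offset in the mapped region. */
static void reg_write8(volatile void *map_addr, unsigned int offset,
                       uint8_t value)
{
    volatile uint8_t *p = (volatile uint8_t *)map_addr;

    p[offset] = value;
}

/* Read a single byte from the given byte offset in the mapped region. */
static uint8_t reg_read8(volatile void *map_addr, unsigned int offset)
{
    const volatile uint8_t *p = (const volatile uint8_t *)map_addr;

    return p[offset];
}
```

For instance, reg_write8(map_addr, 0x15, 0xab) would modify only the second byte of the word at offset 0x14, provided that the logic supports byte granularity access.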
In the example above, it’s assumed that only one Xillybus Lite peripheral is present. The first instance, “/dev/uio0” is therefore opened. If additional UIO devices are present (e.g. there’s more than one Xillybus Lite instance), they are represented as /dev/uio1, /dev/uio2, etc.
In order to know which device file belongs to which logic element, the application should obtain the information in /sys/class/uio/ (e.g. /sys/class/uio/uio0/name or /sys/class/uio/uio0/maps/map0/addr). The udev framework is recommended for consistent naming of the device files when several UIO devices are created.
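As an illustration of the lookup in /sys/class/uio/, the following sketch scans the "name" attribute of uio0, uio1 and so on, and returns the index of the first match. The function names are hypothetical, and so is any name string passed to them: the actual string reported by the driver should be checked on the target system.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs attribute into buf, without the
   trailing newline. Returns 0 on success, -1 on failure. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Look for the UIO device whose "name" attribute equals wanted.
   Returns N (as in /dev/uioN), or -1 if none of uio0 ... uio(max-1)
   matches. sysfs_root is normally "/sys/class/uio". */
static int find_uio_by_name(const char *sysfs_root, const char *wanted,
                            int max_devices)
{
    char path[256], name[128];
    int i;

    for (i = 0; i < max_devices; i++) {
        snprintf(path, sizeof(path), "%s/uio%d/name", sysfs_root, i);
        if (read_sysfs_line(path, name, sizeof(name)) == 0 &&
            strcmp(name, wanted) == 0)
            return i;
    }
    return -1;
}
```

The same read_sysfs_line() helper can fetch /sys/class/uio/uio0/maps/map0/addr to tell apart several instances that report the same name.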
2.3 Interface with logic design
2.3.1 Register related signals
The Xillybus Lite IP core presents seven signals to the application logic, given here in Verilog format:
output user_clk;
output [31:0] user_addr;
output user_wren;
output [3:0] user_wstrb;
output [31:0] user_wr_data;
output user_rden;
input [31:0] user_rd_data;
The interface is synchronous, and based upon user_clk, which is provided by Xillybus Lite (it’s wired to the AXI Lite clock of the processor).
The signal names above are those appearing in the Xillydemo module (part of the Xillinux bundle). The signals’ names in the processor’s module are slightly different, e.g. user_wren may appear as xillybus_lite_0_user_wren_pin.
These signals can be connected directly to a standard block RAM, in which case the host gets direct access to that RAM (which can be used as a “mailbox” if a dual-port RAM is chosen). They can also be connected to registers defined in logic, as detailed below.
2.3.2 Module hierarchy
When an AMD logic design involves an embedded processor, there is a module representing it, typically instantiated in the top level module. Usually, the ports exposed by this module are all connected directly to physical pins, following the paradigm that the processor is the center of things, and that any logic around it is some kind of peripheral.
Xillybus Lite is intended for interfacing with a substantial piece of application logic, and therefore breaks this common structure somewhat: Its user_* signals are intended for routing to the top level module, so custom logic is instantiated in that top level module as well. The overall project’s structure ends up in two large chunks: An instantiated module that contains a processor and its IP cores (including the Xillybus Lite IP core) and a second module with the application logic. The user_* signals connect between the two.
So even though the Xillybus Lite IP core itself is instantiated by AMD’s tools somewhere deep inside the processor’s hierarchy, it is interfaced with from the top level module.
This is the chosen layout in the demo bundle for Xillinux (shown in the drawing below) and also what this guide assumes. It’s possible to connect Xillybus Lite’s signals internally within the processor’s hierarchy, but it’s not necessarily going to make things simpler.
2.3.3 32-bit aligned register access
To access a 32x32 bit array in the logic (“litearray” below), code like the following can be used. This works fine only if the host sticks to 32-bit word access (using pointers to e.g. unsigned int only):
In Verilog:
always @(posedge user_clk)
begin
if (user_wren)
litearray[user_addr[6:2]] <= user_wr_data;
if (user_rden)
user_rd_data <= litearray[user_addr[6:2]];
end
Or in VHDL:
lite_addr <= conv_integer(user_addr(6 DOWNTO 2));
process (user_clk)
begin
if (user_clk'event and user_clk = '1') then
if (user_wren = '1') then
litearray(lite_addr) <= user_wr_data;
end if;
if (user_rden = '1') then
user_rd_data <= litearray(lite_addr);
end if;
end if;
end process;
The waveforms for an aligned write cycle and any read cycle are:
Notes:
- Any bus operation on the address region allocated in XPS to the Xillybus Lite peripheral always results in user_wren or user_rden being high for exactly one clock cycle.
- user_rd_data is sensed by the Xillybus Lite core only one clock cycle after user_rden is high. There is hence no practical need to monitor user_rden: It’s also fine to always update user_rd_data depending on user_addr (with one clock’s latency), for instance,
always @(posedge user_clk)
  user_rd_data <= litearray[user_addr[6:2]];
- The code above demonstrates access of a 32-bit wide array of 32 elements. A more common setting is accessing registers, e.g. in Verilog
always @(posedge user_clk)
  if ((user_wren) && (user_addr[6:2] == 5))
    myregister <= user_wr_data;
for mapping “myregister” at address offset 0x14.
- Likewise, a case statement that depends on user_addr is the common implementation of user_rd_data’s value assignment, such as
always @(posedge user_clk)
  case (user_addr[6:2])
    5: user_rd_data <= myregister;
    6: user_rd_data <= hisregister;
    7: user_rd_data <= herregister;
    default: user_rd_data <= 0;
  endcase
- user_addr is 32 bits wide, and holds the full physical address being accessed. Since the enable signals are high only when the address is within the allocated range, there is no need to verify the address’ MSBs.
- Always ignore user_addr[1:0]. These two LSBs are always zero on 32-bit aligned bus accesses, and as explained below, they should be ignored even for unaligned access.
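The relation between the host's byte offsets and the register index selected by user_addr[6:2] can be written out as a small C model. This is only a sketch for reasoning about the address decoding on the host side; reg_index and reg_offset are hypothetical names.

```c
#include <stdint.h>

/* Register index as selected by user_addr[6:2] in the logic: the two
   LSBs are dropped, and the bits above bit 6 are ignored. */
static unsigned int reg_index(uint32_t user_addr)
{
    return (user_addr >> 2) & 0x1f;
}

/* Byte offset (relative to map_addr) at which the host finds the
   register with the given index. */
static uint32_t reg_offset(unsigned int index)
{
    return (uint32_t)index * 4;
}
```

With this model, register number 5 is at offset 0x14, and a full physical address such as 0x50000014 (an arbitrary base address, chosen for illustration only) decodes to the same index 5, since the MSBs play no role.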
2.3.4 Unaligned register access
When there’s a possibility that the host will access the register space in a 32-bit unaligned manner, each byte needs to be handled separately in the logic.
Note that a byte access and a 32-bit word access take the same time on the bus, so transferring data one byte at a time reaches only a quarter of the bandwidth of 32-bit word access.
Suppose that litearray3, litearray2, litearray1 and litearray0 are memory arrays of 32 elements with 8 bits each. The following code snippets demonstrate how the examples in 2.3.3 are rewritten to support unaligned access. In Verilog:
always @(posedge user_clk)
begin
if (user_wstrb[0])
litearray0[user_addr[6:2]] <= user_wr_data[7:0];
if (user_wstrb[1])
litearray1[user_addr[6:2]] <= user_wr_data[15:8];
if (user_wstrb[2])
litearray2[user_addr[6:2]] <= user_wr_data[23:16];
if (user_wstrb[3])
litearray3[user_addr[6:2]] <= user_wr_data[31:24];
if (user_rden)
user_rd_data <= { litearray3[user_addr[6:2]],
litearray2[user_addr[6:2]],
litearray1[user_addr[6:2]],
litearray0[user_addr[6:2]] };
end
Or in VHDL:
lite_addr <= conv_integer(user_addr(6 DOWNTO 2));
process (user_clk)
begin
if (user_clk'event and user_clk = '1') then
if (user_wstrb(0) = '1') then
litearray0(lite_addr) <= user_wr_data(7 DOWNTO 0);
end if;
if (user_wstrb(1) = '1') then
litearray1(lite_addr) <= user_wr_data(15 DOWNTO 8);
end if;
if (user_wstrb(2) = '1') then
litearray2(lite_addr) <= user_wr_data(23 DOWNTO 16);
end if;
if (user_wstrb(3) = '1') then
litearray3(lite_addr) <= user_wr_data(31 DOWNTO 24);
end if;
if (user_rden = '1') then
user_rd_data <= litearray3(lite_addr) & litearray2(lite_addr) &
litearray1(lite_addr) & litearray0(lite_addr);
end if;
end if;
end process;
The waveform for an unaligned write cycle on a single byte with 0x01 offset from the base address follows.
Notes:
- A write bus operation on the allocated address region always results in user_wren and at least one of user_wstrb’s bits being high simultaneously for one clock cycle. As shown above, if the value assignment depends on user_wstrb, there is no need to check user_wren.
- Unaligned read accesses are handled the same by the logic as aligned ones. For example, when the program running on the processor reads a byte, the whole 32-bit word is read on the bus, and the processor picks the required portion from the word.
- user_addr[1:0] may be non-zero when the address required by the processor is unaligned. This has no significance, since the logic’s correct behavior on write cycles depends on user_wstrb only. These two bits are therefore best ignored even for unaligned access.
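The effect of user_wstrb on a 32-bit register can be modeled in C as a byte-wise merge of the old value and the written data. This is a behavioral sketch of the logic shown above, not host code for the actual peripheral; the function names are hypothetical.

```c
#include <stdint.h>

/* Expand the four strobe bits into a 32-bit mask, one byte per bit:
   bit 0 covers data bits [7:0], bit 3 covers data bits [31:24]. */
static uint32_t wstrb_to_mask(unsigned int wstrb)
{
    uint32_t mask = 0;
    int lane;

    for (lane = 0; lane < 4; lane++)
        if (wstrb & (1u << lane))
            mask |= (uint32_t)0xff << (8 * lane);
    return mask;
}

/* Value held by a register after a write cycle: bytes with their
   strobe high are taken from wr_data, the others are left as is. */
static uint32_t apply_write(uint32_t old_value, uint32_t wr_data,
                            unsigned int wstrb)
{
    uint32_t mask = wstrb_to_mask(wstrb);

    return (old_value & ~mask) | (wr_data & mask);
}
```

For a single-byte write at offset 0x01, only bit 1 of the strobes is expected to be high, so apply_write(0x11223344, 0x0000ab00, 0x2) yields 0x1122ab44: only bits [15:8] change.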
2.4 Interrupts
The Xillybus Lite IP core exposes an input signal, user_irq, which allows the application logic to send hardware interrupts to the processor. It is treated as a synchronous positive edge-triggered interrupt request signal, i.e. an interrupt is generated when this signal changes from low to high from one clock cycle to the next.
This signal is held zero in the xillydemo.v(hd) module.
Xillybus Lite adopts UIO’s method of handling interrupts: The user space program sleeps as it attempts to read data from the device file. When the interrupt arrives, four bytes of data are read, waking up the process. These four bytes should be treated as an unsigned int, having the value of the total number of interrupts that have been triggered since the driver was loaded. The program may ignore this value, or use it to check if interrupts have been missed, by verifying that the value is one plus the value previously read.
Note that in a normal system’s operation, this interrupt counter is never zeroed.
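The check for missed interrupts described above amounts to a subtraction. A minimal sketch follows (missed_interrupts is a hypothetical name), assuming that unsigned int is 32 bits wide, like the counter:

```c
/* Number of interrupts that went unnoticed between two consecutive
   reads of the counter. The arithmetic is unsigned, so the result
   remains correct if the counter wraps around to zero. */
static unsigned int missed_interrupts(unsigned int prev_count,
                                      unsigned int new_count)
{
    return new_count - prev_count - 1;
}
```

A return value of zero means that the newly read count is exactly one plus the previous one, i.e. nothing was missed.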
For example, assuming that “fd” is the file handle to /dev/uio0:
unsigned int interrupt_count;
int rc;
while (1) {
rc = read(fd, &interrupt_count, sizeof(interrupt_count));
if ((rc < 0) && (errno == EINTR))
continue;
if (rc < 0) {
perror("read");
exit(1);
}
printf("Received interrupt, count is %u\n", interrupt_count);
}
Note that the read() function call must request exactly 4 bytes. With any other length argument, read() returns an error. The file descriptor may also be used in select() function calls.
Also note that the part checking for EINTR handles software interrupts properly (e.g. the process being stopped and restarted) and has nothing to do with the hardware interrupt.
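Since the file descriptor can be used with select(), waiting for an interrupt with a timeout is possible. The following is a sketch only (wait_for_interrupt is a hypothetical name), and it relies on standard POSIX calls, not on anything specific to Xillybus Lite. Note that if select() is interrupted by a signal, this version simply restarts it with the full timeout.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for an interrupt on the UIO file
   descriptor. Returns 1 and stores the interrupt counter in *count if
   an interrupt arrived, 0 on timeout and -1 on error. */
static int wait_for_interrupt(int fd, int timeout_ms, unsigned int *count)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while ((rc < 0) && (errno == EINTR));

    if (rc <= 0)
        return rc; /* 0 means timeout, -1 means error */

    do {
        rc = read(fd, count, sizeof(*count)); /* always 4 bytes */
    } while ((rc < 0) && (errno == EINTR));

    return (rc == (int)sizeof(*count)) ? 1 : -1;
}
```

A main loop could call wait_for_interrupt(fd, 1000, &interrupt_count) and attend to other tasks whenever zero is returned.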
