5 Making modifications
5.1 Integration with custom logic
The Xillinux distribution is set up for easy integration with application logic. The front end for connecting data sources and data consumers is the xillydemo.v or xillydemo.vhd file (depending on the preferred language). All other HDL files in the boot partition kit can be ignored for the purpose of using the Xillybus IP core as a transport of data between the Linux host and the logic fabric.
Additional HDL files with custom logic may be added to the project presented in paragraph 3.3, which is then rebuilt the same way as it was built initially. To boot the system with the updated logic, copy the new xillydemo.bit into the (Micro)SD card’s boot partition, overwriting the existing one. Note that the Zynq board itself can be used for copying xillydemo.bit into the boot partition, as shown in paragraph 3.5.
There is no need to repeat the other steps of the initial distribution deployment, so the development cycle for logic is fairly quick and simple.
Programming the PL part through JTAG is not supported.
When attaching the Xillybus IP core to custom application logic, it is strongly recommended to interact with the Xillybus IP core only through FIFOs, and not to attempt to mimic a FIFO’s behavior with logic, at least not in the first stage.
An exception to this is when connecting Xillybus with a block RAM or with registers, in which case the method shown in the xillydemo module should be followed.
In the xillydemo module, FIFOs are used to perform a loopback of data arriving from the host and back to it. Both of the FIFOs’ sides are connected to the Xillybus IP core, which makes the core function as its own data source and data consumer.
In a more useful scenario, only one of the FIFO’s ends is connected to the Xillybus IP core, and the other end to an application data source or data consumer.
The FIFOs used in the xillydemo module accept only one common clock for both sides, as both sides are driven by Xillybus’ main clock. In a real-life application, it may be desirable to replace them with FIFOs having separate clocks for reading and writing, allowing data sources and data consumers to be driven by a clock other than the bus clock. By doing this, the FIFOs serve not just as mediators, but also for proper clock domain crossing.
Note that the Xillybus IP core expects a plain FIFO (as opposed to First Word Fall Through), for streams from the FPGA to host.
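The loopback described above can be exercised from the Linux host with a few lines of code. The following is only a minimal sketch: the function name loopback_check is made up for illustration, and the device file names in the usage comment (/dev/xillybus_write_32 and /dev/xillybus_read_32) are assumed to be those created by the demo bundle. The payload is assumed to be short enough to fit in the FIFO and the driver's buffers, and a multiple of 4 bytes for a 32-bit stream.

```python
def loopback_check(write_path, read_path, payload):
    """Write payload to one device file, read it back from another, compare."""
    # buffering=0 yields raw file objects, so every write() reaches the
    # driver directly instead of waiting in a Python-side buffer.
    with open(write_path, "wb", buffering=0) as wr, \
         open(read_path, "rb", buffering=0) as rd:
        sent = 0
        while sent < len(payload):          # write() may accept only a part
            sent += wr.write(payload[sent:])

        received = bytearray()
        while len(received) < len(payload):  # read() may return only a part
            chunk = rd.read(len(payload) - len(received))
            if not chunk:                    # end of file: give up
                break
            received += chunk
    return bytes(received) == payload

# Hypothetical usage on the target (device file names are an assumption):
# loopback_check("/dev/xillybus_write_32", "/dev/xillybus_read_32",
#                bytes(range(64)))
```

Once one of the FIFO's ends is connected to application logic instead, the same read and write loops apply separately to each direction.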
The following documents are related to integrating custom logic:
-
The API for logic design: Xillybus FPGA designer’s guide
-
Basic concepts with the Linux host: Getting started with Xillybus on a Linux host
-
Programming applications: Xillybus host application programming guide for Linux
-
Requesting a custom Xillybus IP core: The guide to defining a custom Xillybus IP core
5.2 Using other boards
Before attempting to run Xillinux on a board other than Z-Turn Lite, Zedboard, MicroZed or Zybo, certain modifications may be necessary.
However, adapting Xillinux to other hardware is not recommended, as the procedure is difficult. Experience shows that if the purpose of adapting Xillinux is other than to use the Xillybus IP core, it’s easier to start from scratch.
This is a partial list of issues to pay attention to.
-
A purchased board should have an XML file as a reference (for use as ps7_system_prj.xml). This file contains the processor’s settings, including de-facto use of the MIO pins and the electrical parameters of the DDR pins. The recommended practice is adopting the reference file, at least as a starting point.
-
If an XML file is adopted as reference, the FPGA CLK1 (FCLK_CLK1) must be set to 100 MHz, regardless of what the reference file says.
-
If changes are made manually, attention should be paid to the processor core’s MIO assignments: The ARM core has 54 I/O pins which are routed to physical pins on the chip with a fixed placement. The ARM core is configured in the project’s block design to assign specific roles to these pins (e.g. USB interface, Ethernet etc.), which must match what these pins are wired to on the board.
-
If changes are made in the processor’s configuration (i.e. in the XML file), the boot.bin must be rebuilt, based upon an FSBL (First Stage Boot Loader) that is derived from the new XML file, and a U-boot binary. The changes made in Vivado’s block design tool take effect through the initialization routine that is part of the FSBL. This routine writes to registers in the ARM processor, with values that reflect the settings made in Vivado, and exported to the SDK. Note that the parameters of the processor in the Vivado project may not be accurate, so the FSBL should be generated based upon the XPS project available in the bundle. To set up the sources for U-boot, please refer to the README file in /usr/src/xillinux/uboot-patches/.
-
Alternatively, in some cases, rebuilding boot.bin can be avoided by pinpointing the changes in the registers’ settings, by virtue of the “poke” feature. See paragraph 5.6.
-
It may also be necessary to make changes in devicetree.dtb, in order to reflect the new setting. The sources of the existing DTB (in DTS format) can be found in the sources of the Linux kernel (see paragraph 6.2).
-
The VGA/DVI outputs (if applicable) need to be matched to the intended board. This is done by editing the xillybus.v file in the src/ subdirectory. Note that the signals arriving from the “system” module are 8 bits wide, and the truncation to 4 bits takes place in xillybus.v. Hence it’s fairly easy to connect these signals to any encoder chip for VGA/DVI.
5.3 Changing the frequencies of clocks in the system
The ARM processor’s core supplies four clocks for use by the logic fabric, commonly referred to as FCLK_CLKn. It’s important to note that their frequencies are set by the FSBL (First Stage Boot Loader), before U-boot is loaded.
Accordingly, even though the clocks’ frequencies are set in Vivado, these frequencies are effective only for propagating timing constraints and initialization by bare-metal applications that are compiled on the SDK.
If the hardware application requires different frequencies, the following series of actions is suggested:
-
Update the frequencies of the clock in Vivado.
-
Rebuild the netlist (this is necessary for updating the timing constraints in the .ncf files)
-
Export the project to SDK, and create an FSBL application project based upon this.
-
Learn the registers’ settings required for the desired setting from Vivado’s reports.
-
Adjust as necessary, by virtue of the “poke” feature, as described in paragraph 5.6.
Please refer to AMD’s guides for the details of how to perform each step (except for the last).
5.4 Taking over GPIO I/O pins for PL logic
5.4.1 Z-Turn Lite
While the Z-Turn Lite board itself supplies no convenient way to access the I/O pins for lab purposes, attaching it to the Z-Turn Lite IO Cape board exposes 68 I/O pins and a push button through standard connectors, in addition to the HDMI interface.
All of these 68 pins are wired to two 40-pin connectors, J3 and J8, to which standard flat ribbon cables can be attached. The IO Cape board has a few additional connectors, which share pins with J3 and J8. Since all of the additional connectors’ pins are available on J3 and J8, Xillydemo’s top-level module’s ports for these pins are vectors named J3 and J8, with pin placement constraints routing them to the corresponding connectors.
The vector indexes of the J3 and J8 ports in Verilog / VHDL equal the pin numbers of the connectors minus 3: The signal J3[0] in Verilog / VHDL goes to physical pin J3/3, J3[1] goes to J3/4 etc., up to J3[33] going to J3/36. The same applies to J8.
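The index arithmetic above can be written down as a pair of helper functions. This is just an illustrative sketch; the names index_to_pin and pin_to_index are not part of any Xillybus tool.

```python
FIRST_PIN = 3       # J3[0] and J8[0] go to connector pin 3
VECTOR_WIDTH = 34   # indexes 0..33, i.e. connector pins 3..36

def index_to_pin(index):
    """Vector index in Verilog / VHDL -> pin number on the J3 / J8 connector."""
    if not 0 <= index < VECTOR_WIDTH:
        raise ValueError("vector index out of range: %d" % index)
    return index + FIRST_PIN

def pin_to_index(pin):
    """Pin number on the J3 / J8 connector -> vector index in Verilog / VHDL."""
    index = pin - FIRST_PIN
    if not 0 <= index < VECTOR_WIDTH:
        raise ValueError("connector pin is not part of the vector: %d" % pin)
    return index
```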
For the sake of simplicity, all pins belonging to J8 are connected in xillydemo.v and xillydemo.vhd to the processor’s GPIO pins, and can be controlled directly from software running on the processor. All pins of J3 are driven low by the xillydemo.v and xillydemo.vhd, and can easily be utilized by application logic by modifying the relevant xillydemo module file.
This division of the pins to GPIO and pins driven by application logic is also easy to change, by altering their wiring in the xillydemo module. If the number of pins that are used as GPIO is changed, the gpio_width instantiation parameter (generic) of the xillybus module should be altered accordingly. It currently stands at 35, which accounts for 34 I/O pins going to the J8 connector, plus one GPIO for the Cape Board’s pushbutton.
As mentioned before, there are additional connectors on the board which share their pins with J3 and J8, and these can still be used. This requires, however, looking up which pin goes where in the Cape Board’s schematics.
One side effect of this pin sharing is that some pins, which are used as I2C by their alternative connectors, have pull-ups on the Cape board: J3[19], J3[18], J8[28] and J8[31] (using Verilog / VHDL vector signal notation). As these are pull-ups with resistors on the board, they are in effect even when using these pins on the J3 and J8 connectors.
The HDMI connector is independent, and shares no pins with J3, J8 or any of the other connectors.
5.4.2 Zedboard and Zybo
On the Zedboard and Zybo boards, many physical I/O pins are connected to the ARM processor’s GPIO ports (PS), which allows controlling and monitoring these pins directly from Linux. It’s however often desired to connect these physical pins to FPGA logic instead (i.e. to PL).
The technique for using Zynq’s PL pins for I/O is exactly the same as with any AMD FPGA: The signals are exposed in the toplevel module (xillydemo.v or xillydemo.vhd) as inputs, outputs or inouts. The assignment of physical pins to these signals takes place in xillydemo.xdc.
Since these pins are currently used as GPIO signals, they must be taken away from the processor and given to the PL part. For example, the following line in the XDC file,
set_property -dict "PACKAGE_PIN U5 IOSTANDARD LVCMOS33" [get_ports "PS_GPIO[55]"]
can be replaced with
set_property -dict "PACKAGE_PIN U5 IOSTANDARD LVCMOS33" [get_ports "my_output"]
if we want my_output to appear on pin U5.
But doing this replacement causes PS_GPIO[55] to lack a pin assignment. Even though there is a chance that AMD’s tools will place this port automatically during implementation, it’s recommended to assign an I/O pin to any evicted PS_GPIO. The alternative is to eliminate the signal, as explained below.
So there are two solutions for these evicted PS_GPIO signals:
-
The easy way: Find unused pins on the device, and assign these pins to the evicted PS_GPIO signals. Even though it’s not a very clean solution (GPIO pins are connected to just anything on the board), it’s practically harmless, because GPIOs are inputs by default. The electrical condition on these pins remains unchanged, unless the GPIO gets driven by software accidentally (which isn’t likely). For example, on Zedboard, the FMC connector often supplies many unused pins.
-
The harder way: Reducing the number of PS_GPIO pins. This may be necessary on Zybo, which doesn’t have many vacant pins.
In what follows, the second solution is discussed. For example, let’s assume that PS_GPIO[55:48] were removed from the XDC file, for the sake of replacing their pins with signals from the PL. Note that if the pins of PS_GPIO signals with lower indexes are needed, the evicted PS_GPIO signals should take over the pins of those with the highest indexes, and the latter are then eliminated. It is not possible to eliminate an arbitrary range of PS_GPIO indexes; only the maximal index can be reduced.
The width of PS_GPIO should be reduced in xillydemo.v/vhd to reflect those that have pin assignments in the XDC file.
This is not enough, however. If an attempt is made to build the project in this state, critical warnings are issued for these pins (possibly claiming that they are multiply driven, with high-Z and GND).
To resolve this, edit vivado-essentials/system.v in the following part:
   generate
      for (i=0; i<56; i=i+1)
        begin: gpio
           assign gpio_tri_i[i] = processing_system7_0_GPIO[i];
           assign processing_system7_0_GPIO[i] = gpio_tri_t[i] ? 1'bz : gpio_tri_o[i];
        end
   endgenerate
Reduce the index range (i.e. the i<56 part) to the number of used GPIOs (48 in the example).
Depending on Vivado’s revision, it might also be necessary to adjust the widths of the signals.
In case the GPIO width needs to be corrected in the block design: In Vivado’s main window, click “Open Block Design” on the left column. Right-click the processor block (processing_system_7_0, with the ZYNQ marking) and select “Customize block”. Select “MIO Configuration” on the left column, expand the “I/O Peripherals” hierarchy, and expand the GPIO hierarchy (at the bottom). The EMIO GPIO (Width) parameter currently stands at 56, the number of GPIO pins. Reduce it to the desired number (48 in this example).
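Since only the maximal PS_GPIO index can be reduced, the PS_GPIO signals that remain in the XDC file must occupy indexes 0 to N-1 without gaps, where N is the new width. The following sketch checks this and reports N. It is not part of the Xillinux kit: the function names are made up for illustration, and the script assumes that each pin assignment is a single line containing PACKAGE_PIN and the port name, as in the example line shown earlier in this section.

```python
import re

_PS_GPIO = re.compile(r"PS_GPIO\[(\d+)\]")

def assigned_ps_gpio(xdc_text):
    """Return the set of PS_GPIO indexes that have a PACKAGE_PIN assignment."""
    indexes = set()
    for line in xdc_text.splitlines():
        if line.lstrip().startswith("#"):    # skip comment lines
            continue
        if "PACKAGE_PIN" in line:
            indexes.update(int(n) for n in _PS_GPIO.findall(line))
    return indexes

def required_gpio_width(xdc_text):
    """Return the PS_GPIO width that matches the XDC file.

    Raises ValueError if the assigned indexes are not exactly 0..N-1.
    """
    indexes = assigned_ps_gpio(xdc_text)
    width = len(indexes)
    if indexes != set(range(width)):
        gaps = sorted(set(range(max(indexes) + 1)) - indexes)
        raise ValueError("PS_GPIO indexes without a pin: %s" % gaps)
    return width
```

For the example in this section, the expected result is 48, which is the value to use for the width of PS_GPIO, for the loop's range in system.v and for the EMIO GPIO (Width) parameter.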
5.5 Working with 7020 MicroZed
The boot partition kit available for MicroZed is intended for 7010 MicroZed boards by default. It’s however possible to work with 7020 MicroZed using Vivado, after making a minor change in the xillydemo-vivado.tcl file used to create the Vivado project (that is, verilog/xillydemo-vivado.tcl or vhdl/xillydemo-vivado.tcl in the bundle, depending on the language chosen).
Just after unzipping the kit (and before using it in Vivado), the file should be edited, changing the line saying
set thepart "xc7z010clg400-1"
(around line 11) to
set thepart "xc7z020clg400-1"
The rest of the build process is exactly the same.
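When the kit is unpacked repeatedly (e.g. in an automated build), the edit can be scripted. The following is a minimal sketch; the function name retarget_part is made up for illustration, and it assumes that the file contains exactly one "set thepart" line in the form shown above.

```python
import re

def retarget_part(tcl_text, new_part="xc7z020clg400-1"):
    """Return tcl_text with the part name in the 'set thepart' line replaced."""
    pattern = re.compile(r'^([ \t]*set[ \t]+thepart[ \t]+)"[^"]*"', re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + '"' + new_part + '"', tcl_text)
    if count != 1:
        # Refuse to guess if the line is missing or appears more than once
        raise ValueError("expected one 'set thepart' line, found %d" % count)
    return new_text
```

The function works on the file's content as a string, so the caller reads xillydemo-vivado.tcl, passes its content to the function, and writes the result back.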
5.6 Pre-boot manipulation of hardware registers (“poke”)
It’s often desirable to make slight changes in the ARM processor’s hardware setup without rebuilding the boot.bin file (paragraph 5.3 discusses a typical rebuild sequence).
For example, slight changes in the processor’s MIO/EMIO configuration result in a few changes in the registers’ settings, which can be deduced quite easily by looking for differences in the reports that are generated when the system’s settings are exported to the software tools.
The processor’s hardware registers are documented in AMD’s Zynq-7000 AP SoC Technical Reference Manual, also known as the TRM or ug585.
In order to manipulate registers, an entry is added to the kernel’s device tree (typically by editing the respective DTS file given in the Linux kernel sources, see paragraph 6.2).
An entry like the following example is added anywhere in the device tree’s hierarchy (preferably after the “chosen” entry):
   poke {
      compatible = "xillybus,poke-1.0";
      sequence = < 0 0xf8002000 0
                   0 0xf800200c 0
                   0 0xf8002018 0
                   1 0xf800200c 0x20
                   0 0xf8002018 0
                   0 0xf8002018 0
                   1 0xf800200c 0x21
                   0 0xf8002018 0
                   0 0xf8002018 0
                 >;
   };
The “sequence” part should be altered to set up the desired sequence of register reads and writes. Each operation is defined by three values in the “sequence” array of elements. There is no restriction on the number of triplets (and hence operations). The formatting of the “sequence” entry above, with tabs and three values per row, has no syntactic significance – what matters is that each triplet represents an operation as follows:
-
First element: Read or write. A value of 0 means read, otherwise write.
-
Second element: The address. Must be 32-bit aligned (2 LSBs of the address must be zero).
-
Third element: The value to write. Ignored on read operations.
The operations are carried out in the order listed in the device tree entry, with unpredictable delays between consecutive operations.
In the example above, which has no practical significance, the registers of the processor’s ttc2 (Triple Timer Counter 2) are manipulated: The first three operations read registers for the sake of demonstration. Then the counter is enabled briefly, its counter value is read twice to show that it’s changing, and then the counter is disabled. After this, the counter value is read twice again, to show that it has stopped.
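For longer sequences, it may be convenient to generate the device tree entry with a script that also enforces the alignment rule. The following is a sketch of this idea, not a tool that is supplied with Xillinux; the function name poke_entry is made up for illustration.

```python
def poke_entry(operations):
    """Format a "poke" device tree entry.

    operations is a list of (is_write, address, value) tuples. The value
    is ignored by the kernel on reads, but is written out anyway.
    """
    rows = []
    for is_write, address, value in operations:
        if not 0 <= address <= 0xFFFFFFFF or not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("address and value must fit in 32 bits")
        if address % 4:
            # The two LSBs of the address must be zero
            raise ValueError("address 0x%08x is not 32-bit aligned" % address)
        rows.append("%d 0x%08x 0x%x" % (1 if is_write else 0, address, value))

    body = "\n".join("\t\t" + row for row in rows)
    return ('poke {\n'
            '\tcompatible = "xillybus,poke-1.0";\n'
            '\tsequence = <\n%s\n\t>;\n'
            '};\n' % body)
```

The returned string is meant to be pasted into the DTS file, which is then compiled into devicetree.dtb as usual.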
The result of these operations can be found in the kernel’s message log, which is available at the serial console (UART), and/or with the dmesg command at shell prompt:
[ 0.000000] poke read addr=f8002000: value=00000000
[ 0.000000] poke read addr=f800200c: value=00000021
[ 0.000000] poke read addr=f8002018: value=00000000
[ 0.000000] poke write addr=f800200c: value=00000020
[ 0.000000] poke read addr=f8002018: value=00000009
[ 0.000000] poke read addr=f8002018: value=00004f68
[ 0.000000] poke write addr=f800200c: value=00000021
[ 0.000000] poke read addr=f8002018: value=000013ec
[ 0.000000] poke read addr=f8002018: value=000013ec
The values read from 0xf8002018 vary, of course, as they’re read from a running counter.
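If the values that were read need to be processed further, the log messages can be parsed with a short script. The following sketch assumes the message format shown above; the function name parse_poke_log is made up for illustration.

```python
import re

# Matches e.g. "poke read addr=f8002018: value=00004f68"
_POKE_MSG = re.compile(r"poke (read|write) addr=([0-9a-f]{8}): value=([0-9a-f]{8})")

def parse_poke_log(log_text):
    """Return a list of (operation, address, value) tuples found in log_text."""
    return [(op, int(addr, 16), int(value, 16))
            for op, addr, value in _POKE_MSG.findall(log_text)]
```

The input can be the output of the dmesg command or a capture of the serial console. The timestamps and any unrelated kernel messages are ignored.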
The register modification takes place early in the kernel’s boot process, before any device driver is loaded. Note however that U-boot sets up a few of the ARM processor’s hardware peripherals before Linux starts its boot process, so they are already active. Also note that modifying registers that relate to the ARM processor’s basic functionality (e.g. clocks and interrupts) may disrupt the proper functionality of the processor itself. This can happen even though the interrupts are disabled by the kernel at the time the “poke” is executed.
Attempting to access addresses that are disallowed, either by the kernel’s memory management or by the hardware itself, leads to a kernel Oops, a kernel panic or possibly a complete freeze. The two latter result in a boot failure. A kernel panic on grounds of “imprecise external abort (0x406)” is probably due to an attempt to access an address that is illegal as far as the hardware is concerned.
Moreover, since the early kernel console messages are stored in an internal memory buffer at the stage they’re generated, and written to the console only at a later stage (when the serial port is set up), an early freeze is likely to result in no output to the console at all – nothing appearing after U-boot’s “done, booting the kernel” message.
Hence when no kernel messages appear on the console, it doesn’t necessarily mean that the kernel didn’t kick off. It could be a result of a freeze before the kernel messages, that were stored in memory, were written to the console. The reason can be an illegal modification of a register.
The “poke” feature was added by patching the kernel of Xillinux-2.0 specifically, and is not part of mainline Linux kernels.
