Published: 8 September 2013

General tips

A few words on sporadic issues:

  • By all means, use a recent version of the PCIe block. For example, early versions of Coregen forced automatically generated tags on read requests. This wasn’t because the hardware didn’t support user-generated tags, but because Coregen wasn’t programmed to support them.
  • m_axis_cq_tdata: Only lower 64 or 128 bits are used (out of 256 available) if the Data Width is lower than 256. Effectively the same as Series-7.
  • Those who miss the start-of-packet signal of the good old TRN interface can find it as m_axis_*_tuser[40] (optional use as receiver).
  • For convenience, the Integrated Block sets the byte-enable signals on arriving completions as well, based upon remembering the relevant fields of the request, and pairing the tags. This can simplify the application logic somewhat.
  • The “Straddle” option is relevant only when the data width is 256, in which case it allows two payload-less TLPs in one transfer beat. If enabled, of course.
  • Unlike what the docs say, s_axis_rq_tready and s_axis_cc_tready outputs are 4 bits wide vectors, as opposed to a single bit, as the AXI Stream standard requires. These vectors go all the way to the PCIE_3_0 primitive, which has 4 bits vectors for the SAXISRQTREADY0-3 and SAXISCCTREADY0-3 pins, respectively. Why there are four bits isn’t clear, but the example design uses bit 0 only, so this whole vector thing can be ignored.
  • Again, unlike the docs, the m_axis_rc_tready and m_axis_cq_tready inputs are 22-bit wide vectors (?!), as opposed to the single bit that the AXI Stream standard requires. And again, these vectors go all the way to the PCIE_3_0 primitive, which has 22-bit vectors for the MAXISRCTREADY0-21 and MAXISCQTREADY0-21 pins, respectively. The example design sets all bits to the same value, which is the plain _tready signal (the bit is duplicated).
  • There is a “discontinue” signal as m_axis_rc_tuser[42] and m_axis_cq_tuser[41], which is possibly asserted on the last beat of a transmission. This is probably an extremely rare condition (error in the internal FIFO’s data), but the guide requires discarding the TLP that has already been received. In other words, the incoming packet’s data must be stored in some RAM, and acted upon only after the last beat has arrived with this signal not asserted. And what happens then? The error that caused this is “fatal” (see page 103, “Aborting a Transfer”), so is the packet lost? If it is, who cares what happened with this specific TLP? It’s time for a general shutdown anyhow! It seems like the real use of this signal is to turn on a LED saying “listen, something is deadly wrong here”. For the record, the example design ignores this signal (i.e. m_axis_cq_tuser[41]).
  • Detecting errors has been made simpler: Rather than monitoring the cfg_dstatus register for errors, there are dedicated signals that go high for a single clock: cfg_err_cor_out, cfg_err_nonfatal_out and cfg_err_fatal_out. The clear advantage is that correctable and non-fatal errors can be counted easily (they aren't latched high).
  • Incoming poisoned packets: Completions arriving at the RC interface have an error code in bits [15:12] of the descriptor, which may contain a “poisoned” status. Bit 46 also indicates poisoning.
  • The documentation is somewhat vague about what it does with incoming packets carrying ECRC (does anyone use that? Really?). But since there’s no way to tell from the packet descriptor or the *_tuser signals whether the digest is present as the last DW in the payload, and the overall attitude of the Gen3 block is “don’t worry, I’m handling it”, it seems that the entire ECRC saga can be ignored by the application logic. Maybe this is what stands behind “End-to-End Cyclic Redundancy Check (ECRC)” in the feature list of the Product Guide.
  • It’s a good idea to verify that all AXISTEN_IF_xx_PARITY_CHK are FALSE. Unless the application logic generates parity bits on the data lines (spacecraft applications, anyone?).
  • Completions arriving through the RC interface have a 12-bit “Address” field in the descriptor, which is the Lower Address field. It has the same meaning as the Lower Address field in a completion TLP, but the TLP contains only the 7 LSBs. The Gen3 Block hence extends this to 12 bits, based upon its internal records of the completion’s state.
  • Read completions have a 13-bit byte count field, which is exactly like the byte count field in the completion TLP, only with the value 4096 given explicitly, rather than the corresponding zero that appears in the TLP in this case.
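
To make the RC descriptor notes above concrete, here is a minimal Python sketch (a software model, not HDL) that pulls these fields out of a descriptor given as an integer. The error code position [15:12] and the poisoned bit 46 are taken from the notes above. The positions assumed here for the 12-bit address ([11:0]) and the 13-bit byte count ([28:16]) are assumptions that should be checked against the Product Guide, and the function names are made up for illustration.

```python
def tlp_byte_count(field12: int) -> int:
    """Byte count as encoded in a completion TLP header.

    The TLP field is 12 bits wide, and a value of zero stands for 4096.
    """
    field12 &= 0xFFF
    return 4096 if field12 == 0 else field12


def decode_rc_descriptor(desc: int) -> dict:
    """Extract a few fields from an RC completion descriptor (as an int)."""
    return {
        # ASSUMPTION: the 12-bit Lower Address occupies bits [11:0]
        "address": desc & 0xFFF,
        # Error code in bits [15:12], as stated in the notes above
        "error_code": (desc >> 12) & 0xF,
        # ASSUMPTION: the 13-bit byte count occupies bits [28:16].
        # Unlike the TLP field, 4096 appears explicitly here, so no
        # special handling of zero is needed.
        "byte_count": (desc >> 16) & 0x1FFF,
        # Poisoned indication in bit 46, as stated in the notes above
        "poisoned": bool((desc >> 46) & 1),
    }


if __name__ == "__main__":
    # Hypothetical descriptor: poisoned, 4096 bytes, error code 3
    example = (1 << 46) | (4096 << 16) | (0x3 << 12) | 0x7C0
    print(decode_rc_descriptor(example))
```

The point of tlp_byte_count() is only to show the difference: logic that parses raw TLPs must translate zero into 4096, while logic reading the descriptor can use the 13-bit value as is.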

Notable things that are left the same

Well, more or less the same. But close enough not to require work.

  • The access to the configuration registers (cfg_mgmt_addr etc.). Only some signal names have changed.
  • The maximal payload is available as cfg_max_payload rather than cfg_dcommand[7:5]. The meaning of these bits hasn’t changed.
  • The maximal read request size is available as cfg_max_read_req rather than cfg_dcommand[14:12]. The bits’ meaning hasn’t changed here either.
  • Previous pl_sel_lnk_rate is now available as cfg_current_speed (8.0 GT/s is now also an option)
  • The indication of RCB as 64 or 128 byte, previously given as cfg_lcommand[3] is now cfg_rcb_status, two bits wide, one for each physical function.
  • These pieces of configuration data are also available as readings from cfg_per_func_status_data[15:0], after setting cfg_per_func_status_control[2:0] to select what is read. The bit positions have nothing to do with the respective configuration status register. It’s just an invented register set.
  • cfg_dsn remains the same (64-bit device serial number)
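
Since cfg_max_payload and cfg_max_read_req keep the 3-bit encoding of the Device Control register, a short Python sketch of that encoding may be useful when writing a testbench model or a host-side check. It relies on the standard PCIe encoding (code 0 is 128 bytes, each step doubles, up to code 5 for 4096 bytes, with codes 6 and 7 reserved). The function names are hypothetical and not part of any Xilinx tool.

```python
def size_from_code(code: int) -> int:
    """Translate a 3-bit max payload / max read request code to bytes."""
    if not 0 <= code <= 5:
        # Codes 6 and 7 are reserved in the PCIe specification
        raise ValueError(f"reserved or invalid size code: {code}")
    return 128 << code


def max_payload_from_dcommand(dcommand: int) -> int:
    """Maximal payload size from the old-style cfg_dcommand[7:5]."""
    return size_from_code((dcommand >> 5) & 0x7)


def max_read_req_from_dcommand(dcommand: int) -> int:
    """Maximal read request size from the old-style cfg_dcommand[14:12]."""
    return size_from_code((dcommand >> 12) & 0x7)


if __name__ == "__main__":
    # Hypothetical register value: payload code 1, read request code 2
    dcommand = (2 << 12) | (1 << 5)
    print(max_payload_from_dcommand(dcommand))
    print(max_read_req_from_dcommand(dcommand))
```

With the Gen3 block, the same size_from_code() applies directly to the values of cfg_max_payload and cfg_max_read_req, with no bit slicing needed.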

Comments and corrections are warmly welcomed in the Xillybus forum. Anonymous posting is possible.