BxBFFT for Xilinx Versal

The BxBFFT is a high-speed streaming Fast Fourier Transform, and the Xilinx Versal FPGA family is one of the FPGA families it supports as standard. On the Versal, the BxBFFT has all the advantages described on the main BxBFFT page, plus additional Versal-specific advantages that are documented here.

The BxBFFT's advantages are highlighted below in a series of plots. These plots show important FFT statistics as a function of FFT size and Complex Points Per Clock (PPC), where PPC is a measure of the FFT's speed -- the number of input samples processed in parallel on every clock. PPC is sometimes also called the SuperSample Rate (SSR).

The BxBFFT isn't the best in every category at every FFT size and PPC, but it consistently does well and is often at the top. Between that and its large number of supported features, the BxBFFT is an option well worth considering.

On these plots, all FFTs are run with parameters matched as closely as possible, which means their features are reduced to those that most FFTs support: 18-bit operation and fully natural data order on both input and output.

Note that the Astron and CAStron FFTs only support output data in a partially natural order. This was deemed to be close enough to fully natural order for comparison.

Some FFTs do not support certain combinations of FFT size and PPC, so the corresponding points on those FFTs' plots are missing.

Power Savings

It is not uncommon for an FPGA design to approach either power limits or resource limits. Even when this is not true of a baseline design, it often becomes true because of the introduction of new product features. Power consumption of an FFT can thus make or break a design, or allow or disallow product upgrades. Power consumption also affects product life and reliability, as high consumption puts extra stress on the power supply, and high temperatures and large temperature swings increase the rate of component degradation. High-speed FFTs require intensive processing, and thus may use a large percentage of the total power consumption of a design. Thus power reduction in the FFT can be of particularly high importance.

The BxBFFT is highly optimized for power consumption. Multiple customers have found that a switch to the BxBFFT saved significant amounts of power in their designs, making those designs viable where before they were not.

Below are results from Xilinx Vivado synthesis for the power consumption of the BxBFFT vs several other FFTs. They show that BxBFFT power is typically lower than that of other FFTs by a factor of 1.2X to 1.5X in Versal FPGAs.
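To put the 1.2X to 1.5X figure in more familiar terms, the short Python sketch below converts a reduction factor into a percentage saving. The function name and the 10 W baseline are hypothetical and chosen purely for illustration; they are not measured results.

```python
def power_saving_fraction(reduction_factor: float) -> float:
    """Fraction of power saved when consumption is lower by `reduction_factor`.

    A factor of 1.5 means the competing FFT draws 1.5 times as much power.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 - 1.0 / reduction_factor


# Hypothetical baseline: a competing FFT drawing 10 W (illustrative only).
baseline_watts = 10.0
for factor in (1.2, 1.5):
    saved = power_saving_fraction(factor)
    print(f"{factor}X lower: {baseline_watts / factor:.2f} W, saving {saved:.1%}")
```

In other words, a 1.2X to 1.5X reduction corresponds to roughly 17% to 33% less power for the FFT itself.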

[Figure: Versal Power]

Resource Savings

FPGA resources are another common design limitation. Designs that use fewer resources have more margin for initial implementation and for future upgrades. For the same design, they can use fewer or smaller FPGAs and be cheaper to manufacture. They also often achieve higher clock speeds because resources do not become tightly constrained.

The BxBFFT uses substantially fewer FPGA LUTs than competing FFTs in Versal FPGAs, as shown in the first graph below. Its DSP usage is also excellent, as shown in the second graph. Required memory is not significantly different among the best FFTs. Some FFTs use fewer BRAMs by using distributed LUT memory instead. This can be seen in cases where BRAM usage is especially low but LUT usage is especially high.

[Figure: Versal LUTs]
[Figure: Versal DSPs]
[Figure: Versal BRAMs]

Throughput and Latency Advantages

Sometimes designs need to meet strict real-time requirements, either in throughput or in latency. Both of these improve when an FFT runs faster. A faster FFT can be achieved with a higher achieved FPGA clock rate (Fmax) or with increased PPC. Throughput, in complex samples per second, is Fmax * PPC.
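As a quick illustration of the throughput relation, here is a minimal Python sketch. The function names and the example numbers (500 MHz, PPC of 8, 1000 clocks of latency) are hypothetical and are not taken from the plots.

```python
def throughput_samples_per_second(fmax_hz: float, ppc: int) -> float:
    """Throughput in complex samples per second: Fmax * PPC."""
    return fmax_hz * ppc


def latency_seconds(latency_clocks: int, fmax_hz: float) -> float:
    """Wall-clock latency for a fixed latency in clock cycles."""
    return latency_clocks / fmax_hz


# Hypothetical operating point (illustrative only).
fmax_hz = 500e6
ppc = 8
print(throughput_samples_per_second(fmax_hz, ppc))  # complex samples per second
print(latency_seconds(1000, fmax_hz))               # seconds
```

For a fixed latency in clock cycles, a higher Fmax shortens the latency in seconds, which is why both throughput and latency benefit from a faster clock.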

One issue is that as PPC increases, more resources are used, there is more resource contention, and thus the achieved Fmax of an FFT goes down. This may make the desired throughput unachievable.
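The effect can be sketched with a few lines of Python. The two Fmax tables below are invented for illustration (they are not measurements of any FFT), and the helper name is hypothetical. The sketch finds the smallest PPC that meets a throughput target, given how Fmax varies with PPC.

```python
from typing import Mapping, Optional


def min_ppc_meeting_target(
    target_sps: float, fmax_hz_by_ppc: Mapping[int, float]
) -> Optional[int]:
    """Smallest PPC whose Fmax * PPC reaches the target, or None if none does."""
    for ppc in sorted(fmax_hz_by_ppc):
        if fmax_hz_by_ppc[ppc] * ppc >= target_sps:
            return ppc
    return None


# Hypothetical Fmax values in Hz, keyed by PPC (illustrative only).
fmax_holds = {4: 600e6, 8: 600e6, 16: 600e6}      # Fmax unaffected by PPC
fmax_degrades = {4: 600e6, 8: 450e6, 16: 280e6}   # Fmax falls as PPC rises

target_sps = 4.8e9  # required complex samples per second
print(min_ppc_meeting_target(target_sps, fmax_holds))     # 8
print(min_ppc_meeting_target(target_sps, fmax_degrades))  # None
```

With the degrading table, doubling PPC from 8 to 16 raises throughput only from 3.6e9 to 4.48e9 samples per second, so the target is never reached.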

For the BxBFFT, Fmax degrades less from resource contention than for other FFTs. As the graph below shows, the BxBFFT achieves high PPC and high Fmax simultaneously, where the other FFTs do not. The BxBFFT is thus able to achieve higher throughput and provides the best throughput and latency.

[Figure: Versal Fmax]

Xilinx-Specific Ease of Use and Productivity Enhancements

The BxBFFT was designed to get you running quickly. It has features to make configuration, synthesis, and simulation faster and easier, saving non-recurring engineering (NRE) cost. Many of these features are mentioned on the main BxBFFT page. One productivity feature is specific to Xilinx FPGAs: support for IP Integrator. This allows quick integration of the BxBFFT with Xilinx IP. With this approach, most major BxBFFT features are controllable from GUI selection boxes, which makes the BxBFFT extremely easy to use.

For those using Xilinx block designs, this is the fastest way to instantiate and configure a BxBFFT.

Conclusions

These results illustrate how the BxBFFT is superior in most ways to other FFTs in Xilinx Versal FPGAs. It uses less power, uses fewer resources, and attains higher speeds. It is unmatched at almost all FFT sizes and speeds. It is unmatched in supported features. It is also cross-platform, supporting both Xilinx and Altera FPGAs, with a path into ASICs.

Links

Bit by Bit Signal Processing Main Page
BxBFFT Product Main Page with these pages for specific FPGAs:
Xilinx Ultrascale FPGAs
Xilinx Versal FPGAs
Altera Agilex7 FPGAs
Altera Stratix10 FPGAs
Altera Arria10 FPGAs
BxBFFT Product Comparison PDF
BxBChan Product Main Page
BxBApp Demonstration
Tutorials
Email Contact: ross@bitbybitsp.com
Phone Contact: +1-623-487-8011 (this has automated call screening)