The BxBFFT is an amazing high-speed streaming Fast Fourier Transform, with full support for the Altera Stratix10 FPGA family. In the Stratix10 the BxBFFT has all the advantages specified on the main BxBFFT page, plus additional advantages specific to the Stratix10 that are documented here.
The BxBFFT's advantages are highlighted below in a series of plots. These plots show important FFT statistics as a function of FFT size and Complex Points Per Clock (PPC), where PPC is a measure of the FFT's speed -- the number of input samples processed in parallel on every clock. PPC is sometimes also called the SuperSample Rate (SSR).
The BxBFFT is not the best in every category for every FFT size and PPC, but it consistently does well and is often at the top. Combined with its large number of supported features, this makes the BxBFFT a strong option to consider.
On these plots, all FFTs are run with parameters matched as closely as possible, which means their features are reduced to those that most FFTs support. Specifically, this means 18-bit operation and fully natural data order on both input and output.
Note that the Astron and CAStron FFTs only support output data in a partially natural order. This was deemed to be close enough to fully natural order for comparison.
The Altera Parallel FFT only supports bit-reversed output. Although Altera does have bit-reverse IP, it does not support PPC greater than PPC=1. Thus to compare the Altera Parallel FFT on an equal footing, the bit-reverse implementation from the BxBFFT was added to it to produce fully natural order output.
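To make the distinction between bit-reversed and natural order concrete, here is a minimal software sketch of the reordering. It is only an illustrative model: the function names are hypothetical, and it does not represent the BxBFFT's streaming hardware implementation, which must handle multiple points per clock.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(bit_reversed_data):
    """Reorder a power-of-two-length list from bit-reversed to natural order."""
    n = len(bit_reversed_data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # Natural-order bin k sits at position bit_reverse(k) in the input.
    return [bit_reversed_data[bit_reverse(k, bits)] for k in range(n)]


# An 8-point FFT with bit-reversed output emits bins in this order.
print(to_natural_order([0, 4, 2, 6, 1, 5, 3, 7]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In software this is a simple permutation. In a streaming design the same reordering costs memory and latency, which is why it is included in the comparison.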
Some FFTs do not support certain values for FFT size and PPC, and the corresponding points on those FFTs' plots are missing.
It is not uncommon for an FPGA design to approach either power limits or resource limits. Even when this is not true of a baseline design, it often becomes true because of the introduction of new product features. Power consumption of an FFT can thus make or break a design, or allow or disallow product upgrades. Power consumption also affects product life and reliability, as high consumption puts extra stress on the power supply, and high temperatures and large temperature swings increase the rate of component degradation. High-speed FFTs require intensive processing, and thus may use a large percentage of the total power consumption of a design. Thus power reduction in the FFT can be of particularly high importance.
Below are results from Altera Quartus synthesis for power consumption of the BxBFFT vs several other FFTs in Stratix10. They show that the BxBFFT is one of the best for power consumption.
FPGA resources are another common design limitation. Designs that use fewer resources have more margin for initial implementation and for future upgrades. For the same functionality, they can use fewer or smaller FPGAs and be cheaper to manufacture. They also often achieve higher clock speeds because resources do not become tightly constrained.
Resources can be tricky to compare, since an FFT's ALM usage can often be lowered by increasing M20K memory or increasing DSPs, and vice-versa. These graphs show that the BxBFFT is among the best FFTs in resource usage, when considering these tradeoffs.
Sometimes designs need to meet strict real-time requirements, either in throughput or in latency. Both of these improve when an FFT runs faster. A faster FFT can be achieved with a higher FPGA clock rate (Fmax) or with increased PPC. Throughput, in complex samples per second, is Fmax * PPC.
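As a rough illustration of the throughput relation, the sketch below computes complex samples per second from a clock rate and a PPC. The function name and the example numbers are hypothetical and are not taken from the plots.

```python
def throughput_samples_per_sec(fmax_hz: float, ppc: int) -> float:
    """Complex samples per second for a streaming FFT: Fmax * PPC."""
    if fmax_hz <= 0 or ppc <= 0:
        raise ValueError("fmax_hz and ppc must be positive")
    return fmax_hz * ppc


# Hypothetical example values, not measured results.
example = throughput_samples_per_sec(500e6, 8)
print(f"{example / 1e9:.1f} Gsamples/s")  # 4.0 Gsamples/s
```

The same relation shows the tradeoff: doubling PPC at a fixed clock doubles throughput, typically at the cost of more FPGA resources.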
Limitations in Fmax occur in two forms: (1) Setup-Limited Fmax, limited by setup times in the FPGA fabric, and (2) Restricted Fmax, constrained also by internal speed limits of DSPs and M20Ks.
Setup-Limited Fmax is significant because setup times become worse as resources become more highly utilized. Thus it indicates how much margin the design has to meet timing as FPGA utilization increases. More margin is not only the difference between meeting timing and failing it; better margins also make it easier to achieve timing closure, which translates to reduced engineering time and effort.
Restricted Fmax is significant because it is the maximum frequency at which the design can run. If Restricted Fmax is less than Setup-Limited Fmax, that indicates the design is constrained by DSP or M20K internals. These internal constraints depend on DSP and M20K configuration. Some FFTs achieve higher values than others by using the DSPs and/or M20Ks in a more favorable configuration.
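One way to picture the relationship between the two figures is to model Restricted Fmax as the lowest of the fabric setup limit and the DSP and M20K internal limits. The sketch below does this under that assumption. The function name and all frequencies are hypothetical and are not Quartus output or datasheet values.

```python
def restricted_fmax(setup_limited_mhz, dsp_limit_mhz, m20k_limit_mhz):
    """Return (Restricted Fmax in MHz, name of the limiting factor).

    Assumption: Restricted Fmax is modeled as the minimum of the three limits.
    """
    limits = {
        "fabric setup": setup_limited_mhz,
        "DSP internal": dsp_limit_mhz,
        "M20K internal": m20k_limit_mhz,
    }
    limiter = min(limits, key=limits.get)
    return limits[limiter], limiter


# Hypothetical numbers: here the DSP configuration caps the clock.
fmax, limiter = restricted_fmax(720.0, 650.0, 700.0)
print(fmax, limiter)  # 650.0 DSP internal
```

When the limiter is "fabric setup", the two Fmax figures coincide and the design is limited only by fabric timing.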
The BxBFFT has been carefully optimized to achieve the highest Fmax with the greatest timing margins, as the plots show. For the Stratix10, Restricted Fmax and Setup-Limited Fmax are almost identical. This is not the case for the Altera Agilex7 and Arria10.
These results illustrate how the BxBFFT is one of the top high-speed FFTs in Altera Stratix10 FPGAs. It is among the top in power and resource usage. It attains the highest speeds. It is unmatched in its range of features and in support. It is also cross-platform, supporting both Altera and Xilinx FPGAs, with a path into ASICs.