The BxBFFT is an amazing high-speed streaming Fast Fourier Transform, and it now supports the Altera Stratix10 FPGA family. In the Stratix10 the BxBFFT has all the advantages specified on the main BxBFFT page, plus additional advantages specific to the Stratix10 that are documented here.
It is not uncommon for an FPGA design to approach either power limits or resource limits. Even when a baseline design stays within these limits, new product features often push it toward them. The power consumption of an FFT can therefore make or break a design, or allow or disallow product upgrades. Power consumption also affects product life and reliability: high consumption puts extra stress on the power supply, and high temperatures and large temperature swings increase the rate of component degradation. High-speed FFTs require intensive processing and may account for a large share of a design's total power consumption, so power reduction in the FFT can be particularly important.
The BxBFFT is highly optimized for power consumption. Multiple customers have found that a switch to the BxBFFT saved significant amounts of power in their designs, making those designs viable where before they were not.
Below are Altera Quartus power-consumption results for the BxBFFT and several other FFTs. They show that in Stratix10 FPGAs the other FFTs typically consume about 1.2X as much power as the BxBFFT (roughly 20% more). The Altera Parallel FFT is compared on a separate graph because it does not support natural output data order, and providing natural order costs power. That comparison therefore uses the scrambled output data order of both FFTs, and shows that the BxBFFT's power advantage over the Altera Parallel FFT is about 10% to 20%.
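A power ratio and a percentage saving are easy to mix up, because "X uses 20% more" and "Y uses 20% less" are not the same statement. The small sketch below converts a ratio such as the 1.2X figure quoted above into both percentages; the function name is hypothetical and the ratio is simply taken as P_other / P_BxBFFT.

```python
def power_savings(ratio: float) -> tuple[float, float]:
    """Given ratio = P_other / P_bxb, return
    (percent more power used by the other FFT,
     percent less power used by the BxBFFT)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    other_uses_more = (ratio - 1.0) * 100.0       # relative to the BxBFFT
    bxb_uses_less = (1.0 - 1.0 / ratio) * 100.0   # relative to the other FFT
    return other_uses_more, bxb_uses_less


more, less = power_savings(1.2)
print(f"other FFT uses {more:.1f}% more; BxBFFT uses {less:.1f}% less")
# other FFT uses 20.0% more; BxBFFT uses 16.7% less
```

So a 1.2X ratio means the other FFT draws 20% more power, or equivalently that the BxBFFT draws about 17% less.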
Some FFTs do not support the larger FFT sizes or the higher complex Points Per Clock (PPC), so the corresponding points are missing from those FFTs' plots.
FPGA resources are another common design limitation. Designs that use fewer resources have more margin for initial implementation and for future upgrades. For the same functionality, they can also fit in fewer or smaller FPGAs, which makes them cheaper to manufacture.
The BxBFFT uses substantially fewer FPGA ALMs than most competing FFTs in Stratix10 FPGAs, as shown in the graphs below. The Spiral FFT is the exception: it uses fewer ALMs, but more DSPs and more memory. As with power, most FFTs are compared on the first graph, with a separate graph for the Altera Parallel FFT, which supports only a scrambled output order.
BxBFFT DSP usage is also low, as shown on the following graph. Only one graph is needed for DSPs, because descrambling the data does not use DSPs.
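To show why descrambling costs memory and logic but no multipliers, here is a minimal software sketch. It assumes, purely for illustration, that "scrambled" means plain bit-reversed order, which is what a textbook radix-2 decimation-in-frequency FFT produces; the actual output orders of the BxBFFT and the Altera Parallel FFT at a given PPC may differ. All function names are hypothetical.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft_scrambled(x: list[complex]) -> list[complex]:
    """Radix-2 decimation-in-frequency FFT (len(x) must be a power of two).
    The result is left in bit-reversed (scrambled) order."""
    a = list(x)
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a


def descramble(y: list[complex]) -> list[complex]:
    """Reorder a bit-reversed frame into natural order.
    Pure data movement: a frame buffer plus address logic, no multiplies."""
    n = len(y)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, val in enumerate(y):
        out[bit_reverse(i, bits)] = val
    return out


def dft(x: list[complex]) -> list[complex]:
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [complex(t, -t) for t in range(8)]
natural = descramble(dif_fft_scrambled(x))
err = max(abs(a - b) for a, b in zip(natural, dft(x)))
print(f"max error vs. direct DFT: {err:.1e}")
```

The `descramble` step must hold a whole frame before it can emit natural-order output, which is why providing natural order consumes memory and power even though it needs no DSPs.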
Required memory is shown on the next two plots. The BxBFFT can save memory by autogenerating twiddle coefficients rather than storing them in ROM. The BxBFFT also offers memory savings when a scrambled output data order is acceptable.
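The page does not describe how the BxBFFT generates its twiddle coefficients, so the sketch below shows only one generic way to trade ROM for arithmetic: step through the twiddles W_n^k = exp(-2πik/n) by repeated complex multiplication, reloading an exact seed value at a fixed interval so rounding error cannot accumulate without bound. The function names and the `reseed_every` parameter are hypothetical.

```python
import cmath


def twiddle_rom(n: int) -> list[complex]:
    """ROM-style approach: precompute every twiddle W_n^k, k < n/2."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


def twiddle_generated(n: int, reseed_every: int = 64):
    """Generated approach (sketch): multiply by the step W_n^1 each time,
    reloading an exact seed every `reseed_every` values."""
    step = cmath.exp(-2j * cmath.pi / n)
    w = 1 + 0j
    for k in range(n // 2):
        if k % reseed_every == 0:
            # In hardware this seed would come from a much smaller table.
            w = cmath.exp(-2j * cmath.pi * k / n)
        yield w
        w *= step


def stored_entries(n: int, reseed_every: int = 64) -> int:
    """Number of seed values the generated approach must store."""
    return -(-(n // 2) // reseed_every)  # ceiling division


n = 1024
rom = twiddle_rom(n)
gen = list(twiddle_generated(n))
err = max(abs(a - b) for a, b in zip(rom, gen))
print(len(rom), stored_entries(n))  # 512 stored values vs. 8 seeds
print(f"max generation error: {err:.1e}")
```

The design choice is a trade: fewer stored words in exchange for a complex multiplier per generator. In fixed-point hardware the rounding per step is larger than in double precision, so the reseed interval would have to be chosen with that in mind.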
Sometimes designs must meet strict real-time requirements on throughput, latency, or both. Both improve when an FFT runs faster, which can be achieved with a higher FPGA clock rate (Fmax) or with more parallelism. Parallelism is measured by the number of complex data Points Per Clock (PPC) processed, also called the SuperSample Rate (SSR). Throughput, in complex samples per second, is Fmax * PPC.
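The throughput relation Fmax * PPC can be turned around to size a design: given a target sample rate and an expected Fmax, it gives the PPC required. The sketch below uses made-up numbers and assumes, hypothetically, that only power-of-two PPC values are available; it also converts a pipeline latency in clock cycles into time, since latency likewise shrinks as Fmax rises.

```python
import math


def throughput_msps(fmax_mhz: float, ppc: int) -> float:
    """Throughput in millions of complex samples per second: Fmax * PPC."""
    return fmax_mhz * ppc


def min_ppc(target_msps: float, fmax_mhz: float) -> int:
    """Smallest power-of-two PPC reaching target_msps at fmax_mhz.
    The power-of-two restriction is an illustrative assumption."""
    needed = max(1, math.ceil(target_msps / fmax_mhz))
    return 1 << (needed - 1).bit_length()


def latency_us(latency_cycles: int, fmax_mhz: float) -> float:
    """Latency in microseconds for a pipeline of latency_cycles clocks."""
    return latency_cycles / fmax_mhz


print(throughput_msps(500.0, 16))  # 8000.0 MS/s, i.e. 8 GS/s
print(min_ppc(10_000.0, 400.0))    # 32 (25 would suffice, rounded up to 2^5)
print(latency_us(1000, 500.0))     # 2.0
```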
As PPC increases, however, the FFT uses more resources, resource contention grows, and the achieved Fmax falls. This can make the desired throughput unachievable.
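A short worked check makes the consequence concrete: doubling PPC raises throughput only if Fmax falls by less than half. The (PPC, Fmax) points below are hypothetical and are not measured results for any FFT; the function names are likewise illustrative.

```python
def doubling_ppc_helps(fmax_before_mhz: float, fmax_after_mhz: float) -> bool:
    """Doubling PPC raises throughput only if Fmax falls by less than half."""
    return 2 * fmax_after_mhz > fmax_before_mhz


def best_point(points: list[tuple[int, float]]) -> tuple[int, float]:
    """Pick the (PPC, Fmax in MHz) design point with the highest throughput."""
    return max(points, key=lambda p: p[0] * p[1])


# Hypothetical design points, NOT measured results for any FFT.
points = [(8, 600.0), (16, 520.0), (32, 240.0)]
print(best_point(points))                # (16, 520.0): 8320 MS/s vs. 7680 at 32 PPC
print(doubling_ppc_helps(520.0, 240.0))  # False
```

In this made-up case the 32 PPC build loses throughput, because contention cut its Fmax by more than half.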
Resource contention degrades Fmax less for the BxBFFT than for other FFTs, so the BxBFFT can achieve a high Fmax and a high PPC simultaneously, and therefore higher throughput. The first graph below shows this: the BxBFFT sustains high Fmax at high PPC where the other FFTs do not. The BxBFFT thus provides the best throughput and latency.
The second graph below shows the setup-limited Fmax: the speed that would be obtained if hard IP such as memories and DSPs imposed no limit, so that speed was limited only by the FPGA fabric. The BxBFFT achieves the highest speeds on this graph, especially for the larger FFTs, where resource contention becomes an issue. This gives the BxBFFT more timing margin, which makes timing closure easier than with other FFTs.
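As a rough illustration of how a setup-limited Fmax relates to timing margin, the sketch below uses the common first-order relation Fmax ≈ 1 / (T_constraint − worst setup slack) for a single clock domain, then caps the result with a separate hard-IP limit. The constraint, slack, and hard-IP cap are hypothetical numbers, and timing tools such as Quartus report Fmax directly and model more detail than this.

```python
def setup_limited_fmax_mhz(constraint_mhz: float, worst_setup_slack_ns: float) -> float:
    """First-order estimate: the shortest period the fabric paths allow is
    the constrained period minus the worst setup slack."""
    period_ns = 1000.0 / constraint_mhz
    achievable_ns = period_ns - worst_setup_slack_ns
    if achievable_ns <= 0:
        raise ValueError("slack cannot exceed the clock period")
    return 1000.0 / achievable_ns


def achieved_fmax_mhz(setup_fmax_mhz: float, hard_ip_limit_mhz: float) -> float:
    """The usable clock rate is capped by whichever limit is lower."""
    return min(setup_fmax_mhz, hard_ip_limit_mhz)


def timing_margin_pct(setup_fmax_mhz: float, target_mhz: float) -> float:
    """How far the fabric-limited Fmax exceeds the target, in percent."""
    return (setup_fmax_mhz / target_mhz - 1.0) * 100.0


fabric = setup_limited_fmax_mhz(500.0, 0.4)        # 2.0 ns - 0.4 ns = 1.6 ns
print(round(fabric, 1))                            # 625.0
print(achieved_fmax_mhz(fabric, 600.0))            # 600.0 (hypothetical hard-IP cap)
print(round(timing_margin_pct(fabric, 500.0), 1))  # 25.0
```

A design whose fabric-limited Fmax sits well above the hard-IP cap, as in this example, still has slack to absorb routing variation from run to run, which is what makes timing closure easier.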
These results show that the BxBFFT is superior in most respects to other high-speed FFTs in Altera Stratix10 FPGAs: it uses less power, uses fewer resources, and attains higher speeds. It is unmatched at almost all FFT sizes and speeds, and in supported features. It is also cross-platform, supporting both Altera and Xilinx FPGAs, with a path into ASICs.