The BxBFFT is an amazing high-speed streaming Fast Fourier Transform, and it now supports the Altera Arria10 FPGA family. In the Arria10 the BxBFFT has all the advantages specified on the main BxBFFT page, plus additional advantages specific to the Arria10 that are documented here.
It is not uncommon for an FPGA design to approach either power limits or resource limits. Even when this is not true of a baseline design, it often becomes true because of the introduction of new product features. Power consumption of an FFT can thus make or break a design, or allow or disallow product upgrades. Power consumption also affects product life and reliability, as high consumption puts extra stress on the power supply, and high temperatures and large temperature swings increase the rate of component degradation. High-speed FFTs require intensive processing, and thus may use a large percentage of the total power consumption of a design. Thus power reduction in the FFT can be of particularly high importance.
The BxBFFT is highly optimized for power consumption. Multiple customers have found that a switch to the BxBFFT saved significant amounts of power in their designs, making those designs viable where before they were not.
Below are results from Altera Quartus synthesis comparing the power consumption of the BxBFFT with that of several other FFTs. They show that BxBFFT power is typically lower than that of other FFTs by a factor of about 1.2 in Arria10 FPGAs. There is a separate graph comparing the BxBFFT with the Altera Parallel FFT. This is because the Altera Parallel FFT doesn't support natural output data order, and that support consumes power. So the Altera Parallel FFT and the BxBFFT are compared separately, using their scrambled output data orders. This comparison shows that the Altera Parallel FFT often consumes about 1.4 times as much power as the BxBFFT, that is, about 40% more.
For some FFTs, power values for some FFT sizes and/or PPCs are omitted. This is because that particular FFT doesn't support that size or PPC.
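Comparisons quoted as a factor and as a percentage are easy to conflate, so a small arithmetic sketch may help when reading the figures above. It assumes each factor is the ratio of the other FFT's value to the BxBFFT's value; the function name `ratio_to_percent` is hypothetical and used only for illustration.

```python
def ratio_to_percent(ratio: float) -> tuple[float, float]:
    """Convert a ratio (other FFT / BxBFFT) into two equivalent percentages.

    Returns (percent_less, percent_more):
      percent_less: how much less the BxBFFT uses, relative to the other FFT
      percent_more: how much more the other FFT uses, relative to the BxBFFT
    """
    percent_less = (1.0 - 1.0 / ratio) * 100.0
    percent_more = (ratio - 1.0) * 100.0
    return percent_less, percent_more


# The factors quoted in the text, assumed to mean other / BxBFFT.
for ratio in (1.2, 1.4):
    less, more = ratio_to_percent(ratio)
    print(f"factor {ratio:.1f}: BxBFFT uses {less:.1f}% less; "
          f"the other FFT uses {more:.1f}% more")
```

Under this reading, a factor of 1.4 means the other FFT uses about 40% more, while the BxBFFT uses roughly 29% less.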
FPGA resources are another common design limitation. Designs that use fewer resources have more margin for initial implementation and for future upgrades. For the same functionality, they can use fewer or smaller FPGAs and be cheaper to manufacture.
The BxBFFT uses substantially fewer FPGA ALMs than competing FFTs in Arria10 FPGAs, as shown in the graphs below. Again, most FFTs are on the first graph, which shows competing FFTs using about 1.4 times as many ALMs as the BxBFFT. A separate graph compares the BxBFFT with the Altera FFT using scrambled output order, and it shows the Altera FFT using about 1.8 times as many ALMs as the BxBFFT.
BxBFFT DSP usage is also excellent, as shown on the following graph. Only one graph is needed for DSPs, since descrambling the data order doesn't use DSPs. The BxBFFT uses roughly 20% fewer DSPs than competing FFTs.
The BxBFFT requires significantly less memory than other FFTs, with the exception of the Altera FFT, which appears to be better optimized for the Altera memory architecture.
Sometimes designs need to meet strict real-time requirements, either in throughput or in latency. Both of these improve when an FFT runs faster. A faster FFT can be achieved with a higher achieved FPGA clock rate (Fmax) or with increased parallelism. Parallelism is measured by the processed complex data Points Per Clock (PPC), also called SuperSample Rate (SSR). Throughput is Fmax * PPC.
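The relationship Throughput = Fmax * PPC can be sketched in a few lines. This is an illustrative calculation only: the clock rate and sample rate below are placeholder numbers, not measured BxBFFT results, and the helper names and the restriction of PPC to powers of two are assumptions made for the example.

```python
import math


def throughput_msps(fmax_mhz: float, ppc: int) -> float:
    """Throughput in complex megasamples per second: Fmax (MHz) * PPC."""
    return fmax_mhz * ppc


def min_ppc(required_msps: float, fmax_mhz: float) -> int:
    """Smallest power-of-two PPC that meets the required throughput.

    Assumption: PPC is limited to powers of two (1, 2, 4, ...).
    """
    needed = math.ceil(required_msps / fmax_mhz)
    ppc = 1
    while ppc < needed:
        ppc *= 2
    return ppc


# Placeholder values for illustration only.
fmax_mhz = 400.0        # hypothetical achieved clock rate
required_msps = 2500.0  # hypothetical real-time requirement

ppc = min_ppc(required_msps, fmax_mhz)
print(f"PPC needed: {ppc}")
print(f"Throughput at that PPC: {throughput_msps(fmax_mhz, ppc):.0f} MSPS")
```

Note that the calculation treats Fmax as fixed, whereas in practice the achieved Fmax may change as PPC increases, so the result is a starting point rather than a guarantee.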
One issue is that as PPC increases, more resources are used, there is more resource contention, and thus the achieved Fmax of an FFT goes down. This may make the desired throughput unachievable.
Fmax degrades less from resource contention for BxBFFTs than for other FFTs. BxBFFTs are thus able to achieve higher throughput, because a high Fmax and a high PPC are achievable at the same time. The first graph below shows this: the BxBFFT achieves high PPC and high Fmax simultaneously, whereas the other FFTs do not. Thus the BxBFFT provides the best throughput and latency.
The second graph below shows the setup-limited Fmax. This is the speed that would be obtained if there were no limitations in hard IP such as the memories or DSPs, and the speed were instead limited only by the FPGA fabric. Achieving a higher speed on this graph is an indication of resilience against speed degradation when resource contention increases. As can be seen, the BxBFFT has substantially more resilience than the other FFTs.
BxBFFTs take less time to implement in Quartus for Arria10 than competing FFTs, which can save significant engineering time during product development. In part this is because BxBFFT code is written in a SystemVerilog style that is direct and easy to parse, which reduces time in synthesis.
Part of the savings is also in place and route, because BxBFFTs have more timing margin than competitors. This additional timing margin is what allows BxBFFTs to achieve high Fmax and thus high throughput. Timing margin also means that the Quartus place and route steps don't need to work as hard to meet desired timing constraints. As a result, Quartus implementation time is shorter.
The graph below shows how other FFTs take more implementation time than the BxBFFT.
These results illustrate how the BxBFFT is superior in most ways to other high-speed FFTs in Altera Arria10 FPGAs. It uses less power, uses fewer resources, and attains higher speeds. It is unmatched at almost all FFT sizes and speeds. It is unmatched in supported features. It is also cross-platform, supporting both Altera and Xilinx FPGAs, with a path into ASICs.