# **FFT IP Core**

**User Guide** 





# **Contents**

- About This IP Core
  - Altera DSP IP Core Features
  - FFT IP Core Features
  - General Description
  - Fixed Transform Size FFT
  - Variable Streaming FFT
  - DSP IP Core Device Family Support
  - DSP IP Core Verification
  - FFT IP Core Release Information
  - Performance and Resource Utilization
- FFT IP Core Getting Started
  - Installing and Licensing IP Cores
  - OpenCore Plus IP Evaluation
  - FFT II IP Core OpenCore Plus Timeout Behavior
  - IP Catalog and Parameter Editor
  - Specifying IP Core Parameters and Options
  - Files Generated for Altera IP Cores
  - Simulating Altera IP Cores in other EDA Tools
  - DSP Builder Design Flow
- FFT IP Core Functional Description
  - Fixed Transform FFTs
  - Variable Streaming FFTs
  - Fixed-Point Variable Streaming FFTs
  - Floating-Point Variable Streaming FFTs
  - Input and Output Orders
  - FFT Processor Engines
  - Quad-Output FFT Engine
  - Single-Output FFT Engine
  - I/O Data Flow
  - Streaming FFT
  - Variable Streaming
  - Buffered Burst
  - Burst
  - FFT IP Core Parameters
  - FFT IP Core Interfaces and Signals
  - Avalon-ST Interfaces in DSP IP Cores
  - FFT IP Core Avalon-ST Signals
  - FFT IP Core Signals in Qsys Systems
- Block Floating Point Scaling
  - Possible Exponent Values
  - Implementing Scaling
  - Example of Scaling
  - Unity Gain in an IFFT+FFT Pair
- Document Revision History

### **About This IP Core**

2014.12.15

**UG-FFT** 





### Altera DSP IP Core Features

- Avalon<sup>®</sup> Streaming (Avalon-ST) interfaces
- DSP Builder ready
- Testbenches to verify the IP core
- IP functional simulation models for use in Altera-supported VHDL and Verilog HDL simulators

### **FFT IP Core Features**

- Bit-accurate MATLAB models
- Variable streaming FFT:
  - Single-precision floating-point or fixed-point representation
  - Radix-4, mixed radix-4/2 implementations (for floating-point FFT), and radix-2<sup>2</sup> single delay feedback implementation (for fixed-point FFT)
  - Input and output orders: natural order, bit-reversed or digit-reversed, and DC-centered (-N/2 to N/2)
  - Reduced memory requirements
  - Support for 8 to 32-bit data and twiddle width (fixed-point FFTs)
- Fixed transform size FFT that implements block floating-point FFTs and maintains the maximum dynamic range of data during processing (not for variable streaming FFTs)
  - Multiple I/O data flow options: streaming, buffered burst, and burst
  - Uses embedded memory
  - Maximum system clock frequency more than 300 MHz
  - Optimized to use Stratix series DSP blocks and TriMatrix memory
  - High-throughput quad-output radix-4 FFT engine
  - Support for multiple single-output and quad-output engines in parallel
- User control over whether to optimize for DSP block usage or for speed in Stratix V devices, for streaming, buffered burst, burst, and variable streaming fixed-point FFTs
- Avalon Streaming (Avalon-ST) compliant input and output interfaces
- Parameterization-specific VHDL and Verilog HDL testbench generation
- Transform direction (FFT/IFFT) specifiable on a per-block basis

© 2015 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, ENPIRION, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at <a href="https://www.altera.com/common/legal.html">www.altera.com/common/legal.html</a>. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

ISO 9001:2008 Registered



# **General Description**

The FFT IP core is a high-performance, highly parameterizable fast Fourier transform (FFT) processor. The FFT IP core implements a complex FFT or inverse FFT (IFFT) for high-performance applications.

The FFT IP core implements:

- Fixed transform size FFT
- Variable streaming FFT

#### Fixed Transform Size FFT

The fixed transform FFT implements a radix-2/4 decimation-in-frequency (DIF) FFT algorithm for fixed transform lengths of $2^m$, where $6 \le m \le 16$. This FFT uses block floating-point representations to achieve the best trade-off between maximum signal-to-noise ratio (SNR) and minimum size requirements.

The fixed transform FFT accepts as input a complex data vector of length N in two's complement format and natural order, where N is the desired transform length. The function outputs the transform-domain complex vector in natural order. The FFT produces an accumulated block exponent to indicate any data scaling that has occurred during the transform to maintain precision and maximize the internal signal-to-noise ratio. You can specify the transform direction on a per-block basis using an input port.
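The idea of a shared block exponent can be illustrated with a short sketch. This is a generic block floating-point example, not the core's internal algorithm: the function name `block_scale`, the single scaling pass, and the sign convention of the exponent are assumptions made for illustration and may differ from the core's behavior.

```python
def block_scale(samples, width):
    """Scale a block of integers so every value fits a signed `width`-bit word.

    Hypothetical helper for illustration only. Returns (scaled, exponent),
    where each original value is approximately scaled_value * 2**exponent.
    """
    limit = 2 ** (width - 1) - 1            # largest positive two's complement value
    peak = max(abs(s) for s in samples)     # the largest magnitude sets the scaling
    exponent = 0
    while (peak >> exponent) > limit:       # shift right until the peak fits
        exponent += 1
    # One exponent is shared by the whole block (arithmetic right shift).
    scaled = [s >> exponent for s in samples]
    return scaled, exponent


# Real or imaginary components of one block, reduced to 8-bit words.
print(block_scale([1000, -300, 20], 8))     # ([125, -38, 2], 3)
```

Because one exponent covers the entire block, small values lose precision when a large value is present, which is the trade-off between SNR and size that the block floating-point representation makes.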

### **Variable Streaming FFT**

The variable streaming FFT implements one of two FFT types: a radix-$2^2$ single delay feedback FFT that uses a fixed-point representation, or a mixed radix-4/2 FFT that uses a single-precision floating-point representation. After you select your FFT type, you can configure your FFT variation at run time to perform the FFT algorithm for transform lengths of $2^m$, where $3 \le m \le 18$.
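The length constraints for the two variations can be checked with a small helper. The sketch below is illustrative only: the function name `is_supported_length` and the variation labels are hypothetical, and the exponent ranges are the ones stated in the text ($6 \le m \le 16$ for the fixed transform FFT, $3 \le m \le 18$ for the variable streaming FFT).

```python
# Exponent ranges (m) for transform lengths N = 2**m, as stated in the text.
# The dictionary keys are hypothetical labels, not parameter names of the core.
EXPONENT_RANGES = {
    "fixed": (6, 16),
    "variable_streaming": (3, 18),
}


def is_supported_length(n, variation):
    """Return True if n is a power of two within the variation's range."""
    lo, hi = EXPONENT_RANGES[variation]
    if n <= 0 or n & (n - 1):       # not a power of two
        return False
    m = n.bit_length() - 1          # exact log2 for a power of two
    return lo <= m <= hi


print(is_supported_length(64, "fixed"))               # True  (m = 6)
print(is_supported_length(8, "fixed"))                # False (m = 3 is below 6)
print(is_supported_length(8, "variable_streaming"))   # True  (m = 3)
```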

The fixed-point representation grows the data widths naturally from input through to output, thereby maintaining a high SNR at the output. The single-precision floating-point representation allows a large dynamic range of values to be represented while maintaining a high SNR at the output.

The order of the input data vector of size N can be natural, bit-reversed or digit-reversed, or DC-centered (-N/2 to N/2). The fixed-point representation supports natural, bit-reversed, or DC-centered order, and the floating-point representation supports natural or digit-reversed order. The FFT outputs the transform-domain complex vector in natural, bit-reversed, or digit-reversed order. You can specify the transform direction on a per-block basis using an input port.
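The orderings named above can be made concrete with index permutations. The sketch below is a generic illustration, not the core's implementation: the function names are hypothetical, the digit reversal covers only a pure radix-4 length (a power of 4) rather than the mixed radix-4/2 case, and the DC-centered order is assumed to cover the N points from -N/2 up to N/2 - 1.

```python
def digit_reversed_indices(n, radix):
    """Return the index permutation that reverses the base-`radix` digits.

    radix=2 gives bit-reversed order; radix=4 gives digit-reversed order
    for lengths that are a power of 4 (assumption: pure radix only).
    """
    digits, m = 0, n
    while m > 1:
        if m % radix:
            raise ValueError("n must be a power of radix")
        m //= radix
        digits += 1
    order = []
    for i in range(n):
        rev, x = 0, i
        for _ in range(digits):         # peel off the lowest digit each pass
            rev = rev * radix + x % radix
            x //= radix
        order.append(rev)
    return order


def dc_centered_indices(n):
    """Map frequency bins -n/2 .. n/2 - 1 to natural-order indices 0 .. n-1."""
    return [k % n for k in range(-n // 2, n // 2)]


print(digit_reversed_indices(8, 2))     # [0, 4, 2, 6, 1, 5, 3, 7]
print(dc_centered_indices(8))           # [4, 5, 6, 7, 0, 1, 2, 3]
```

Reading a natural-order vector through one of these index lists yields the corresponding reordered vector, for example `[x[i] for i in digit_reversed_indices(len(x), 2)]`.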

# **DSP IP Core Device Family Support**

Altera® offers the following device support levels for Altera IP cores:

- Preliminary support—Altera verifies the IP core with preliminary timing models for this device family. The IP core meets all functional requirements, but might still be undergoing timing analysis for the device family. You can use it in production designs with caution.
- Final support—Altera verifies the IP core with final timing models for this device family. The IP core meets all functional and timing requirements for the device family. You can use it in production designs.

**Table 1-1: DSP IP Core Device Family Support** 

| Device Family              | Support    |
|----------------------------|------------|
| Arria <sup>®</sup> II GX   | Final      |
| Arria II GZ                | Final      |
| Arria V                    | Final      |
| Arria 10                   | Final      |
| Cyclone <sup>®</sup> IV    | Final      |
| Cyclone V                  | Final      |
| MAX <sup>®</sup> 10 FPGA   | Final      |
| Stratix <sup>®</sup> IV GT | Final      |
| Stratix IV GX/E            | Final      |
| Stratix V                  | Final      |
| Other device families      | No support |

### **DSP IP Core Verification**

Before releasing a version of an IP core, Altera runs comprehensive regression tests to verify its quality and correctness. Altera generates custom variations of the IP core to exercise the various parameter options, thoroughly simulates the resulting simulation models, and verifies the results against master simulation models.

# **FFT IP Core Release Information**

**Table 1-2: FFT IP Core Release Information**

| Item          | Description   |
|---------------|---------------|
| Version       | 14.1          |
| Release Date  | December 2014 |
| Ordering Code | IP-FFT        |
| Product ID    | 0034          |
| Vendor ID     | 6AF7          |

# **Performance and Resource Utilization**

### **Table 1-3: Performance and Resource Utilization**

Typical performance using the Quartus II software with the Arria V (5AGXFB3H4F40C4), Cyclone V (5CGXFC7D6F31C6), and Stratix V (5SGSMD4H2F35C2) devices.

| Device | Type | Length | Engines | ALM | DSP Blocks | Memory M10K | Memory M20K | Primary Registers | Secondary Registers | f<sub>MAX</sub> (MHz) |
|--------|------|--------|---------|-----|------------|-------------|-------------|-------------------|---------------------|-----------------------|
| Arria V | Buffered Burst | 1,024 | 1 | 1,572 | 6 | 16 | | 3,903 | 143 | 275 |
| Arria V | Buffered Burst | 1,024 | 2 | 2,512 | 12 | 30 | | 6,027 | 272 | 274 |
| Arria V | Buffered Burst | 1,024 | 4 | 4,485 | 24 | 59 | | 10,765 | 426 | 262 |
| Arria V | Buffered Burst | 256 | 1 | 1,532 | 6 | 16 | | 3,713 | 136 | 275 |
| Arria V | Buffered Burst | 256 | 2 | 2,459 | 12 | 30 | | 5,829 | 246 | 245 |
| Arria V | Buffered Burst | 256 | 4 | 4,405 | 24 | 59 | | 10,539 | 389 | 260 |
| Arria V | Buffered Burst | 4,096 | 1 | 1,627 | 6 | 59 | | 4,085 | 130 | 275 |
| Arria V | Buffered Burst | 4,096 | 2 | 2,555 | 12 | 59 | | 6,244 | 252 | 275 |
| Arria V | Buffered Burst | 4,096 | 4 | 4,526 | 24 | 59 | | 10,986 | 438 | 265 |
| Arria V | Burst Quad Output | 1,024 | 1 | 1,565 | 6 | 8 | | 3,807 | 147 | 27 |
| Arria V | Burst Quad Output | 1,024 | 2 | 2,497 | 12 | 14 | | 5,952 | 225 | 275 |
| Arria V | Burst Quad Output | 1,024 | 4 | 4,461 | 24 | 27 | | 10,677 | 347 | 257 |
| Arria V | Burst Quad Output | 256 | 1 | 1,527 | 6 | 8 | | 3,610 | 153 | 272 |
| Arria V | Burst Quad Output | 256 | 2 | 2,474 | 12 | 14 | | 5,768 | 233 | 275 |
| Arria V | Burst Quad Output | 256 | 4 | 4,403 | 24 | 27 | | 10,443 | 437 | 257 |
| Arria V | Burst Quad Output | 4,096 | 1 | 1,597 | 6 | 27 | | 3,949 | 151 | 275 |
| Arria V | Burst Quad Output | 4,096 | 2 | 2,551 | 12 | 27 | | 6,119 | 223 | 275 |
| Arria V | Burst Quad Output | 4,096 | 4 | 4,494 | 24 | 27 | | 10,844 | 392 | 256 |
| Arria V | Burst Single Output | 1,024 | 1 | 672 | 2 | 6 | | 1,488 | 101 | 275 |
| Arria V | Burst Single Output | 1,024 | 2 | 994 | 4 | 10 | | 2,433 | 182 | 275 |
| Arria V | Burst Single Output | 256 | 1 | 636 | 2 | 3 | | 1,442 | 95 | 275 |
| Arria V | Burst Single Output | 256 | 2 | 969 | 4 | 8 | | 2,375 | 152 | 275 |
| Arria V | Burst Single Output | 4,096 | 1 | 702 | 2 | 19 | | 1,522 | 126 | 27 |
| Arria V | Burst Single Output | 4,096 | 2 | 1,001 | 4 | 25 | | 2,521 | 156 | 275 |
| Arria V | Streaming | 1,024 | - | 1,880 | 6 | 20 | | 4,565 | 167 | 275 |
| Arria V | Streaming | 256 | - | 1,647 | 6 | 20 | | 3,838 | 137 | 275 |
| Arria V | Streaming | 4,096 | - | 1,819 | 6 | 71 | | 4,655 | 137 | 275 |
| Arria V | Variable Streaming Floating Point | 1,024 | - | 11,195 | 48 | 89 | | 18,843 | 748 | 163 |
| Arria V | Variable Streaming Floating Point | 256 | - | 8,639 | 36 | 62 | | 15,127 | 609 | 16 |
| Arria V | Variable Streaming Floating Point | 4,096 | - | 13,947 | 60 | 138 | | 22,598 | 854 | 162 |
| Arria V | Variable Streaming | 1,024 | - | 2,535 | 11 | 14 | | 6,269 | 179 | 22 |
| Arria V | Variable Streaming | 256 | - | 1,913 | 8 | 8 | | 4,798 | 148 | 229 |
| Arria V | Variable Streaming | 4,096 | - | 3,232 | 15 | 31 | | 7,762 | 285 | 21 |
| Cyclone V | Buffered Burst | 1,024 | 1 | 1,599 | 6 | 16 | | 3,912 | 114 | 226 |
| Cyclone V | Buffered Burst | 1,024 | 2 | 2,506 | 12 | 30 | | 6,078 | 199 | 21 |
| Cyclone V | Buffered Burst | 1,024 | 4 | 4,505 | 24 | 59 | | 10,700 | 421 | 207 |
| Cyclone V | Buffered Burst | 256 | 1 | 1,528 | 6 | 16 | | 3,713 | 115 | 227 |
| Cyclone V | Buffered Burst | 256 | 2 | 2,452 | 12 | 30 | | 5,833 | 211 | 232 |
| Cyclone V | Buffered Burst | 256 | 4 | 4,487 | 24 | 59 | | 10,483 | 424 | 22 |
| Cyclone V | Buffered Burst | 4,096 | 1 | 1,649 | 6 | 59 | | 4,060 | 138 | 223 |
| Cyclone V | Buffered Burst | 4,096 | 2 | 2,555 | 12 | 59 | | 6,254 | 199 | 227 |
| Cyclone V | Buffered Burst | 4,096 | 4 | 4,576 | 24 | 59 | | 10,980 | 377 | 214 |
| Cyclone V | Burst Quad Output | 1,024 | 1 | 1,562 | 6 | 8 | | 3,810 | 122 | 225 |
| Cyclone V | Burst Quad Output | 1,024 | 2 | 2,501 | 12 | 14 | | 5,972 | 196 | 23 |
| Cyclone V | Burst Quad Output | 1,024 | 4 | 4,480 | 24 | 27 | | 10,643 | 372 | 216 |
| Cyclone V | Burst Quad Output | 256 | 1 | 1,534 | 6 | 8 | | 3,617 | 120 | 226 |
| Cyclone V | Burst Quad Output | 256 | 2 | 2,444 | 12 | 14 | | 5,793 | 153 | 22 |
| Cyclone V | Burst Quad Output | 256 | 4 | 4,443 | 24 | 27 | | 10,402 | 379 | 223 |
| Cyclone V | Burst Quad Output | 4,096 | 1 | 1,590 | 6 | 27 | | 3,968 | 120 | 237 |
| Cyclone V | Burst Quad Output | 4,096 | 2 | 2,547 | 12 | 27 | | 6,135 | 209 | 227 |
| Cyclone V | Burst Quad Output | 4,096 | 4 | 4,512 | 24 | 27 | | 10,798 | 388 | 210 |
| Cyclone V | Burst Single Output | 1,024 | 1 | 673 | 2 | 6 | | 1,508 | 83 | 222 |
| Cyclone V | Burst Single Output | 1,024 | 2 | 984 | 4 | 10 | | 2,475 | 126 | 23 |
| Cyclone V | Burst Single Output | 256 | 1 | 639 | 2 | 3 | | 1,382 | 159 | 229 |
| Cyclone V | Burst Single Output | 256 | 2 | 967 | 4 | 8 | | 2,353 | 169 | 240 |
| Cyclone V | Burst Single Output | 4,096 | 1 | 695 | 2 | 19 | | 1,540 | 105 | 237 |
| Cyclone V | Burst Single Output | 4,096 | 2 | 1,009 | 4 | 25 | | 2,536 | 116 | 24 |
| Cyclone V | Streaming | 1,024 | - | 1,869 | 6 | 20 | | 4,573 | 132 | 21 |
| Cyclone V | Streaming | 256 | - | 1,651 | 6 | 20 | | 3,878 | 85 | 226 |
| Cyclone V | Streaming | 4,096 | - | 1,822 | 6 | 71 | | 4,673 | 124 | 199 |
| Cyclone V | Variable Streaming Floating Point | 1,024 | - | 11,184 | 48 | 89 | | 18,830 | 628 | 13 |
| Cyclone V | Variable Streaming Floating Point | 256 | - | 8,611 | 36 | 62 | | 15,156 | 467 | 13 |
| Cyclone V | Variable Streaming Floating Point | 4,096 | - | 13,945 | 60 | 138 | | 22,615 | 701 | 132 |
| Cyclone V | Variable Streaming | 1,024 | - | 2,533 | 11 | 14 | | 6,254 | 240 | 179 |
| Cyclone V | Variable Streaming | 256 | - | 1,911 | 8 | 8 | | 4,786 | 176 | 18 |
| Cyclone V | Variable Streaming | 4,096 | - | 3,226 | 15 | 31 | | 7,761 | 320 | 176 |
| Stratix V | Buffered Burst | 1,024 | 1 | 1,610 | 6 | | 16 | 4,141 | 107 | 424 |
| Stratix V | Buffered Burst | 1,024 | 2 | 2,545 | 12 | | 30 | 6,517 | 170 | 427 |
| Stratix V | Buffered Burst | 1,024 | 4 | 4,554 | 24 | | 59 | 11,687 | 250 | 366 |
| Stratix V | Buffered Burst | 256 | 1 | 1,546 | 6 | | 16 | 3,959 | 110 | 49 |
| Stratix V | Buffered Burst | 256 | 2 | 2,475 | 12 | | 30 | 6,314 | 134 | 440 |
| Stratix V | Buffered Burst | 256 | 4 | 4,480 | 24 | | 59 | 11,477 | 281 | 38 |
| Stratix V | Buffered Burst | 4,096 | 1 | 1,668 | 6 | | 30 | 4,312 | 122 | 432 |
| Stratix V | Buffered Burst | 4,096 | 2 | 2,602 | 12 | | 30 | 6,718 | 176 | 416 |
| Stratix V | Buffered Burst | 4,096 | 4 | 4,623 | 24 | | 59 | 11,876 | 249 | 39 |
| Stratix V | Burst Quad Output | 1,024 | 1 | 1,550 | 6 | | 8 | 4,037 | 115 | 455 |
| Stratix V | Burst Quad Output | 1,024 | 2 | 2,444 | 12 | | 14 | 6,417 | 164 | 43 |
| Stratix V | Burst Quad Output | 1,024 | 4 | 4,397 | 24 | | 27 | 11,548 | 330 | 416 |
| Stratix V | Burst Quad Output | 256 | 1 | 1,487 | 6 | | 8 | 3,868 | 83 | 477 |
| Stratix V | Burst Quad Output | 256 | 2 | 2,387 | 12 | | 14 | 6,211 | 164 | 458 |
| Stratix V | Burst Quad Output | 256 | 4 | 4,338 | 24 | | 27 | 11,360 | 307 | 409 |
| Stratix V | Burst Quad Output | 4,096 | 1 | 1,593 | 6 | | 14 | 4,222 | 93 | 448 |
| Stratix V | Burst Quad Output | 4,096 | 2 | 2,512 | 12 | | 14 | 6,588 | 154 | 47 |
| Stratix V | Burst Quad Output | 4,096 | 4 | 4,468 | 24 | | 27 | 11,773 | 267 | 403 |
| Stratix V | Burst Single Output | 1,024 | 1 | 652 | 2 | | 4 | 1,553 | 111 | 50 |
| Stratix V | Burst Single Output | 1,024 | 2 | 1,011 | 4 | | 8 | 2,687 | 149 | 476 |
| Stratix V | Burst Single Output | 256 | 1 | 621 | 2 | | 3 | 1,502 | 132 | 50 |
| Stratix V | Burst Single Output | 256 | 2 | 978 | 4 | | 8 | 2,555 | 173 | 50 |
| Stratix V | Burst Single Output | 4,096 | 1 | 681 | 2 | | 9 | 1,589 | 149 | 50 |
| Stratix V | Burst Single Output | 4,096 | 2 | 1,039 | 4 | | 14 | 2,755 | 161 | 476 |
| Stratix V | Streaming | 1,024 | - | 1,896 | 6 | | 20 | 4,814 | 144 | 49 |
| Stratix V | Streaming | 256 | - | 1,604 | 6 | | 20 | 4,062 | 99 | 449 |
| Stratix V | Streaming | 4,096 | - | 1,866 | 6 | | 38 | 4,889 | 118 | 461 |
| Stratix V | Variable Streaming Floating Point | 1,024 | - | 11,607 | 32 | | 87 | 19,031 | 974 | 355 |
| Stratix V | Variable Streaming Floating Point | 256 | - | 8,850 | 24 | | 59 | 15,297 | 820 | 374 |
| Stratix V | Variable Streaming Floating Point | 4,096 | - | 14,335 | 40 | | 115 | 22,839 | 1,047 | 325 |
| Stratix V | Variable Streaming | 1,024 | - | 2,334 | 14 | | 13 | 5,623 | 201 | 382 |
| Stratix V | Variable Streaming | 256 | - | 1,801 | 10 | | 8 | 4,443 | 174 | 365 |
| Stratix V | Variable Streaming | 4,096 | - | 2,924 | 18 | | 23 | 6,818 | 238 | 355 |

### **FFT IP Core Getting Started**





# **Installing and Licensing IP Cores**

The Altera IP Library provides many useful IP core functions for your production use without purchasing an additional license. Some Altera MegaCore<sup>®</sup> IP functions require that you purchase a separate license for production use. However, the OpenCore<sup>®</sup> feature allows evaluation of any Altera IP core in simulation and compilation in the Quartus<sup>®</sup> II software. After you are satisfied with functionality and performance, visit the Self Service Licensing Center to obtain a license number for any Altera product.

Figure 2-1: IP Core Installation Path



**Note:** The default IP installation directory on Windows is *drive*:\altera\<*version number*>; on Linux it is <*home directory*>/altera/<*version number*>.

#### **Related Information**

- Altera Licensing Site
- Altera Software Installation and Licensing Manual

# **OpenCore Plus IP Evaluation**

Altera's free OpenCore Plus feature allows you to evaluate licensed MegaCore IP cores in simulation and hardware before purchase. You need only purchase a license for MegaCore IP cores if you decide to take your design to production. OpenCore Plus supports the following evaluations:

- Simulate the behavior of a licensed IP core in your system.
- Verify the functionality, size, and speed of the IP core quickly and easily.
- Generate time-limited device programming files for designs that include IP cores.
- Program a device with your IP core and verify your design in hardware.

© 2015 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, ENPIRION, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at <a href="https://www.altera.com/common/legal.html">www.altera.com/common/legal.html</a>. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

ISO 9001:2008 Registered



OpenCore Plus evaluation supports the following two operation modes:

- Untethered—run the design containing the licensed IP for a limited time.
- Tethered—run the design containing the licensed IP for a longer time or indefinitely. This requires a connection between your board and the host computer.

**Note:** All IP cores that use OpenCore Plus time out simultaneously when any IP core in the design times out.

### FFT II IP Core OpenCore Plus Timeout Behavior

All IP cores in a device time out simultaneously when the most restrictive evaluation time is reached. If there is more than one IP core in a design, the time-out behavior of the other IP cores may mask the time-out behavior of a specific IP core. For IP cores, the untethered time-out is 1 hour; the tethered time-out value is indefinite. Your design stops working after the hardware evaluation time expires. The Quartus II software uses OpenCore Plus Files (.ocp) in your project directory to identify your use of the OpenCore Plus evaluation program. After you activate the feature, do not delete these files.

When the evaluation time expires, the source\_real, source\_imag, and source\_exp signals go low.

#### Related Information

AN 320: OpenCore Plus Evaluation of Megafunctions

# **IP Catalog and Parameter Editor**

The Quartus II IP Catalog (**Tools** > **IP Catalog**) and parameter editor help you easily customize and integrate IP cores into your project. You can use the IP Catalog and parameter editor to select, customize, and generate files representing your custom IP variation.

**Note:** The IP Catalog (**Tools** > **IP Catalog**) and parameter editor replace the MegaWizard<sup>™</sup> Plug-In Manager for IP selection and parameterization, beginning in Quartus II software version 14.0. Use the IP Catalog and parameter editor to locate and parameterize Altera IP cores.

The IP Catalog lists installed IP cores available for your design. Double-click any IP core to launch the parameter editor and generate files representing your IP variation. The parameter editor prompts you to specify an IP variation name, optional ports, and output file generation options. The parameter editor generates a top-level Qsys system file (.qsys) or Quartus II IP file (.qip) representing the IP core in your project. You can also parameterize an IP variation without an open project.

Use the following features to help you quickly locate and select an IP core:

- Filter IP Catalog to **Show IP for active device family** or **Show IP for all device families**. If you have no project open, select the **Device Family** in IP Catalog.
- Type in the Search field to locate any full or partial IP core name in IP Catalog.
- Right-click an IP core name in IP Catalog to display details about supported devices, open the IP core's installation folder, and view links to documentation.
- Click **Search for Partner IP** to access partner IP information on the Altera website.

Figure 2-2: Quartus II IP Catalog



**Note:** The IP Catalog is also available in Qsys (**View** > **IP Catalog**). The Qsys IP Catalog includes exclusive system interconnect, video and image processing, and other system-level IP that are not available in the Quartus II IP Catalog. For more information about using the Qsys IP Catalog, refer to *Creating a System with Qsys* in the *Quartus II Handbook*.

# **Specifying IP Core Parameters and Options**

You can quickly configure a custom IP variation in the parameter editor. Use the following steps to specify IP core options and parameters in the parameter editor. Refer to *Specifying IP Core Parameters and Options (Legacy Parameter Editors)* for configuration of IP cores using the legacy parameter editor.

1. In the IP Catalog (**Tools** > **IP Catalog**), locate and double-click the name of the IP core to customize. The parameter editor appears.
2. Specify a top-level name for your custom IP variation. The parameter editor saves the IP variation settings in a file named <*your\_ip*>.qsys. Click **OK**.
3. Specify the parameters and options for your IP variation in the parameter editor, including one or more of the following. Refer to your IP core user guide for information about specific IP core parameters.
   - Optionally select preset parameter values if provided for your IP core. Presets specify initial parameter values for specific applications.
   - Specify parameters defining the IP core functionality, port configurations, and device-specific features.
   - Specify options for processing the IP core files in other EDA tools.
4. Click **Generate HDL**. The **Generation** dialog box appears.
5. Specify output file generation options, and then click **Generate**. The IP variation files generate according to your specifications.
6. To generate a simulation testbench, click **Generate** > **Generate Testbench System**.
7. To generate an HDL instantiation template that you can copy and paste into your text editor, click **Generate** > **HDL Example**.
8. Click **Finish**. The parameter editor adds the top-level **.qsys** file to the current project automatically. If you are prompted to manually add the **.qsys** file to the project, click **Project** > **Add/Remove Files in Project** to add the file.
9. After generating and instantiating your IP variation, make appropriate pin assignments to connect ports.

Figure 2-3: IP Parameter Editor



#### Files Generated for Altera IP Cores

The Quartus II software generates the following IP core output file structure:

Figure 2-4: IP Core Generated Files



Table 2-1: IP Core Generated Files

| File Name | Description |
|---|---|
| <*my_ip*>.qsys | The Qsys system or top-level IP variation file. <*my_ip*> is the name that you give your IP variation. |
| <*system*>.sopcinfo | Describes the connections and IP component parameterizations in your Qsys system. You can parse its contents to get requirements when you develop software drivers for IP components. Downstream tools such as the Nios II tool chain use this file. The .sopcinfo file and the system.h file generated for the Nios II tool chain include address map information for each slave relative to each master that accesses the slave. Different masters may have a different address map to access a particular slave component. |
| <*my_ip*>.cmp | The VHDL Component Declaration (.cmp) file is a text file that contains local generic and port definitions that you can use in VHDL design files. |
| <*my_ip*>.html | A report that contains connection information, a memory map showing the address of each slave with respect to each master to which it is connected, and parameter assignments. |
| <*my_ip*>_generation.rpt | IP or Qsys generation log file. A summary of the messages during IP generation. |
| <*my_ip*>.debuginfo | Contains post-generation information. Used to pass System Console and Bus Analysis Toolkit information about the Qsys interconnect. The Bus Analysis Toolkit uses this file to identify debug components in the Qsys interconnect. |
| <*my_ip*>.qip | Contains all the required information about the IP component to integrate and compile the IP component in the Quartus II software. |
| <*my_ip*>.csv | Contains information about the upgrade status of the IP component. |
| <*my_ip*>.bsf | A Block Symbol File (**.bsf**) representation of the IP variation for use in Quartus II Block Diagram Files (**.bdf**). |
| <*my_ip*>.spd | Required input file for ip-make-simscript to generate simulation scripts for supported simulators. The **.spd** file contains a list of files generated for simulation, along with information about memories that you can initialize. |
| <*my_ip*>.ppf | The Pin Planner File (.ppf) stores the port and node assignments for IP components created for use with the Pin Planner. |
| <*my_ip*>_bb.v | You can use the Verilog black-box (**_bb.v**) file as an empty module declaration for use as a black box. |
| <*my_ip*>.sip | Contains information required for NativeLink simulation of IP components. You must add the **.sip** file to your Quartus project. |
| <*my_ip*>_inst.v or _inst.vhd | HDL example instantiation template. You can copy and paste the contents of this file into your HDL file to instantiate the IP variation. |
| <*my_ip*>.regmap | If the IP contains register information, the .regmap file generates. The .regmap file describes the register map information of master and slave interfaces. This file complements the .sopcinfo file by providing more detailed register information about the system. This enables register display views and user customizable statistics in System Console. |
| <*my_ip*>.svd | Allows HPS System Debug tools to view the register maps of peripherals connected to HPS within a Qsys system. During synthesis, the .svd files for slave interfaces visible to System Console masters are stored in the .sof file in the debug section. System Console reads this section, which Qsys can query for register map information. For system slaves, Qsys can access the registers by name. |
| <*my_ip*>.v or <*my_ip*>.vhd | HDL files that instantiate each submodule or child IP core for synthesis or simulation. |
| mentor/ | Contains a ModelSim® script **msim_setup.tcl** to set up and run a simulation. |
| aldec/ | Contains a Riviera-PRO script **rivierapro_setup.tcl** to set up and run a simulation. |
| /synopsys/vcs | Contains a shell script **vcs_setup.sh** to set up and run a VCS<sup>®</sup> simulation. |
| /synopsys/vcsmx | Contains a shell script **vcsmx_setup.sh** and **synopsys_sim.setup** file to set up and run a VCS MX<sup>®</sup> simulation. |
| /cadence | Contains a shell script **ncsim_setup.sh** and other setup files to set up and run an NCSIM simulation. |
| /submodules | Contains HDL files for the IP core submodule. |
| <*child IP cores*>/ | For each generated child IP core directory, Qsys generates /synth and /sim sub-directories. |

# **Simulating Altera IP Cores in other EDA Tools**

The Quartus II software supports RTL and gate-level design simulation of Altera IP cores in supported EDA simulators. Simulation involves setting up your simulator working environment, compiling simulation model libraries, and running your simulation.

You can use the functional simulation model and the testbench or example design generated with your IP core for simulation. The functional simulation model and testbench files are generated in a project subdirectory. This directory may also include scripts to compile and run the testbench. For a complete list of models or libraries required to simulate your IP core, refer to the scripts generated with the testbench. You can use the Quartus II NativeLink feature to automatically generate simulation files and scripts. NativeLink launches your preferred simulator from within the Quartus II software.

Figure 2-5: Simulation in Quartus II Design Flow



**Note:** Post-fit timing simulation is supported only for Stratix IV and Cyclone IV devices in the current version of the Quartus II software.

Altera IP supports a variety of simulation models, including simulation-specific IP functional simulation models, encrypted RTL models, and plain text RTL models. These are all cycle-accurate models. The models support fast functional simulation of your IP core instance using industry-standard VHDL or Verilog HDL simulators. For some cores, only the plain text RTL model is generated, and you can simulate that model. Use the simulation models only for simulation and not for synthesis or any other purposes. Using these models for synthesis creates a nonfunctional design.

#### **Related Information**

**Simulating Altera Designs** 

# **DSP Builder Design Flow**

DSP Builder shortens digital signal processing (DSP) design cycles by helping you create the hardware representation of a DSP design in an algorithm-friendly development environment.

This IP core supports DSP Builder. Use the DSP Builder flow if you want to create a DSP Builder model that includes an IP core variation; use IP Catalog if you want to create an IP core variation that you can instantiate manually in your design. For more information about the DSP Builder flow, refer to the *Using MegaCore Functions* chapter in the *DSP Builder Handbook*.

#### **Related Information**

*Using MegaCore Functions* chapter in the *DSP Builder Handbook*

# **FFT IP Core Functional Description**

### **Fixed Transform FFTs**

The buffered burst, burst, and streaming FFTs use a radix-4 decomposition, which divides the input sequence recursively to form four-point sequences and requires only trivial multiplications in the four-point DFT. Radix-4 gives the highest throughput decomposition, while requiring non-trivial complex multiplications in the post-butterfly twiddle-factor rotations only. In cases where N is an odd power of two, the FFT MegaCore function automatically implements a radix-2 pass on the last pass to complete the transform.
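The following Python sketch illustrates why a four-point DFT needs only trivial multiplications. It is an illustration only, not the IP core's implementation; the function names `mul_neg_j` and `radix4_butterfly` are hypothetical. Every output is formed from additions, subtractions, and a multiplication by -j, which amounts to swapping the real and imaginary parts with a sign change.

```python
def mul_neg_j(z):
    # (a + jb) * (-j) = b - ja: a swap and a negation, no multiplier needed.
    return complex(z.imag, -z.real)


def radix4_butterfly(x0, x1, x2, x3):
    """Four-point DFT built from add/subtract and one trivial rotation."""
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = mul_neg_j(x1 - x3)
    # Outputs X[0], X[1], X[2], X[3] of the four-point DFT.
    return [a + c, b + d, a - c, b - d]
```

In a full radix-4 pass, the non-trivial twiddle-factor multiplications are applied to these butterfly outputs afterward, which matches the post-butterfly rotations described above.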

To maintain a high signal-to-noise ratio throughout the transform computation, the fixed transform FFTs use a block-floating-point architecture, which is a trade-off point between fixed-point and full-floating-point architectures.

**Related Information** 

**Block Floating Point Scaling** 

# **Variable Streaming FFTs**

The variable streaming FFTs use either the fixed-point or the floating-point data representation.

If you select the fixed-point data representation, the FFT variation uses a radix-$2^2$ single delay feedback architecture, which is fully pipelined. If you select the floating-point representation, the FFT variation uses a mixed radix-4/2 architecture. For a length N transform, $\log_4(N)$ stages are concatenated together. The radix-$2^2$ algorithm has the same multiplicative complexity as a fully pipelined radix-4 FFT, but the butterfly unit retains a radix-2 structure. The radix-4/2 algorithm combines radix-4 and radix-2 FFTs to achieve the computational advantage of the radix-4 algorithm while supporting FFT computation with a wider range of transform lengths. The butterfly units use the DIF decomposition.

Fixed-point representation allows for natural word growth through the pipeline. The maximum growth of each stage is 2 bits. After the complex multiplication, the data is rounded down to the expanded data size using convergent rounding. The overall bit growth is less than or equal to $\log_2(N)+1$.
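As a rough illustration of these two ideas, the Python sketch below models convergent rounding (ties round to the nearest even value) and the stated upper bound on output width. The function names are hypothetical, the sketch assumes N is a power of two, and it is not a bit-accurate model of the IP core.

```python
def convergent_round(value, shift):
    """Drop `shift` LSBs from a signed integer, rounding ties to even."""
    if shift <= 0:
        return value
    floor = value >> shift                  # arithmetic shift rounds toward -infinity
    remainder = value - (floor << shift)    # discarded bits, always >= 0
    half = 1 << (shift - 1)
    # Round up when above the midpoint, or exactly at it with an odd result.
    if remainder > half or (remainder == half and (floor & 1)):
        floor += 1
    return floor


def max_output_width(input_bits, n):
    """Upper bound on output width using the log2(N) + 1 growth limit."""
    log2n = n.bit_length() - 1              # exact for power-of-two n
    return input_bits + log2n + 1
```

Unlike always rounding halves up, convergent rounding does not introduce a systematic positive bias when many values are rounded in succession.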

The floating point internal data representation is single-precision floating-point (32-bit, IEEE 754 representation). Floating-point operations provide more precise computation results but are costly in hardware resources. To reduce the amount of logic required for floating point operations, the variable streaming FFT uses fused floating point kernels. The reduction in logic occurs by fusing together several floating point operations and reducing the number of normalizations that need to occur.

### **Fixed-Point Variable Streaming FFTs**

Fixed-point variable streaming FFTs implement a radix-2<sup>2</sup> single delay feedback architecture. This architecture is similar to radix-2 single delay feedback. However, the twiddle factors are rearranged such that the multiplicative complexity is equivalent to a radix-4 single delay feedback.

The pipeline consists of log<sub>2</sub>(N) stages, each containing a single butterfly unit and a feedback delay unit that delays the incoming data by a specified number of cycles, halved at every stage. These delays effectively align the correct samples at the input of the butterfly unit for the butterfly calculations. Every second stage contains a modified radix-2 butterfly whereby a trivial multiplication by j is performed before the radix-2 butterfly operations. The output of the pipeline is in bit-reversed order.

The following scheduled operations occur in the pipeline for an FFT of length N = 16.

1. For the first 8 clock cycles, the samples are fed unmodified through the butterfly unit to the delay feedback unit.
2. The next 8 clock cycles perform the butterfly calculation using the data from the delay feedback unit and the incoming data. The higher order calculations are sent through to the delay feedback unit while the lower order calculations are sent to the next stage.
3. The next 8 clock cycles feed the higher order calculations stored in the delay feedback unit unmodified through the butterfly unit to the next stage.

Subsequent data stages use the same principles. However, the delays in the feedback path are adjusted accordingly.
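The schedule above can be sketched as a cycle-by-cycle Python model of a single stage. This is a simplified illustration under stated assumptions: the function name `sdf_stage` is hypothetical, twiddle-factor multiplication and the modified butterflies are omitted, the sums are taken to be the results passed to the next stage while the differences are held in the delay line, and the input length must be a multiple of twice the delay.

```python
from collections import deque


def sdf_stage(samples, delay):
    """Model one single delay feedback stage (twiddles omitted)."""
    fifo = deque([None] * delay)      # feedback delay line, initially empty
    out = []
    # Extra `delay` idle cycles flush the results still held in the delay line.
    for cycle, x in enumerate(list(samples) + [None] * delay):
        stored = fifo.popleft()
        if cycle % (2 * delay) < delay:
            # Pass-through phase: emit any stored result, store the new sample.
            if stored is not None:
                out.append(stored)
            fifo.append(x)
        else:
            # Butterfly phase: sum goes to the next stage, difference is stored.
            out.append(stored + x)
            fifo.append(stored - x)
    return out
```

For N = 16, the first stage corresponds to `sdf_stage(samples, 8)`; following the halving described earlier, later stages would use delays of 4, 2, and 1.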

### **Floating-Point Variable Streaming FFTs**

Floating-point variable streaming FFTs implement a mixed radix-4/2 architecture, which combines the advantages of using radix-2 and radix-4 butterflies.

The FFT has $\mathrm{ceiling}(\log_4(N))$ stages. If the transform length is an integral power of four, a radix-4 FFT implements all of the $\log_4(N)$ stages. If the transform length is not an integral power of four, the FFT implements $\mathrm{ceiling}(\log_4(N)) - 1$ of the stages in radix-4, and implements the remaining stage using radix-2.
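A minimal sketch of this stage split, assuming a power-of-two transform length, is shown below. The helper name `mixed_radix_stages` is hypothetical and is not part of the IP core or its generated files.

```python
def mixed_radix_stages(n):
    """Return (radix4_stages, radix2_stages) for a power-of-two length n."""
    if n < 2 or n & (n - 1):
        raise ValueError("transform length must be a power of two")
    log2n = n.bit_length() - 1
    # Each radix-4 stage consumes two bits of log2(N); an odd leftover bit
    # is handled by a single radix-2 stage.
    return log2n // 2, log2n % 2
```

For example, under this model a 4,096-point transform uses six radix-4 stages, while a 2,048-point transform uses five radix-4 stages and one radix-2 stage.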

Each stage contains a single butterfly unit and a feedback delay unit. The feedback delay unit delays the incoming data by a specified number of cycles; in each stage the number of cycles of delay is one quarter of the number of cycles of delay in the previous stage. The delays align the butterfly input samples correctly for the butterfly calculations. The output of the pipeline is in index-reversed order.

# **Input and Output Orders**

You can select input and output orders generated by the FFT.

**Table 3-1: Input and Output Order Options** 

| Input Order | Output Order | Mode | Comments |
|---|---|---|---|
| Natural | Bit reversed | Engine-only | Requires minimum memory and minimum latency. |
| Bit reversed | Natural | Engine-only | Requires minimum memory and minimum latency. |
| DC-centered | Bit reversed | Engine-only | Requires minimum memory and minimum latency. |
| Natural | Natural | Engine with bit-reversal | At the output, requires an extra N complex memory words and an additional N clock cycles latency, where N is the size of the transform. |
| Bit reversed | Bit reversed | Engine with bit-reversal | At the output, requires an extra N complex memory words and an additional N clock cycles latency, where N is the size of the transform. |
| DC-centered | Natural | Engine with bit-reversal | At the output, requires an extra N complex memory words and an additional N clock cycles latency, where N is the size of the transform. |

Some applications for the FFT require an FFT > user operation > IFFT chain. In this case, choosing the input order and output order carefully can lead to significant memory and latency savings. For example, consider where the input to the first FFT is in natural order and the output is in bit-reversed order (FFT is operating in engine-only mode). In this example, if the IFFT operation is configured to accept bit-reversed inputs and produces natural order outputs (IFFT is operating in engine-only mode), only the minimum amount of memory is required, which provides a saving of N complex memory words, and a latency saving of N clock cycles, where N is the size of the current transform.
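To illustrate why such a chain needs no reordering buffer, the following Python sketch pairs a generic radix-2 DIF FFT (natural-order input, bit-reversed output) with a radix-2 DIT inverse FFT (bit-reversed input, natural-order output). These are textbook algorithms with hypothetical function names, used here only to show the ordering behavior; they do not model the IP core's radix-4 or radix-2<sup>2</sup> engines.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif(x):
    """Radix-2 DIF FFT: natural-order input, bit-reversed output."""
    x = list(x)
    n, half = len(x), len(x) // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                a, b = x[start + k], x[start + k + half]
                x[start + k] = a + b
                x[start + k + half] = (a - b) * w
        half //= 2
    return x


def ifft_dit(x):
    """Radix-2 DIT inverse FFT: bit-reversed input, natural-order output."""
    x = list(x)
    n, half = len(x), 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(2j * cmath.pi * k / (2 * half))
                a, b = x[start + k], x[start + k + half] * w
                x[start + k] = a + b
                x[start + k + half] = a - b
        half *= 2
    return [v / n for v in x]
```

Calling `ifft_dit(fft_dif(x))` recovers `x` in natural order without any intermediate bit-reversal step. A user operation placed between the two transforms would need to index its data in bit-reversed order, for example with `bit_reverse`.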

# **FFT Processor Engines**

You can parameterize the FFT MegaCore function to use either quad-output or single-output engines. To increase the overall throughput of the FFT MegaCore function, you may also use multiple parallel engines of a variation.

# **Quad-Output FFT Engine**

To minimize transform time, use a quad-output FFT engine. Quad-output refers to the throughput of the internal FFT butterfly processor. The engine implementation computes all four radix-4 butterfly complex outputs in a single clock cycle.



Figure 3-1: Quad-Output FFT Engine



The FFT reads complex data samples x[k,m] from internal memory in parallel and reorders them using the switch (SW). Next, the radix-4 butterfly processor processes the ordered samples to form the complex outputs G[k,m]. Because of the inherent mathematics of the radix-4 DIF decomposition, only three complex multipliers are needed to perform the three non-trivial twiddle-factor multiplications on the outputs of the butterfly processor. To discern the maximum dynamic range of the samples, the block-floating-point units (BFPU) evaluate the four outputs in parallel. The FFT discards the appropriate LSBs, then rounds and reorders the complex values before writing them back to internal memory.
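The general idea behind block-floating-point scaling can be sketched in Python as follows. This is a generic illustration with a hypothetical function name, `block_scale`; it truncates rather than rounds the discarded LSBs for simplicity, and it does not reproduce the exact scaling scheme of the BFPU.

```python
def block_scale(block, width):
    """Scale a block of signed integers to fit `width` bits.

    Returns (mantissas, exponent): all values share one exponent, so each
    original value is approximately mantissa * 2**exponent.
    """
    limit = 1 << (width - 1)                 # valid range is [-limit, limit - 1]
    # For negative v, ~v equals -v - 1, so max(v, ~v) measures how many
    # magnitude bits the value needs regardless of its sign.
    peak = max(max(v, ~v) for v in block)
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    return [v >> shift for v in block], shift
```

Because the largest value in the block sets the shared exponent, large samples keep their full precision while small samples in the same block lose some LSBs, which is the trade-off between fixed-point and full floating-point arithmetic.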

# **Single-Output FFT Engine**

For the minimum-size FFT function, use a single-output engine. The term single-output refers to the throughput of the internal FFT butterfly processor. In the engine, the FFT calculates a single butterfly output per clock cycle, requiring a single complex multiplier.

Figure 3-2: Single-Output FFT Engine



# I/O Data Flow

### **Streaming FFT**

The streaming FFT allows continuous processing of input data, and outputs a continuous complex data stream without the need to halt the data flow in or out of the FFT IP core.

The streaming FFT generates a design with a quad output FFT engine and the minimum number of parallel FFT engines for the required throughput.

A single FFT engine provides enough performance for up to a 1,024-point streaming I/O data flow FFT.

#### **Using the Streaming FFT**

When the data transfer is complete, the FFT deasserts sink\_sop and loads the data samples in natural order.




Figure 3-3: FFT Streaming Data Flow Simulation Waveform



When the final sample loads, the source asserts sink\_eop and sink\_valid for the last data transfer.

Figure 3-4: FFT Streaming Data Flow Input Flow Control



- 1. Deassert the system reset. The data source asserts sink\_valid to indicate to the FFT function that valid data is available for input.
- 2. A successful data transfer occurs when both sink\_valid and sink\_ready are asserted.

#### **Related Information**

**Avalon Interface Specifications** 

#### Changing the Direction on a Block-by-Block Basis

1. Assert or deassert inverse (appropriately) simultaneously with the application of the sink\_sop pulse (concurrent with the first input data sample of the block).

When the FFT completes the transform of the input block, it asserts source\_valid and outputs the complex transform domain data block in natural order. The FFT function asserts <code>source\_sop</code> to indicate the first output sample.

Figure 3-5: FFT Streaming Data Flow Output Flow Control



After N data transfers, the FFT asserts source\_eop to indicate the end of the output data block.

### **Enabling the Streaming FFT**

- 1. You must assert the sink\_valid signal for the FFT to assert source\_valid (and a valid data output).
- 2. To extract the final frames of data from the FFT, you need to provide several frames where the sink\_valid signal is asserted and apply the sink\_sop and sink\_eop signals in accordance with the Avalon-ST specification.

### **Variable Streaming**

The variable streaming FFT allows continuous streaming of input data and produces a continuous stream of output data similar to the streaming FFT. With the variable streaming FFT, the transform length represents the maximum transform length. You can perform all transforms of length $2^m$ where $6 < m < \log_2(\text{transform length})$ at runtime.

### **Changing Block Size**

To change the size of the FFT on a block-by-block basis, change the value of fftpts simultaneously with the application of the sink\_sop pulse (concurrent with the first input data sample of the block). fftpts uses a binary representation of the size of the transform. For a block with a maximum transform size of 1,024, Table 3-2 shows the value of the fftpts signal and the equivalent transform size.

Table 3-2: fftpts and Transform Size

| fftpts      | Transform Size |
|-------------|----------------|
| 10000000000 | 1,024          |
| 01000000000 | 512            |
| 00100000000 | 256            |
| 00010000000 | 128            |
| 00001000000 | 64             |
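As a rough illustration of the encoding, the following Python sketch computes the fftpts value for a requested transform size. The helper name `fftpts_value` is hypothetical, and the sketch assumes that fftpts is log2(maximum transform size) + 1 bits wide, which matches the 11-bit values shown for 256, 128, and 64 points.

```python
def fftpts_value(transform_size, max_transform_size):
    """Return the fftpts bit pattern (as a string) for one block.

    Assumes fftpts is the plain binary value of the transform size,
    padded to log2(max_transform_size) + 1 bits.
    """
    for size in (transform_size, max_transform_size):
        if size <= 0 or size & (size - 1):
            raise ValueError("sizes must be powers of two")
    if transform_size > max_transform_size:
        raise ValueError("transform size exceeds the configured maximum")
    width = max_transform_size.bit_length()
    return format(transform_size, "0{}b".format(width))


# Value to drive on fftpts together with sink_sop for a 256-point block
# in a core configured for a maximum of 1,024 points.
pattern = fftpts_value(256, 1024)
```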

#### **Changing Direction**

To change direction on a block-by-block basis:

1. Assert or deassert inverse (appropriately) simultaneously with the application of the sink\_sop pulse (concurrent with the first input data sample of the block).

When the FFT completes the transform of the input block, it asserts <code>source\_valid</code> and outputs the complex transform domain data block. The FFT function asserts the <code>source\_sop</code> to indicate the first output sample. The order of the output data depends on the output order that you select in IP Toolbench. The output of the FFT may be in natural order or bit-reversed order. **Figure 3-6** shows the output flow control when the output order is bit-reversed. If the output order is natural order, data flow control remains the same, but the order of samples at the output is in sequential order 1..N.

Figure 3-6: Output Flow Control—Bit Reversed Order



#### I/O Order

The input order allows you to select the order in which you feed the samples to the FFT.

Table 3-3: Input Order

| Order                     | Description                                                                                                                      |
|---------------------------|----------------------------------------------------------------------------------------------------------------------------------|
| Natural<br>order          | The FFT requires the order of the input samples to be sequential $(1, 2, \ldots, n-1, n)$ where n is the size of the current transform. |
| Bit<br>reverse<br>order   | The FFT requires the input samples to be in bit-reversed order.                                                                  |
| Digit<br>Reverse<br>Order | The FFT requires the input samples to be in digit-reversed order.                                                                |
| -N/2 to<br>N/2            | The FFT requires the input samples to be in the order –N/2 to (N/2) – 1 (also known as DC-centered order)                        |

Similarly, the output order specifies the order in which the FFT generates the output. Whether you can select **Bit Reverse Order** or **Digit Reverse Order** depends on your **Data Representation** (**Fixed Point** or **Floating Point**). If you select **Fixed Point**, the FFT variation implements the radix-2<sup>2</sup> algorithm and the reverse I/O order option is **Bit Reverse Order**. If you select **Floating Point**, the FFT variation implements the mixed radix-4/2 algorithm and the reverse I/O order option is **Digit Reverse Order**.

For sample digit-reversed order, if n is a power of four, the order is radix-4 digit-reversed order, in which two-bit digits in the sample number are units in the reverse ordering. For example, if n = 16, sample number 4 becomes the second sample in the sample stream (by reversal of the digits in 0001, the location in the sample stream, to 0100). However, in mixed radix-4/2 algorithm, n need not be a power of four. If n is not a power of four, the two-bit digits are grouped from the least significant bit, and the most significant bit becomes the least significant bit in the digit-reversed order. For example, if n = 32, the sample number 18 (10010) in the natural ordering becomes sample number 17 (10001) in the digit-reversed order.
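The digit-reversal rule described above can be sketched in Python as follows. The function name `digit_reverse` is hypothetical and the code is a model of the index mapping only, not of the IP core's internal addressing. It groups two-bit digits starting from the least significant bit and, when the number of index bits is odd, treats the remaining most significant bit as a one-bit digit.

```python
def digit_reverse(index, n):
    """Map an index to its radix-4/2 digit-reversed counterpart.

    `n` is the transform size (a power of two). Two-bit digits are
    taken from the least significant end; a leftover single bit at the
    most significant end becomes the least significant bit of the result.
    """
    bits = n.bit_length() - 1
    digits = []  # (value, width) pairs, least significant digit first
    value = index
    remaining = bits
    while remaining >= 2:
        digits.append((value & 0b11, 2))
        value >>= 2
        remaining -= 2
    if remaining == 1:
        digits.append((value & 0b1, 1))
    result = 0
    for digit, width in digits:
        # The first (least significant) digit ends up most significant.
        result = (result << width) | digit
    return result


# n = 16: location 1 (0001) in the stream holds sample number 4 (0100).
second_sample = digit_reverse(1, 16)
# n = 32: 18 (10010) maps to 17 (10001).
mapped = digit_reverse(18, 32)
```

When n is a power of four the mapping is its own inverse; for other sizes, such as n = 32, it is still a permutation but applying it twice does not generally return the original index.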

#### **Enabling the Variable Streaming FFT**

- 1. Assert sink\_valid.
- 2. Transfer valid data to the FFT. The FFT processes data.

Example 3-1: FFT Behavior When sink\_valid is Deasserted



- 1. Deassert sink\_valid during a frame to stall the FFT, which then processes no data until you assert sink\_valid. Any previous frames that are still in the FFT also stall.
- 2. If you deassert sink\_valid between frames, the FFT processes and transfers the data currently in the FFT to the output.
- 3. Disable the FFT by deasserting the clk\_en signal.

#### **Dynamically Changing the FFT Size**

The FFT stalls the incoming data (deasserts the sink\_ready signal) until it processes and transfers all of the previous FFT frames of the previous FFT size to the output.


Figure 3-7: Dynamically Changing the FFT Size



1. Change the size of the incoming FFT,

#### I/O Order

The **I/O order** determines the order of samples entering and leaving the FFT and also determines if the FFT is operating in engine-only mode or engine with bit-reversal or digit-reversal mode.

If the FFT operates in engine-only mode, the output data is available approximately N + latency clock cycles after the first sample was input to the FFT. Latency represents a small latency through the FFT core and depends on the transform size. For engine with bit-reversal mode, the output is available after approximately 2N + latency clock cycles.
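As a worked example of these two approximations, the following Python sketch compares the modes. The function name and the `core_latency` argument are hypothetical; the actual core latency depends on the transform size and is not specified here, so the value used below is a placeholder.

```python
def approx_first_output_cycle(n, core_latency, with_reordering):
    """Approximate cycles from first input sample to first output sample.

    Engine-only mode:            about N + latency
    Engine with reversal mode:   about 2N + latency
    `core_latency` is a placeholder for the small core latency.
    """
    blocks = 2 if with_reordering else 1
    return blocks * n + core_latency


# Placeholder latency of 20 cycles for a 1,024-point transform.
engine_only = approx_first_output_cycle(1024, 20, with_reordering=False)
with_reversal = approx_first_output_cycle(1024, 20, with_reordering=True)
extra_cycles = with_reversal - engine_only  # N cycles for the reordering
```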

Figure 3-8: Data Flow—Engine-Only Mode



Figure 3-9: Data Flow—Engine with Bit-Reversal or Digit-Reversal Mode



### **Buffered Burst**

The buffered burst I/O data flow FFT requires fewer memory resources than the streaming I/O data flow FFT, but the tradeoff is an average block throughput reduction.

#### **Enabling the Buffered Burst FFT**

Figure 3-10: FFT Buffered Burst Data Flow Input Flow Control



- 1. Following the interval of time where the FFT processor reads the input samples from an internal input buffer, it re-asserts sink\_ready indicating it is ready to read in the next input block. Apply a pulse on sink\_sop aligned in time with the first input sample of the next block to indicate the beginning of the subsequent input block.
- 2. As in all data flows, the logical level of inverse for a particular block is registered by the FFT at the time when you assert the start-of-packet signal, sink\_sop.

When the FFT completes the transform of the input block, it asserts source\_valid and outputs the complex transform domain data block in natural order.


Figure 3-11: FFT Buffered Burst Data Flow Output Flow Control



Signals source\_sop and source\_eop indicate the start-of-packet and end-of-packet for the output block data respectively.

Note: You must assert the sink\_valid signal for source\_valid to be asserted (and a valid data output). You must leave sink\_valid signal asserted at the end of data transfers to extract the final frames of data from the FFT.

Note: For information about enabling the buffered burst FFT, refer to Enabling the Streaming FFT.

- 1. Deassert the system reset.
- 2. Assert sink\_valid to indicate to the FFT function that valid data is available for input. A successful data transfer occurs when both sink\_valid and sink\_ready are asserted.
- 3. Load the first complex data sample into the FFT function and simultaneously assert sink\_sop to indicate the start of the input block.
- 4. On the next clock cycle, deassert sink\_sop and load the following N - 1 complex input data samples in natural order.
- 5. On the last complex data sample, assert sink\_eop.
- 6. When you load the input block, the FFT function begins computing the transform on the stored input block. The sink\_ready signal remains high because you can transfer the first few samples of the subsequent frame into the small FIFO at the input. If this FIFO buffer is filled, the FFT deasserts the sink\_ready signal. It is not mandatory to transfer samples during sink\_ready cycles.
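The loading steps above can be modeled cycle by cycle with a short Python sketch. This is a behavioral illustration from the data source's point of view, not a testbench for the IP core: the function name `load_block` is hypothetical, and the sketch assumes the source keeps sink\_valid asserted and holds each sample until a cycle in which sink\_ready is also high.

```python
def load_block(samples, sink_ready_per_cycle):
    """Model a source loading one block into the sink interface.

    Returns (log, loaded), where log holds one tuple per clock cycle:
    (sink_valid, sink_sop, sink_eop, sink_ready), and loaded is the
    number of samples transferred.
    """
    log = []
    n = len(samples)
    index = 0  # next sample to transfer
    for ready in sink_ready_per_cycle:
        if index == n:
            break
        valid = 1                   # source always has data in this model
        sop = int(index == 0)       # first sample of the block
        eop = int(index == n - 1)   # last sample of the block
        log.append((valid, sop, eop, ready))
        if valid and ready:
            index += 1              # transfer happens this cycle
    return log, index


# Four-sample block; the sink deasserts sink_ready in the second cycle,
# so the second sample is held for one extra cycle.
cycles, loaded = load_block([1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j], [1, 0, 1, 1, 1])
```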

Example 3-2: FFT Buffered Burst Data Flow Simulation Waveform



#### **Related Information**

**Enabling the Streaming FFT** on page 3-7

#### **Burst**

The burst I/O data flow FFT operates similarly to the buffered burst FFT, except that the burst FFT requires even lower memory resources for a given parameterization at the expense of reduced average throughput. The following figure shows the simulation results for the burst FFT. The signals source\_valid and sink\_ready indicate, to the system data sources and slave sinks either side of the FFT, when the FFT can accept a new block of data and when a valid output block is available on the FFT output.



Figure 3-12: FFT Burst Data Flow Simulation Waveform



In a burst I/O data flow FFT, the FFT can process a single input block only. There is a small FIFO buffer at the sink of the block, and the FFT does not deassert <code>sink\_ready</code> until this FIFO buffer is full. You can provide a small number of additional input samples associated with the subsequent input block. You don't have to provide data to the FFT during <code>sink\_ready</code> cycles. The burst FFT can load the rest of the subsequent FFT frame only when the previous transform is fully unloaded.

Note: For information about enabling the buffered burst FFT, refer to **Enabling the Streaming FFT**.

### **Related Information**

**Enabling the Streaming FFT** on page 3-7

### **FFT IP Core Parameters**

**Table 3-4: Basic Parameters** 

| Parameter           | Value                                                                                                                                              | Description                                                                         |
|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Transform Length    | 64, 128, 256, 512,<br>1024, 2048, 4096,<br>8192, 16384, 32768,<br>or 65536. Variable<br>streaming also<br>allows 8, 16, 32,<br>131072, and 262144. | The transform length. For variable streaming, this value is the maximum FFT length. |
| Transform Direction | Forward, reverse, bidirectional                                                                                                                    | The transform direction.                                                            |
| I/O Data Flow       | Streaming<br>Variable Streaming<br>Buffered Burst<br>Burst                 | If you select Variable Streaming and Floating Point, the precision is automatically set to 32, and the reverse I/O order options are Digit Reverse Order.                                                                                                                                                                                                                                               |
| I/O Order           | Bit Reverse Order,<br>Digit Reverse Order,<br>Natural Order, -N/2<br>to N/2 | The input and output order for data entering and leaving the FFT (variable streaming FFT only).  The <b>Digit Reverse Order</b> option replaces the <b>Bit Reverse Order</b> in variable streaming floating point variations.                                                                                                                                                                           |
| Data Representation | Fixed point or single<br>floating point, or<br>block floating point        | The internal data representation type (variable streaming FFT only), either fixed point with natural bit-growth or single precision floating point. Floating-point bidirectional IP cores expect input in natural order for forward transforms and digit reverse order for reverse transforms. The output order is digit reverse order for forward transforms and natural order for reverse transforms. |
| Data Width          | 8, 10, 12, 14, 16, 18,<br>20, 24, 28, 32                                   | The data precision. The values 28 and 32 are available for variable streaming only.                                                                                                                                                                                                                                                                                                                     |
| Twiddle Width       | 8, 10, 12, 14, 16, 18,<br>20, 24, 28, 32                                   | The twiddle precision. The values 28 and 32 are available for variable streaming only. Twiddle factor precision must be less than or equal to data precision.                                                                                                                                                                                                                                           |
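The width constraints in this table can be expressed as a simple check. The following Python sketch is illustrative only; the function and constant names are hypothetical and do not correspond to any parameter editor API.

```python
STANDARD_WIDTHS = {8, 10, 12, 14, 16, 18, 20, 24}
VARIABLE_STREAMING_ONLY_WIDTHS = {28, 32}


def check_widths(data_width, twiddle_width, variable_streaming):
    """Return a list of problems with a data/twiddle width selection."""
    allowed = set(STANDARD_WIDTHS)
    if variable_streaming:
        allowed |= VARIABLE_STREAMING_ONLY_WIDTHS
    problems = []
    if data_width not in allowed:
        problems.append("data width not available for this data flow")
    if twiddle_width not in allowed:
        problems.append("twiddle width not available for this data flow")
    if twiddle_width > data_width:
        problems.append("twiddle width must not exceed data width")
    return problems


ok = check_widths(18, 16, variable_streaming=False)      # no problems
too_wide = check_widths(16, 18, variable_streaming=False)  # twiddle > data
```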

The following table lists the FFT IP core's advanced parameters.

**Table 3-5: Advanced Parameters** 

| Parameter                         | Value                         | Description                                                                                                                                                                                                                                                                                                                                     |
|-----------------------------------|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| FFT Engine Architecture           | Quad Output, Single<br>Output | Choose between quad-output and single-output FFT engines. Not available for variable streaming or streaming FFTs. |
| Number of Parallel<br>FFT Engines | 1, 2, 4                       | Choose between one, two, and four quad-output FFT engines working in parallel. Alternatively, if you have selected a single-output FFT engine architecture, you may choose to implement one or two engines in parallel. Multiple parallel engines reduce transform time at the expense of device resources, which allows you to select the desired area and throughput trade-off point.  Not available for variable streaming or streaming FFTs. |
| DSP Block Resource<br>Optimization   | On or Off | Turn on for multiplier structure optimizations. These optimizations use different DSP block configurations to pack multiply operations and reduce DSP resource requirements. This optimization may reduce $F_{MAX}$ because of the structure of the specific configurations of the DSP blocks when compared to the basic operation. Specifically, on Stratix V devices, this optimization may also come at the expense of accuracy. You can evaluate it using the MATLAB model provided and bit wise accurate simulation models. If you turn on <b>DSP Block Resource Optimization</b> and your variation has data precision between 18 and 25 bits, inclusive, and twiddle precision less than or equal to 18 bits, the FFT MegaCore function configures the DSP blocks in complex 18 x 25 multiplication mode. |
| Enable Hard Floating<br>Point Blocks | On or off | For Arria 10 devices and single-floating-point FFTs only.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |

## **FFT IP Core Interfaces and Signals**

The FFT IP core uses the Avalon-ST interface. You may achieve a higher clock rate by driving the source ready signal source\_ready of the FFT high, and not connecting the sink ready signal sink\_ready.

The FFT MegaCore function has a READY\_LATENCY value of zero.

### **Avalon-ST Interfaces in DSP IP Cores**

Avalon-ST interfaces define a standard, flexible, and modular protocol for data transfers from a source interface to a sink interface.

The input interface is an Avalon-ST sink and the output interface is an Avalon-ST source. The Avalon-ST interface supports packet transfers with packets interleaved across multiple channels.

Avalon-ST interface signals can describe traditional streaming interfaces supporting a single stream of data without knowledge of channels or packet boundaries. Such interfaces typically contain data, ready, and valid signals. Avalon-ST interfaces can also support more complex protocols for burst and packet transfers with packets interleaved across multiple channels. The Avalon-ST interface inherently synchronizes multichannel designs, which allows you to achieve efficient, time-multiplexed implementations without having to implement complex control logic.

Avalon-ST interfaces support backpressure, which is a flow control mechanism where a sink can signal to a source to stop sending data. The sink typically uses backpressure to stop the flow of data when its FIFO buffers are full or when it has congestion on its output.

### **Related Information**

Avalon Interface Specifications



# **FFT IP Core Avalon-ST Signals**

**Table 3-6: Avalon-ST Signals** 

| Signal Name | Directio<br>n | Avalon-ST Type | Size                       | Description                                                                                                                                                                                                                                                                                                                     |
|-------------|---------------|----------------|----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| clk         | Input         | clk            | 1                          | Clock signal that clocks all internal FFT engine components.                                                                                                                                                                                                                                                                    |
| reset_n     | Input         | reset_n        | 1                          | Active-low asynchronous reset signal. This signal can be asserted asynchronously, but must remain asserted at least one clk clock cycle and must be deasserted synchronously with clk.                                                                                                                                          |
| sink_eop    | Input         | endofpacket    | 1                          | Indicates the end of the incoming FFT frame.                                                                                                                                                                                                                                                                                    |
| sink_error  | Input         | error          | 2                          | Indicates an error has occurred in an upstream module, because of an illegal usage of the Avalon-ST protocol. The following errors are defined:  • 00 = no error  • 01 = missing start of packet (SOP)  • 10 = missing end of packet (EOP)  • 11 = unexpected EOP  If this signal is not used in upstream modules, set to zero. |
| sink_imag   | Input         | data           | data<br>precision<br>width | Imaginary input data, which represents a signed number of data precision bits.                                                                                                                                                                                                                                                  |
| sink_ready  | Output        | ready          | 1                          | Asserted by the FFT engine when it can accept data. It is not mandatory to provide data to the FFT during ready cycles.                                                                                                                                                                                                         |
| sink_real   | Input         | data           | data<br>precision<br>width | Real input data, which represents a signed number of data precision bits.                                                                                                                                                                                                                                                       |
| sink_sop    | Input         | startofpacket  | 1                          | Indicates the start of the incoming FFT frame.                                                                                                                                                                                                                                                                                  |
| sink_valid  | Input         | valid          | 1                          | Asserted when data on the data bus is valid. When sink_valid and sink_ready are asserted, a data transfer takes place                                                                                                                                                                                                           |
| sink_data   | Input         | data           | Variable                   | In Qsys systems, this Avalon-ST-compliant data bus includes all the Avalon-ST input data signals.                                                                                                                                                                                                                               |
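To make the sink_error codes concrete, the following Python sketch classifies a sequence of transferred samples using the encodings in the table. It is one possible interpretation for an upstream checker, not the FFT's internal logic: the function name `check_frame` is hypothetical, and the exact conditions under which each code is raised are assumptions.

```python
NO_ERROR = 0b00
MISSING_SOP = 0b01
MISSING_EOP = 0b10
UNEXPECTED_EOP = 0b11


def check_frame(beats, n):
    """Return the first error code found in a stream of transfers.

    `beats` is a list of (sop, eop) flags, one per transferred sample,
    and `n` is the expected frame length in samples.
    """
    in_frame = False
    count = 0
    for sop, eop in beats:
        if not in_frame:
            if not sop:
                return MISSING_SOP      # frame started without SOP
            in_frame = True
            count = 0
        count += 1
        if eop:
            if count != n:
                return UNEXPECTED_EOP   # EOP before the N-th sample
            in_frame = False
        elif count == n:
            return MISSING_EOP          # N-th sample without EOP
    return NO_ERROR


good_frame = [(1, 0), (0, 0), (0, 0), (0, 1)]
status = check_frame(good_frame, 4)
```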



| Signal Name  | Direction | Avalon-ST Type | Size                            | Description |
|--------------|-----------|----------------|---------------------------------|-------------|
| source_eop   | Output    | endofpacket    | 1                               | Marks the end of the outgoing FFT frame. Only valid when source_valid is asserted. |
| source_error | Output    | error          | 2                               | Indicates an error has occurred either in an upstream module or within the FFT module (logical OR of sink_error with errors generated in the FFT). |
| source_exp   | Output    | data           | 6                               | Streaming, burst, and buffered burst FFTs only. Signed block exponent: accounts for scaling of internal signal values during FFT computation. |
| source_imag  | Output    | data           | (data precision width + growth) | Imaginary output data. For burst, buffered burst, streaming, and variable streaming floating point FFTs, the output data width is equal to the input data width. For variable streaming fixed point FFTs, the size of the output data is dependent on the number of stages defined for the FFT and is 2 bits per radix-2<sup>2</sup> stage. Growth applies to variable streaming fixed point FFTs only and is log<sub>2</sub>(N) + 1. |
| source_ready | Input     | ready          | 1                               | Asserted by the downstream module if it is able to accept data. |
| source_real  | Output    | data           | (data precision width + growth) | Real output data. For burst, buffered burst, streaming, and variable streaming floating point FFTs, the output data width is equal to the input data width. For variable streaming fixed point FFTs, the size of the output data is dependent on the number of stages defined for the FFT and is 2 bits per radix-2<sup>2</sup> stage. Growth applies to variable streaming fixed point FFTs only and is log<sub>2</sub>(N) + 1. |
| source_sop   | Output    | startofpacket  | 1                               | Marks the start of the outgoing FFT frame. Only valid when source_valid is asserted. |
| source_valid | Output    | valid          | 1                               | Asserted by the FFT when there is valid data to output. |
| source_data  | Output    | data           | Variable                        | In Qsys systems, this Avalon-ST-compliant data bus includes all the Avalon-ST output data signals. |

### **Related Information**

- Avalon Streaming Interface Specification
- Recommended Design Practices

### **Component Specific Signals**

Table 3-7 lists the component-specific signals.

**Table 3-7: Component Specific Signals** 

| Signal Name | Direction | Size | Description |
|-------------|-----------|------|-------------|
| fftpts_in | Input | log<sub>2</sub>(maximum number of points) | The number of points in this FFT frame. If this value is not specified, the FFT cannot be variable length; by default, the FFT has a fixed length equal to the maximum number of points. Only sampled at SOP. |
| fftpts_out | Output | log<sub>2</sub>(maximum number of points) | The number of points in this FFT frame synchronized to the Avalon-ST source interface. Variable streaming only. |
| inverse | Input | 1 | Inverse FFT calculated if asserted. Only sampled at SOP. |

Incorrect usage of the Avalon-ST interface protocol on the sink interface results in an error on source\_error. Table 3-8 defines the behavior of the FFT when an incorrect Avalon-ST transfer is detected. If an error occurs, the behavior of the FFT is undefined and you must reset the FFT with reset\_n.

**Table 3-8: Error Handling Behavior** 

| Error | source_error | Description |
|-------|--------------|-------------|
| Missing SOP | 01 | Asserted when valid goes high, but there is no start of frame. |
| Missing EOP | 10 | Asserted if the FFT accepts N valid samples of an FFT frame, but there is no EOP signal. |
| Unexpected EOP | 11 | Asserted if EOP is asserted before N valid samples are accepted. |
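
The conditions in Table 3-8 can be explored in a testbench with a small behavioral model. The Python sketch below is only an illustration under stated assumptions: the function name `classify_sink_frame` and the per-cycle `(valid, sop, eop)` tuple format are hypothetical, and the model does not claim to reproduce the core's internal error logic or its timing.

```python
def classify_sink_frame(beats, n):
    """Return a 2-bit error code for one attempted frame of n samples.

    beats: iterable of (valid, sop, eop) tuples, one per clock cycle.
    Codes follow Table 3-8: 0b00 no error, 0b01 missing SOP,
    0b10 missing EOP, 0b11 unexpected EOP.
    """
    accepted = 0
    for valid, sop, eop in beats:
        if not valid:
            continue                      # idle cycles carry no sample
        if accepted == 0 and not sop:
            return 0b01                   # first valid sample without SOP
        accepted += 1
        if eop and accepted < n:
            return 0b11                   # EOP before N valid samples
        if accepted == n:
            return 0b00 if eop else 0b10  # Nth sample must carry EOP
    return 0b00                           # frame still in progress


# A correct 4-point frame with one idle cycle in the middle.
ok_frame = [(1, 1, 0), (1, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)]
print(format(classify_sink_frame(ok_frame, 4), "02b"))
```

Running a model like this against recorded sink stimulus can help locate the offending frame before the design is reset with reset\_n.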

## **FFT IP Core Signals in Qsys Systems**

When you instantiate your design in a Qsys system and target Arria 10 devices, the signals appear as a single bus:

- In:
  - Real
  - Imaginary
- Out:
  - Real
  - Imaginary


# **Block Floating Point Scaling**


2014.12.15

**UG-FFT** 





Block-floating-point (BFP) scaling is a trade-off between fixed-point and full floating-point FFTs.

In fixed-point FFTs, the data precision needs to be large enough to adequately represent all intermediate values throughout the transform computation. For large FFT transform sizes, an FFT fixed-point implementation that allows for word growth can make either the data width excessive or can lead to a loss of precision.

Floating-point FFTs represent each number as a mantissa with an individual exponent. The improved precision is offset by demand for increased device resources.

In a block-floating point FFT, all of the values have an independent mantissa but share a common exponent in each data block. Data is input to the FFT function as fixed point complex numbers (even though the exponent is effectively 0, you do not enter an exponent).

The block-floating-point FFT ensures full use of the data width within the FFT function and throughout the transform. After every pass through a radix-4 FFT, the data width may grow by up to $\log_2{(4\sqrt{2})} = 2.5$ bits. The data scales according to a measure of the block dynamic range on the output of the previous pass. The FFT accumulates the number of shifts and then outputs them as an exponent for the entire block. This shifting ensures that the minimum number of least significant bits (LSBs) is discarded prior to the rounding of the post-multiplication output. In effect, the block-floating-point representation acts as a digital automatic gain control. To yield uniform scaling across successive output blocks, you must scale the FFT function output by the final exponent.

When comparing the block-floating-point output of the Altera FFT MegaCore function to the output of a full precision FFT from a tool like MATLAB, you must scale the output by $2^{-exponent\_out}$ to account for the discarded LSBs during the transform.

Unlike an FFT block that uses floating point arithmetic, a block-floating-point FFT block does not provide an input for exponents. Internally, a complex value integer pair is represented with a single scale factor that is typically shared among other complex value integer pairs. After each stage of the FFT, the largest output value is detected and the intermediate result is scaled to improve the precision. The exponent records the number of left or right shifts used to perform the scaling. As a result, the output magnitude relative to the input level is:

$output \times 2^{-exponent}$

For example, if exponent = -3, the input samples are shifted right by three bits, and hence the magnitude of the output is $output \times 2^{3}$.
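
When checking the core against a software reference, the same relationship can be applied to a whole output block. The following Python sketch is a minimal illustration; the function name `bfp_to_full_precision` is hypothetical and the samples are assumed to have already been read back as signed numbers.

```python
def bfp_to_full_precision(samples, exponent):
    """Undo block scaling: each value becomes sample * 2**(-exponent)."""
    gain = 2.0 ** (-exponent)
    return [complex(s) * gain for s in samples]


# With exponent = -3, every output sample is multiplied by 2**3 = 8.
print(bfp_to_full_precision([1 + 2j, -4], -3))
```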

© 2015 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, ENPIRION, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at <a href="https://www.altera.com/common/legal.html">www.altera.com/common/legal.html</a>. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

ISO 9001:2008 Registered



After every pass through a radix-2 or radix-4 engine in the FFT core, the addition and multiplication operations cause the data bit width to grow. In other words, the total data bit width from the FFT operation grows proportionally to the number of passes. The number of passes of the FFT/IFFT computation depends on the logarithm of the number of points.

A fixed-point FFT needs a huge multiplier and memory block to accommodate the large bit width growth to represent the high dynamic range. Though floating-point is powerful in arithmetic operations, its power comes at the cost of higher design complexity such as a floating-point multiplier and a floating-point adder. BFP arithmetic combines the advantages of floating-point and fixed-point arithmetic. BFP arithmetic offers a better signal-to-noise ratio (SNR) and dynamic range than does floating-point and fixed-point arithmetic with the same number of bits in the hardware implementation.

In a block-floating-point FFT, the radix-2 or radix-4 computation of each pass shares the same hardware, with the data being read from memory, passed through the core engine, and written back to memory. Before entering the next pass, each data sample is shifted right (an operation called "scaling") if there is a carry-out bit from the addition and multiplication operations. The number of bits shifted is based on the difference in bit growth between the data sample and the maximum data sample detected in the previous stage. The maximum bit growth is recorded in the exponent register. Each data sample now shares the same exponent value and data bit width to go to the next core engine. The same core engine can be reused without incurring the expense of a larger engine to accommodate the bit growth.
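
The idea of rescaling a block between passes can be sketched in software. The Python example below is a simplified illustration, not the core's implementation: it works on real integers only, truncates instead of rounding, and picks the smallest common right shift that makes every value fit in the word. The function name `rescale_block` and the convention that each right shift decrements the exponent (so that the true value is data × 2^(-exponent)) are assumptions chosen to match the output formula given earlier in this chapter.

```python
def rescale_block(block, exponent, width=16):
    """Shift a whole block right until every value fits in `width` signed bits.

    Returns the rescaled block and the updated shared exponent.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    shift = 0
    # Find the smallest common shift that brings every sample into range.
    while any(not lo <= (v >> shift) <= hi for v in block):
        shift += 1
    # Python's >> is an arithmetic shift, so the sign is preserved.
    return [v >> shift for v in block], exponent - shift


# 40000 does not fit in 16 signed bits, so the whole block is shifted by 1.
print(rescale_block([40000, -3, 100], 0))
```

Because every sample in the block receives the same shift, small samples lose LSBs whenever a large sample forces scaling, which is why the resulting SNR is data dependent.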

The output SNR depends on how many bits of right shift occur and at what stages of the radix core computation they occur. In other words, the signal-to-noise ratio is data dependent and you need to know the input signal to compute the SNR.

### **Possible Exponent Values**

Depending on the length of the FFT/IFFT, the number of passes through the radix engine is known and therefore the range of the exponent is known. The possible values of the exponent are determined by the following equations:

$P = \lceil \log_4 N \rceil$, where N is the transform length

$R = 0$ if $\log_2 N$ is even, otherwise $R = 1$

Single output range $= (-3P + R,\ P + R - 4)$

Quad output range $= (-3P + R + 1,\ P + R - 7)$

These equations translate to the values in **Table 4-1**.

**Table 4-1: Exponent Scaling Values for FFT / IFFT (1)**

| N      | P | Single Output Engine Max (2) | Single Output Engine Min (2) | Quad Output Engine Max (2) | Quad Output Engine Min (2) |
|--------|---|------------------------------|------------------------------|----------------------------|----------------------------|
| 64     | 3 | -9                           | -1                           | -8                         | -4                         |
| 128    | 4 | -11                          | 1                            | -10                        | -2                         |
| 256    | 4 | -12                          | 0                            | -11                        | -3                         |
| 512    | 5 | -14                          | 2                            | -13                        | -1                         |
| 1,024  | 5 | -15                          | 1                            | -14                        | -2                         |
| 2,048  | 6 | -17                          | 3                            | -16                        | 0                          |
| 4,096  | 6 | -18                          | 2                            | -17                        | -1                         |
| 8,192  | 7 | -20                          | 4                            | -19                        | 1                          |
| 16,384 | 7 | -21                          | 3                            | -20                        | 0                          |

#### Notes to **Table 4-1**:

1. This table lists the range of exponents, which is the number of scale events that occurred internally. For IFFT, the output must be divided by N externally. If more arithmetic operations are performed after this step, the division by N must be performed at the end to prevent loss of precision.
2. The maximum and minimum values show the number of times the data is shifted. A negative value indicates shifts to the left, while a positive value indicates shifts to the right.
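
The exponent range equations can be evaluated directly to cross-check Table 4-1 for a given transform length. The Python sketch below is one way to do this; the function name `exponent_range` is hypothetical, and N is assumed to be a power of two.

```python
def exponent_range(n, quad_output):
    """Return the (Max, Min) exponent pair for an n-point FFT/IFFT."""
    log2n = n.bit_length() - 1
    if n < 2 or n != 1 << log2n:
        raise ValueError("n must be a power of two")
    p = (log2n + 1) // 2      # P = ceil(log4 N) = ceil(log2 N / 2)
    r = log2n % 2             # R = 0 if log2 N is even, otherwise 1
    if quad_output:
        return (-3 * p + r + 1, p + r - 7)
    return (-3 * p + r, p + r - 4)


for n in (64, 256, 8192):
    print(n, exponent_range(n, quad_output=False), exponent_range(n, quad_output=True))
```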

## **Implementing Scaling**

To implement the scaling algorithm, follow these steps:

1. Determine the length of the resulting full scale dynamic range storage register. To get the length, add the width of the data to the number of times the data is shifted. For example, a 16-bit, 256-point quad output FFT/IFFT has Max = -11 and Min = -3. The Max value indicates 11 shifts to the left, so the resulting full scaled data width is 16 + 11, or 27 bits.
2. Map the output data to the appropriate location within the expanded dynamic range register based upon the exponent value. To continue the above example, the 16-bit output data [15..0] from the FFT/IFFT is mapped to [26..11] for an exponent of –11, to [25..10] for an exponent of –10, to [24..9] for an exponent of –9, and so on.
3. Sign extend the data within the full scale register.
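
The three steps above can be modeled in a few lines of software, which is convenient for generating expected values for a testbench. The Python sketch below is an illustration only; the function name `to_full_scale` and its parameters are hypothetical, and the defaults correspond to the 16-bit, 256-point quad output example.

```python
def to_full_scale(sample, exponent, data_width=16, max_left_shift=11):
    """Place a signed `data_width`-bit sample in the full scale register.

    Returns the register contents as an unsigned bit pattern of
    data_width + max_left_shift bits (27 bits with the defaults).
    """
    shift = -exponent
    if not 0 <= shift <= max_left_shift:
        raise ValueError("exponent outside the range handled by this sketch")
    full_width = data_width + max_left_shift          # step 1
    # Python integers behave as infinitely sign-extended two's complement,
    # so shifting left (step 2) and masking to full_width bits gives the
    # sign-extended register pattern (step 3).
    return (sample << shift) & ((1 << full_width) - 1)


print(hex(to_full_scale(0x280, -5)))   # FFT output 0x280 with exponent -5
print(hex(to_full_scale(-1, -9)))      # negative sample, sign bits fill [26..25]
```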

## **Example of Scaling**

A sample of Verilog HDL code that illustrates the scaling of the output data (for exponents -11 to -9) with sign extension is shown in the following example:

```
// Map the 16-bit FFT output into the 27-bit full scale register.
case (exp)
    6'b110101 : // -11: set data equal to MSBs
    begin
        full_range_real_out[26:0] <= {real_in[15:0], 11'b0};
        full_range_imag_out[26:0] <= {imag_in[15:0], 11'b0};
    end
    6'b110110 : // -10: left shift by 10 with sign extension
    begin
        full_range_real_out[26]   <= real_in[15];
        full_range_real_out[25:0] <= {real_in[15:0], 10'b0};
        full_range_imag_out[26]   <= imag_in[15];
        full_range_imag_out[25:0] <= {imag_in[15:0], 10'b0};
    end
    6'b110111 : // -9: left shift by 9 with sign extension
    begin
        full_range_real_out[26:25] <= {real_in[15], real_in[15]};
        full_range_real_out[24:0]  <= {real_in[15:0], 9'b0};
        full_range_imag_out[26:25] <= {imag_in[15], imag_in[15]};
        full_range_imag_out[24:0]  <= {imag_in[15:0], 9'b0};
    end
    // . . . (remaining exponent values are not shown)
endcase
```

In this example, the output provides a full scale 27-bit word. You must choose how many and which bits must be carried forward in the processing chain. The choice of bits determines the absolute gain relative to the input sample level.

Figure 4-1 demonstrates the effect of scaling for all possible values for the 256-point quad output FFT with an input signal level of 0x5000. The output of the FFT is 0x280 when the exponent = -5. The figure illustrates all cases of valid exponent values of scaling to the full scale storage register [26..0]. Because the exponent is -5, you must check the register values for that column. This data is shown in the last two columns in the figure. Note that the last column represents the gain compensated data after the scaling (0x0005000), which agrees with the input data as expected. If you want to keep 16 bits for subsequent processing, you can choose the bottom 16 bits that result in 0x5000. However, if you choose a different bit range, such as the top 16 bits, the result is 0x000A. Therefore, the choice of bits affects the relative gain through the processing chain.

Because this example has 27 bits of full scale resolution and 16 bits of output resolution, choose the bottom 16 bits to maintain unity gain relative to the input signal. Choosing the LSBs is not the only solution or the correct one for all cases. The choice depends on which signal levels are important. One way to empirically select the proper range is by simulating test cases that implement expected system data. The simulation results then indicate which range of bits to use as the output register. If the full scale data is not used (or just the MSBs), you must saturate the data to avoid wraparound problems.
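
Selecting a window of bits with saturation can be prototyped in software before committing to a range in hardware. The Python sketch below is an illustration under stated assumptions: the function name `select_bits` is hypothetical, the full scale value is held as a signed integer rather than a raw bit pattern, and dropped LSBs are truncated.

```python
def select_bits(full_scale, lsb, out_width=16):
    """Keep `out_width` bits starting at bit `lsb`, saturating on overflow."""
    shifted = full_scale >> lsb            # discard the bits below the window
    hi = (1 << (out_width - 1)) - 1        # largest positive output value
    lo = -(1 << (out_width - 1))           # most negative output value
    return max(lo, min(hi, shifted))       # clamp instead of wrapping around


print(hex(select_bits(0x5000, 0)))     # bottom 16 bits of the example value
print(hex(select_bits(0x5000, 11)))    # top 16 bits, [26..11]
print(hex(select_bits(0x140000, 0)))   # value too large for the window
```

Sweeping `lsb` over representative simulation data shows how often saturation occurs for each candidate window.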




Figure 4-1: Scaling of Input Data Sample = 0x5000

[Figure: bit-by-bit view of the full scale register [26..0], showing the input data (0x5000), the FFT output data (0x280), the output data mapped into the register for each exponent value from -11 to -3, and, for exponent = -5, the "Taking All Bits" and "Sign Extend / Pad" columns.]

## **Unity Gain in an IFFT+FFT Pair**

Given sufficiently high precision, such as with floating-point arithmetic, it is theoretically possible to obtain unity gain when an IFFT and FFT are cascaded. However, in BFP arithmetic, special attention must be paid to the exponent values of the IFFT/FFT blocks to achieve the unity gain. This section explains the steps required to derive a unity gain output from an Altera IFFT/FFT MegaCore pair, using BFP arithmetic.

BFP arithmetic does not provide an input for the exponent, so you must keep track of the exponent from the IFFT block if you are feeding the output to the FFT block immediately thereafter and divide by N at the end to acquire the original signal magnitude.




### Figure 4-2: Derivation to Achieve IFFT/FFT Pair Unity Gain

IFFT

$$\begin{aligned}
X0 &= \frac{1}{N} \times IFFTa(x0) \\
   &= \frac{1}{N} \times data1 \times 2^{-exp1}
\end{aligned}$$

FFT

$$\begin{aligned}
x0 &= FFT(X0) \\
   &= \frac{1}{N} \times 2^{-exp1} \times FFTa(data1) \\
   &= \frac{1}{N} \times 2^{-exp1} \times data2 \times 2^{-exp2} \\
   &= \frac{1}{N} \times 2^{-exp1 - exp2} \times data2
\end{aligned}$$

where:

x0 = Input data to IFFT

X0 = Output data from IFFT

N = number of points

data1 = IFFT output data and FFT input data

data2 = FFT output data

exp1 = IFFT output exponent

exp2 = FFT output exponent

IFFTa = IFFT

FFTa = FFT

Any scaling operation on X0 followed by truncation loses the value of exp1 and does not result in unity gain at x0. Any scaling operation must be done on X0 only when it is the final result. If the intermediate result X0 is first padded with exp1 number of zeros and then truncated or if the data bits of X0 are truncated, the scaling information is lost.

One way to keep unity gain is by passing the exp1 value to the output of the FFT block. The other way is to preserve the full precision of $data1 \times 2^{-exp1}$ and use this value as input to the FFT block. The disadvantage of the second method is a large size requirement for the FFT to accept the input with growing bit width from IFFT operations. The resolution required to accommodate this bit width will, in most cases, exceed the maximum data width supported by the core.
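
The first method, carrying exp1 to the output of the FFT block, can be checked numerically. The Python sketch below is a floating-point illustration only and not a bit-accurate model of the cores: `bfp_core` is a hypothetical stand-in that returns mantissas satisfying true result = data × 2^(-exponent), with the exponent supplied by the caller instead of being derived from the data as the real core does.

```python
import cmath


def dft(values, sign):
    """Unnormalized DFT (sign=-1), or inverse DFT without 1/N (sign=+1)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m, v in enumerate(values)) for k in range(n)]


def bfp_core(values, sign, exponent):
    """Stand-in for a BFP core: true result = data * 2**(-exponent)."""
    data = [y * 2.0 ** exponent for y in dft(values, sign)]
    return data, exponent


def unity_gain(data2, exp1, exp2, n):
    """Apply x0 = (1/N) * 2**(-exp1 - exp2) * data2."""
    scale = 2.0 ** (-exp1 - exp2) / n
    return [d * scale for d in data2]


x0 = [3, -1, 4, 1, -5, 9, -2, 6]
data1, exp1 = bfp_core(x0, +1, -3)       # IFFT; exponent value chosen arbitrarily
data2, exp2 = bfp_core(data1, -1, -5)    # FFT fed with the IFFT mantissas only
recovered = unity_gain(data2, exp1, exp2, len(x0))
print([round(r.real, 6) for r in recovered])
```

The FFT stage never sees exp1; it is only combined with exp2 and the 1/N factor at the very end, which mirrors the last line of the derivation in Figure 4-2.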

For more information, refer to the *Achieving Unity Gain in Block Floating Point IFFT+FFT Pair* design example under DSP Design Examples at www.altera.com.

### **Related Information**

- www.altera.com




# **Document Revision History**


### FFT IP Core User Guide revision history.

| Date           | Version                  | Changes Made                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|----------------|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2014.12.15     | 14.1                     | <ul> <li>Added more detail to source_data and sink_data signal descriptions.</li> <li>Added hard-floating point option for Arria 10 devices in the Complex Multiplier Options</li> <li>Reworded DSP Block Resource Optimization description</li> <li>Added block floating point option in parameters table.</li> <li>Reordered parameters in parameters table.</li> <li>Removed the following parameters: <ul> <li>Twiddle ROM Distribution</li> <li>Use M-RAM or M144K blocks</li> <li>Implement appropriate logic functions in RAM</li> <li>Structure</li> <li>Implement Multipliers in</li> <li>Global enable clock signal</li> </ul> </li> <li>Removed Stratix V devices only comment for DSP Resource Optimization parameter.</li> <li>Added final support for Arria 10 and MAX 10 devices</li> </ul> |
| August 2014 | 14.0 Arria 10 Edition | <ul> <li>Added support for Arria 10 devices.</li> <li>Added new source_data bus description.</li> <li>Added Arria 10 generated files description.</li> <li>Removed table with generated file descriptions.</li> <li>Removed clk_ena</li> </ul> |
| June 2014 | 14.0 | <ul> <li>Removed Cyclone III and Stratix III device support</li> <li>Added support for MAX 10 FPGAs.</li> <li>Added instructions for using IP Catalog</li> </ul> |
| November 2013 | 13.1 | <ul> <li>Added more information to variable streaming I/O dataflow.</li> <li>Removed device support for following devices: <ul> <li>HardCopy II, HardCopy III, HardCopy IV E, HardCopy IV GX</li> <li>Stratix, Stratix GX, Stratix II, Stratix II GX</li> <li>Cyclone, Cyclone II</li> <li>Arria GX</li> </ul> </li> </ul> |
| November 2012 | 12.1 | Added support for Arria V GZ devices. |


