Intel Stratix 10 GX/SX Device Overview
1. Intel Stratix 10 GX/SX Device Overview
Featuring several groundbreaking innovations, including the all new Intel® Hyperflex™ core architecture, this device family enables you to meet the demand for ever-increasing bandwidth and processing performance in your most advanced applications, while meeting your power budget.
With an embedded hard processor system (HPS) based on a quad-core 64-bit Arm* Cortex*-A53, the Intel® Stratix® 10 SoC devices deliver power-efficient, application-class processing and allow designers to extend hardware virtualization into the FPGA fabric. Intel® Stratix® 10 SoC devices demonstrate Intel's commitment to high-performance SoCs and extend Intel's leadership in programmable devices featuring an Arm*-based processor system.
Important innovations in Intel® Stratix® 10 FPGAs and SoCs include:
- All new Intel® Hyperflex™ core architecture delivering 2X the core performance compared to previous generation high-performance FPGAs
- Intel 14 nm tri-gate (FinFET) technology
- Heterogeneous 3D System-in-Package (SiP) technology
- Core fabric with up to 10.2 million logic elements (LEs)
- Up to 96 full duplex transceiver channels on heterogeneous 3D SiP transceiver tiles
- Transceiver data rates up to 28.3 Gbps chip-to-chip/module and backplane performance
- M20K (20 Kb) internal SRAM memory blocks
- Fractional synthesis and ultra-low jitter LC tank based transmit phase locked loops (PLLs)
- Hard PCI Express® Gen3 x16 intellectual property (IP) blocks
- Hard 10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC) in every transceiver channel
- Hard memory controllers and PHY supporting DDR4 rates up to 2666 Mbps per pin
- Hard fixed-point and IEEE 754 compliant hard floating-point variable precision digital signal processing (DSP) blocks with up to 10 TFLOP compute performance with a power efficiency of 80 GFLOP per Watt
- Quad-core 64-bit Arm* Cortex*-A53 embedded processor running up to 1.5 GHz in SoC family variants
- Programmable clock tree synthesis for flexible, low power, low skew clock trees
- Dedicated secure device manager (SDM) for:
  - Enhanced device configuration and security
  - AES-256, SHA-256/384 and ECDSA-256/384 encrypt/decrypt accelerators and authentication
  - Multi-factor authentication
  - Physically Unclonable Function (PUF) service and software programmable device configuration capability
- Comprehensive set of advanced power saving features delivering up to 70% lower power compared to previous generation high-performance FPGAs
- Non-destructive register state readback and writeback, to support ASIC prototyping and other applications
With these capabilities, Intel® Stratix® 10 FPGAs and SoCs are ideally suited for the most demanding applications in diverse markets such as:
- Compute and Storage—for custom servers, cloud computing and datacenter acceleration
- Networking—for Terabit, 400G and multi-100G bridging, aggregation, packet processing and traffic management
- Optical Transport Networks—for OTU4, 2xOTU4, 4xOTU4
- Broadcast—for high-end studio distribution, head end encoding/decoding, edge quadrature amplitude modulation (QAM)
- Military—for radar, electronic warfare, and secure communications
- Medical—for diagnostic scanners and diagnostic imaging
- Test and Measurement—for protocol and application testers
- Wireless—for next-generation 5G networks
- ASIC Prototyping—for designs that require the largest FPGA fabric with the highest I/O count
1.1. Intel Stratix 10 GX/SX Family Variants
Intel® Stratix® 10 devices are available in FPGA (GX) and SoC (SX) variants.
- Intel® Stratix® 10 GX devices deliver up to 1 GHz core fabric performance and contain up to 10.2 million LEs in the fabric. They also feature up to 96 general purpose transceivers on separate transceiver tiles, and 2666 Mbps DDR4 external memory interface performance. The transceivers are capable of up to 28.3 Gbps short reach and across the backplane. These devices are optimized for FPGA applications that require the highest transceiver bandwidth and core fabric performance, with the power efficiency of Intel’s 14 nm tri-gate process technology.
- Intel® Stratix® 10 SX devices have a feature set that is identical to Intel® Stratix® 10 GX devices, with the addition of an embedded quad-core 64-bit Arm* Cortex*-A53 hard processor system.
Common to all Intel® Stratix® 10 family variants is a high-performance fabric based on the new Intel® Hyperflex™ core architecture that includes additional Hyper-Registers throughout the interconnect routing and at the inputs of all functional blocks. The core fabric also contains an enhanced logic array utilizing Intel’s adaptive logic module (ALM) and a rich set of high performance building blocks including:
- M20K (20 Kb) embedded memory blocks
- Variable precision DSP blocks with hard IEEE 754 compliant floating-point units
- Fractional synthesis and integer PLLs
- Hard memory controllers and PHY for external memory interfaces
- General purpose IO cells
To clock these building blocks, Intel® Stratix® 10 devices use programmable clock tree synthesis, which uses dedicated clock tree routing to synthesize only those branches of the clock trees required for the application. All devices support in-system, fine-grained partial reconfiguration of the logic array, allowing logic to be added and subtracted from the system while it is operating.
All family variants also contain high speed serial transceivers, containing both the physical medium attachment (PMA) and the physical coding sublayer (PCS), which can be used to implement a variety of industry standard and proprietary protocols. In addition to the hard PCS, Intel® Stratix® 10 devices contain multiple instantiations of PCI Express* hard IP that supports Gen1/Gen2/Gen3 rates in x1/x2/x4/x8/x16 lane configurations, and hard 10GBASE-KR/40GBASE-KR4 FEC for every transceiver. The hard PCS, FEC, and PCI Express IP free up valuable core logic resources, save power, and increase your productivity.
1.1.1. Available Options
1.2. Innovations in Intel Stratix 10 FPGAs and SoCs
Intel® Stratix® 10 FPGAs and SoCs deliver many significant improvements over the previous generation high-performance Stratix® V FPGAs.
Feature | Stratix® V FPGAs | Intel® Stratix® 10 FPGAs and SoCs |
---|---|---|
Process technology | 28 nm TSMC (planar transistor) | 14 nm Intel tri-gate (FinFET) |
Hard processor core | None | Quad-core 64-bit Arm* Cortex*-A53 (SoC only) |
Core architecture | Conventional core architecture with conventional interconnect | Intel® Hyperflex™ core architecture with Hyper-Registers in the interconnect |
Core performance | 500 MHz | 1 GHz |
Power dissipation | 1x | As low as 0.3x |
Logic density | 952 KLE | 10,200 KLE |
Embedded memory (M20K) | 52 Mbits | 253 Mbits |
18x19 multipliers | 3,926 (Note: the multiplier is 18x18 in Stratix® V devices.) | 11,520 (Note: the multiplier is 18x19 in Intel® Stratix® 10 devices.) |
Floating point DSP capability | Up to 1 TFLOP, requires soft floating point adder and multiplier | Up to 10 TFLOP, hard IEEE 754 compliant single precision floating point adder and multiplier |
Maximum transceivers | 66 | 96 |
Maximum transceiver data rate (chip-to-chip) | 28.05 Gbps | 26.6 Gbps (L-Tile), 28.3 Gbps (H-Tile) |
Maximum transceiver data rate (backplane) | 12.5 Gbps | 12.5 Gbps (L-Tile), 28.3 Gbps (H-Tile) |
Hard memory controller | None | DDR4 @ 1333 MHz/2666 Mbps; DDR3 @ 1067 MHz/2133 Mbps |
Hard protocol IP | PCIe* Gen3 x8 (up to 4 instances) | PCIe* Gen3 x16 (up to 4 instances); SR-IOV (4 physical functions / 2k virtual functions) on H-Tile devices; 10GBASE-KR/40GBASE-KR4 FEC |
Core clocking and PLLs | Global, quadrant and regional clocks supported by fractional-synthesis fPLLs | Programmable clock tree synthesis supported by fractional synthesis fPLLs and integer IO PLLs |
Register state readback and writeback | Not available | Non-destructive register state readback and writeback for ASIC prototyping and other applications |
These innovations result in the following improvements:
- Improved Core Logic Performance: The Intel® Hyperflex™ core architecture combined with 14 nm Intel tri-gate technology allows Intel® Stratix® 10 devices to achieve 2X the core performance compared to the previous generation
- Lower Power: Intel® Stratix® 10 devices use up to 70% lower power compared to the previous generation, enabled by 14 nm Intel tri-gate technology, the Intel® Hyperflex™ core architecture, and optional power saving features built into the architecture
- Higher Density: Intel® Stratix® 10 devices offer three times the level of integration, with up to 10.2 million logic elements (LEs), over 253 Mbits of embedded memory blocks (M20K), and 11,520 18x19 multipliers
- Embedded Processing: Intel® Stratix® 10 SoCs feature a quad-core 64-bit Arm* Cortex*-A53 processor that is optimized for power efficiency and is software compatible with previous generation Arria® and Cyclone® SoC devices
- Improved Transceiver Performance: With up to 96 transceiver channels implemented in heterogeneous 3D SiP transceiver tiles, Intel® Stratix® 10 GX and SX devices support data rates up to 28.3 Gbps chip-to-chip and 28.3 Gbps across the backplane with signal conditioning circuits capable of equalizing over 30 dB of system loss
- Improved DSP Performance: The variable precision DSP block in Intel® Stratix® 10 devices features hard fixed-point and floating-point capability, with up to 10 TFLOP IEEE 754 single-precision floating-point performance
- Additional Hard IP: Intel® Stratix® 10 devices include many more hard IP blocks than previous generation devices, with a hard memory controller included in each bank of 48 general purpose IOs, a hard PCIe* Gen3 x16 full protocol stack in each transceiver tile, and a hard 10GBASE-KR/40GBASE-KR4 FEC in every transceiver channel
- Enhanced Core Clocking: Intel® Stratix® 10 devices feature programmable clock tree synthesis; clock trees are only synthesized where needed, increasing the flexibility and reducing the power dissipation of the clocking solution
- Additional Core PLLs: The core fabric in Intel® Stratix® 10 devices is supported by both integer IO PLLs and fractional synthesis fPLLs, resulting in a greater total number of PLLs available than the previous generation
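The headline figures in this list can be cross-checked against the comparison table. The following minimal sketch recomputes a few of the ratios from the table values; the dictionary, constant, and function names are illustrative only and are not part of any Intel tool.

```python
# Values copied from the Stratix V vs. Intel Stratix 10 comparison table.
STRATIX_V = {"core_mhz": 500, "multipliers": 3926, "transceivers": 66}
STRATIX_10 = {"core_mhz": 1000, "multipliers": 11520, "transceivers": 96}

# The table lists Intel Stratix 10 power dissipation as "as low as 0.3x".
RELATIVE_POWER = 0.3


def ratio(key):
    """Return the Intel Stratix 10 value divided by the Stratix V value."""
    return STRATIX_10[key] / STRATIX_V[key]


core_speedup = ratio("core_mhz")
power_saving_pct = round((1 - RELATIVE_POWER) * 100)

print(f"Core performance: {core_speedup:.1f}x")
print(f"Power saving: up to {power_saving_pct}%")
print(f"Multipliers: {ratio('multipliers'):.2f}x")
print(f"Transceiver channels: {ratio('transceivers'):.2f}x")
```

The 2X core performance and 70% power figures fall straight out of the 500 MHz to 1 GHz and 1x to 0.3x table entries.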
1.3. FPGA and SoC Features Summary
Feature | Description |
---|---|
Technology | |
Low power serial transceivers | |
General purpose I/Os | |
Embedded hard IP | |
Transceiver hard IP | |
Power management | |
High performance core fabric | |
Internal memory blocks | |
Variable precision DSP blocks | |
Phase locked loops (PLL) | |
Core clock networks | |
Configuration | |
Packaging | |
Software and tools | |
SoC Subsystem | Feature | Description |
---|---|---|
Hard Processor System | Multi-processor unit (MPU) core | |
Hard Processor System | System Controllers | |
Hard Processor System | Layer 1 Cache | |
Hard Processor System | Layer 2 Cache | |
Hard Processor System | On-Chip Memory | |
Hard Processor System | Direct memory access (DMA) controller | |
Hard Processor System | Ethernet media access controller (EMAC) | |
Hard Processor System | USB On-The-Go controller (OTG) | |
Hard Processor System | UART controller | |
Hard Processor System | Serial Peripheral Interface (SPI) controller | |
Hard Processor System | I2C controller | |
Hard Processor System | SD/SDIO/MMC controller | |
Hard Processor System | NAND flash controller | |
Hard Processor System | General-purpose I/O (GPIO) | |
Hard Processor System | Timers | |
Secure Device Manager | Security | |
External Memory Interface | External Memory Interface | |
1.4. Intel Stratix 10 Block Diagram
1.5. Intel Stratix 10 FPGA and SoC Family Plan
Intel® Stratix® 10 GX/SX Device Name | Logic Elements (KLE) | M20K Blocks | M20K Mbits | MLAB Counts | MLAB Mbits | 18x19 Multipliers |
---|---|---|---|---|---|---|
GX 400/SX 400 | 378 | 1,537 | 30 | 3,276 | 2 | 1,296 |
GX 650/SX 650 | 612 | 2,489 | 49 | 5,364 | 3 | 2,304 |
GX 850/SX 850 | 841 | 3,477 | 68 | 7,124 | 4 | 4,032 |
GX 1100/SX 1100 | 1,325 | 5,461 | 107 | 11,556 | 7 | 5,184 |
GX 1650/SX 1650 | 1,624 | 5,851 | 114 | 13,764 | 8 | 6,290 |
GX 2100/SX 2100 | 2,005 | 6,501 | 127 | 17,316 | 11 | 7,488 |
GX 2500/SX 2500 | 2,422 | 9,963 | 195 | 20,529 | 13 | 10,022 |
GX 2800/SX 2800 | 2,753 | 11,721 | 229 | 23,796 | 15 | 11,520 |
GX 1660 | 1,679 | 6,162 | 120 | 14,230 | 9 | 6,652 |
GX 2110 | 2,073 | 6,847 | 134 | 17,856 | 11 | 7,920 |
GX 10M | 10,200 | 12,950 | 253 | 87,984 | 55 | 6,912 |
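The "M20K Mbits" column can be reproduced from the "M20K Blocks" column, since each M20K block holds 20 Kb. The sketch below is a consistency check only; it assumes a binary megabit (1 Mbit = 1,024 Kb) and rounding to the nearest integer, which is an inference from the table values rather than a documented convention.

```python
# (device, M20K blocks, listed M20K Mbits) copied from the family plan table.
FAMILY_PLAN = [
    ("GX 400/SX 400", 1537, 30),
    ("GX 650/SX 650", 2489, 49),
    ("GX 850/SX 850", 3477, 68),
    ("GX 1100/SX 1100", 5461, 107),
    ("GX 1650/SX 1650", 5851, 114),
    ("GX 2100/SX 2100", 6501, 127),
    ("GX 2500/SX 2500", 9963, 195),
    ("GX 2800/SX 2800", 11721, 229),
    ("GX 1660", 6162, 120),
    ("GX 2110", 6847, 134),
    ("GX 10M", 12950, 253),
]

KBITS_PER_M20K = 20    # each M20K block holds 20 Kb
KBITS_PER_MBIT = 1024  # assumption: binary megabit


def m20k_mbits(blocks):
    """Total M20K capacity in Mbits, rounded to the nearest integer."""
    return round(blocks * KBITS_PER_M20K / KBITS_PER_MBIT)


# Collect any rows where the computed capacity differs from the listed one.
mismatches = [
    (name, m20k_mbits(blocks), listed)
    for name, blocks, listed in FAMILY_PLAN
    if m20k_mbits(blocks) != listed
]
print("rows that disagree with the table:", mismatches)
```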
Intel® Stratix® 10 GX/SX Device Name | Interconnects: Maximum GPIOs | Interconnects: Maximum XCVR | PLLs: fPLLs | PLLs: I/O PLLs | Hard IP: PCIe Hard IP Blocks |
---|---|---|---|---|---|
GX 400/SX 400 | 374 | 24 | 8 | 8 | 1 |
GX 650/SX 650 | 392 | 24 | 8 | 8 | 1 |
GX 850/SX 850 | 688 | 48 | 16 | 16 | 2 |
GX 1100/SX 1100 | 688 | 48 | 16 | 16 | 2 |
GX 1650/SX 1650 | 704 | 96 | 32 | 24 | 4 |
GX 2100/SX 2100 | 704 | 96 | 32 | 24 | 4 |
GX 2500/SX 2500 | 1,160 | 96 | 32 | 24 | 4 |
GX 2800/SX 2800 | 1,160 | 96 | 32 | 24 | 4 |
GX 1660 | 688 | 48 | 16 | 16 | 2 |
GX 2110 | 688 | 48 | 16 | 16 | 2 |
GX 10M | 2,304 | 48 | 24 | 48 | 4 |
Intel® Stratix® 10 GX/SX Device Name | F1152 HF35 (35x35 mm²) | F1760 NF43 (42.5x42.5 mm²) | F2397 UF50 (50x50 mm²) | F2912 HF55 (55x55 mm²) | F4938 NF74 (70x74 mm²) |
---|---|---|---|---|---|
GX 400/SX 400 | 374, 56, 120, 24 | - | - | - | - |
GX 650/SX 650 | 392, 8, 192, 24 | - | - | - | - |
GX 850/SX 850 | - | 688, 16, 336, 48 | - | - | - |
GX 1100/SX 1100 | - | 688, 16, 336, 48 | - | - | - |
GX 1650/SX 1650 | - | 688, 16, 336, 48 | 704, 32, 336, 96 | - | - |
GX 2100/SX 2100 | - | 688, 16, 336, 48 | 704, 32, 336, 96 | - | - |
GX 2500/SX 2500 | - | 688, 16, 336, 48 | 704, 32, 336, 96 | 1160, 8, 576, 24 | - |
GX 2800/SX 2800 | - | 688, 16, 336, 48 | 704, 32, 336, 96 | 1160, 8, 576, 24 | - |
GX 1660 | - | 688, 16, 336, 48 | - | - | - |
GX 2110 | - | 688, 16, 336, 48 | - | - | - |
GX 10M | - | - | - | - | 2304, 32, 1152, 48 |
1.6. Intel Hyperflex Core Architecture
Intel® Stratix® 10 FPGAs and SoCs are based on a core fabric featuring the new Intel® Hyperflex™ core architecture. The Intel® Hyperflex™ core architecture delivers 2X the clock frequency performance and up to 70% lower power compared to previous generation high-end FPGAs. Along with this performance breakthrough, the Intel® Hyperflex™ core architecture delivers a number of advantages including:
- Higher Throughput—Capitalizes on 2X core clock frequency performance to obtain throughput breakthroughs
- Improved Power Efficiency—Uses reduced IP size, enabled by Intel® Hyperflex™ , to consolidate designs which previously spanned multiple devices into a single device, thereby reducing power by up to 70% versus previous generation devices
- Greater Design Functionality—Uses faster clock frequency to reduce bus widths and reduce IP size, freeing up additional FPGA resources to add greater functionality
- Increased Designer Productivity—Boosts performance with less routing congestion and fewer design iterations using Hyper-Aware design tools, obtaining greater timing margin for more rapid timing closure
In addition to the traditional user registers found in the Adaptive Logic Modules (ALM), the Intel® Hyperflex™ core architecture introduces additional bypassable registers throughout the fabric of the FPGA. These additional registers, called Hyper-Registers, are available on every interconnect routing segment and at the inputs of all functional blocks.
The Hyper-Registers enable the following key design techniques to achieve the 2X core performance increases:
- Fine grain Hyper-Retiming to eliminate critical paths
- Zero latency Hyper-Pipelining to eliminate routing delays
- Flexible Hyper-Optimization for best-in-class performance
When you implement these techniques in your design, the Hyper-Aware design tools automatically make use of the Hyper-Registers to achieve maximum core clock frequency.
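To see why registers in the routing matter, recall that the maximum clock frequency is set by the slowest register-to-register stage. The following sketch is a conceptual illustration only, not the algorithm used by the design tools: the segment delays are invented, and `best_register_placement` is a hypothetical brute-force helper that tries every position where a register could be enabled between segments.

```python
from itertools import combinations


def fmax_mhz(stage_delays_ps):
    """Clock frequency limit set by the slowest register-to-register stage."""
    return 1_000_000 / max(stage_delays_ps)


def best_register_placement(segment_delays_ps, num_registers):
    """Brute-force the register positions that minimize the worst stage delay.

    Registers may only sit on the boundaries between segments, mirroring the
    idea of a bypassable register on every routing segment.
    """
    n = len(segment_delays_ps)
    best = None
    for cuts in combinations(range(1, n), num_registers):
        bounds = (0, *cuts, n)
        stages = [sum(segment_delays_ps[a:b]) for a, b in zip(bounds, bounds[1:])]
        if best is None or max(stages) < max(best):
            best = stages
    return best


# Hypothetical path: four logic/routing segments, delays in picoseconds.
path = [600, 400, 500, 500]

unpipelined = best_register_placement(path, 0)
pipelined = best_register_placement(path, 1)

print(unpipelined, fmax_mhz(unpipelined))
print(pipelined, fmax_mhz(pipelined))
```

With these made-up delays, enabling a single register at the midpoint halves the worst stage delay and doubles the frequency limit. Real designs also have to account for the added latency and for feedback loops, which this sketch ignores.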
1.7. Heterogeneous 3D SiP Transceiver Tiles
Intel® Stratix® 10 FPGAs and SoCs feature power efficient, high bandwidth, low latency transceivers. The transceivers are implemented on heterogeneous 3D System-in-Package (SiP) transceiver tiles, each containing 24 full-duplex transceiver channels. In addition to providing a high-performance transceiver solution to meet current connectivity needs, this allows for future flexibility and scalability as data rates, modulation schemes, and protocol IPs evolve.
Each transceiver tile contains:
- 24 full-duplex transceiver channels (PMA and PCS)
- Reference clock distribution network
- Transmit PLLs
- High-speed clocking and bonding networks
- One instance of PCI Express hard IP
1.8. Intel Stratix 10 Transceivers
Intel® Stratix® 10 devices offer up to 96 total full-duplex transceiver channels. These channels provide continuous data rates from 1 Gbps to 28.3 Gbps for chip-to-chip, chip-to-module, and backplane applications. In each device, two-thirds of the transceivers can be configured up to the maximum data rate of 28.3 Gbps to drive 100G interfaces and C form-factor pluggable (CFP2/CFP4) optical modules. For longer-reach backplane driving applications, advanced adaptive equalization circuits are used to equalize over 30 dB of system loss.
All transceiver channels feature a dedicated Physical Medium Attachment (PMA) and a hardened Physical Coding Sublayer (PCS).
- The PMA provides primary interfacing capabilities to physical channels.
- The PCS typically handles encoding/decoding, word alignment, and other pre-processing functions before transferring data to the FPGA core fabric.
Within each transceiver tile, the transceivers are arranged in four banks of six PMA-PCS groups. A wide variety of bonded and non-bonded data rate configurations are possible within each bank, and within each tile, using a highly configurable clock distribution network.
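The per-tile figures combine into device-level counts with simple multiplication. The sketch below assumes a device built from whole tiles that each follow the arrangement described above; `tile_resources` is a hypothetical helper. Its results match the 24-, 48-, and 96-channel entries in the family plan, and it makes no claim about devices whose resource counts do not scale per tile.

```python
CHANNELS_PER_BANK = 6      # six PMA-PCS groups per bank
BANKS_PER_TILE = 4         # four banks per transceiver tile
PCIE_HARD_IP_PER_TILE = 1  # one PCI Express hard IP instance per tile


def tile_resources(num_tiles):
    """Aggregate transceiver resources for a device with num_tiles tiles."""
    channels = num_tiles * BANKS_PER_TILE * CHANNELS_PER_BANK
    return {
        "banks": num_tiles * BANKS_PER_TILE,
        "channels": channels,
        "pcie_hard_ip": num_tiles * PCIE_HARD_IP_PER_TILE,
        # Two-thirds of the channels can run at the 28.3 Gbps maximum.
        "max_rate_channels": channels * 2 // 3,
    }


for tiles in (1, 2, 4):
    print(tiles, tile_resources(tiles))
```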
1.8.1. PMA Features
The PMA in Intel® Stratix® 10 devices provides exceptional signal integrity at data rates up to 28.3 Gbps. Clocking options include ultra-low jitter LC tank-based (ATX) PLLs with optional fractional synthesis capability, channel PLLs operating as clock multiplier units (CMUs), and fractional synthesis PLLs (fPLLs).
- ATX PLL: can be configured in integer mode or, optionally, in a new fractional synthesis mode. Each ATX PLL spans the full frequency range of the supported data rates, providing a stable, flexible clock source with the lowest jitter.
- CMU PLL: when not being used as a transceiver, select PMA channels can be configured as channel PLLs operating as CMUs to provide an additional master clock source within the transceiver bank.
- fPLL: dedicated fPLLs are available with precision frequency synthesis capabilities. They can synthesize multiple clock frequencies from a single reference clock source and replace multiple reference oscillators in multi-protocol and multi-rate applications.
On the receiver side, each PMA has an independent channel PLL that allows analog tracking for clock-data recovery. Each PMA also has advanced equalization circuits that compensate for transmission losses across a wide frequency spectrum.
- Variable Gain Amplifier (VGA)—to optimize the receiver's dynamic range
- Continuous Time Linear Equalizer (CTLE)—to compensate for channel losses with lowest power dissipation
- Decision Feedback Equalizer (DFE)—to provide additional equalization capability on backplanes even in the presence of crosstalk and reflections
- On-Die Instrumentation (ODI)—to provide on-chip eye monitoring capabilities (Eye Viewer). This capability helps to optimize link equalization parameters during board bring-up and supports in-system link diagnostics and equalization margin testing
All link equalization parameters feature automatic adaptation using the new Advanced Digital Adaptive Parametric Tuning (ADAPT) circuit. This circuit is used to dynamically set DFE tap weights, adjust CTLE parameters, and optimize VGA gain and threshold voltage. Finally, optimal and consistent signal integrity is ensured by using the new hardened Precision Signal Integrity Calibration Engine (PreSICE) to automatically calibrate all transceiver circuit blocks on power-up. This gives the most link margin and ensures robust, reliable, and error-free operation.
Feature | Capability |
---|---|
Chip-to-Chip Data Rates | 1 Gbps to 28.3 Gbps (Intel® Stratix® 10 GX/SX devices) |
Backplane Support | Drive backplanes at data rates up to 28.3 Gbps, including 10GBASE-KR compliance |
Optical Module Support | SFP+/SFP, XFP, CXP, QSFP/QSFP28, QSFPDD, CFP/CFP2/CFP4 |
Cable Driving Support | SFP+ Direct Attach, PCI Express over cable, eSATA |
Transmit Pre-Emphasis | 5-tap transmit pre-emphasis and de-emphasis to compensate for system channel loss |
Continuous Time Linear Equalizer (CTLE) | Dual mode, high-gain, and high-data rate, linear receive equalization to compensate for system channel loss |
Decision Feedback Equalizer (DFE) | 15 fixed tap DFE to equalize backplane channel loss in the presence of crosstalk and noisy environments |
Advanced Digital Adaptive Parametric Tuning (ADAPT) | Fully digital adaptation engine that automatically adjusts all link equalization parameters (including the CTLE, DFE, and VGA blocks) to provide optimal link margin without intervention from user logic |
Precision Signal Integrity Calibration Engine (PreSICE) | Hardened calibration controller to quickly calibrate all transceiver control parameters on power-up, which provides the optimal signal integrity and jitter performance |
ATX Transmit PLLs | Low jitter ATX (inductor-capacitor) transmit PLLs with continuous tuning range to cover a wide range of standard and proprietary protocols, with optional fractional frequency synthesis capability |
Fractional PLLs | On-chip fractional frequency synthesizers to replace on-board crystal oscillators and reduce system cost |
Digitally Assisted Analog CDR | Superior jitter tolerance with fast lock time |
On-Die Instrumentation (Eye Viewer and Jitter Margin Tool) | Simplify board bring-up, debug, and diagnostics with non-intrusive, high-resolution eye monitoring (Eye Viewer). Also inject jitter from transmitter to test link margin in system. |
Dynamic Reconfiguration | Allows independent control of each transceiver channel through an Avalon memory-mapped interface for the most transceiver flexibility |
Multiple PCS-PMA and PCS-Core to FPGA fabric interface widths | 8, 10, 16, 20, 32, 40, or 64-bit interface widths for flexibility of deserialization width, encoding, and reduced latency |
1.8.2. PCS Features
Intel® Stratix® 10 PMA channels interface with core logic through configurable and bypassable PCS interface layers.
The PCS contains multiple gearbox implementations to decouple the PMA and PCS interface widths. This feature provides the flexibility to implement a wide range of applications with 8, 10, 16, 20, 32, 40, or 64-bit interface widths between each transceiver and the core logic.
The PCS also contains hard IP to support a variety of standard and proprietary protocols across a wide range of data rates and encoding schemes. The Standard PCS mode provides support for 8B/10B encoded applications up to 12.5 Gbps. The Enhanced PCS mode supports 64B/66B and 64B/67B encoded applications up to 17.4 Gbps. The Enhanced PCS mode also includes an integrated 10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC) circuit. For highly customized implementations, a PCS Direct mode provides an interface up to 64 bits wide to allow for custom encoding and support for data rates up to 28.3 Gbps.
For more information about the PCS-Core interface or the double rate transfer mode, refer to the Intel® Stratix® 10 L- and H-Tile Transceiver PHY User Guide, and the Intel® Stratix® 10 E-Tile Transceiver PHY User Guide.
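The choice between the three PCS modes can be summarized as a simple decision rule based on encoding and data rate. The sketch below is illustrative only: `select_pcs_mode` is a hypothetical helper, not an Intel API, and it encodes just the rate limits quoted in this section (the lower bounds are taken from the PCS protocol table). Consult the transceiver PHY user guides for the authoritative constraints.

```python
def select_pcs_mode(encoding, rate_gbps):
    """Pick a PCS mode from the encoding scheme and line rate in Gbps.

    Hypothetical helper that mirrors the limits quoted in the text.
    """
    if encoding == "8B/10B" and 1.0 <= rate_gbps <= 12.5:
        return "Standard PCS"
    if encoding in ("64B/66B", "64B/67B") and 2.5 <= rate_gbps <= 17.4:
        return "Enhanced PCS"
    if rate_gbps <= 28.3:
        # Anything else relies on custom encoding in the core fabric.
        return "PCS Direct"
    raise ValueError(f"{rate_gbps} Gbps exceeds the 28.3 Gbps maximum")


print(select_pcs_mode("8B/10B", 1.25))       # GigE-style link
print(select_pcs_mode("64B/66B", 10.3125))   # 10GBASE-R-style link
print(select_pcs_mode("custom", 25.78125))   # high-rate custom link
```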
PCS Protocol Support | Data Rate (Gbps) | Transmitter Data Path | Receiver Data Path |
---|---|---|---|
Standard PCS | 1 to 12.5 | Phase compensation FIFO, byte serializer, 8B/10B encoder, bit-slipper, channel bonding | Rate match FIFO, word-aligner, 8B/10B decoder, byte deserializer, byte ordering |
PCI Express Gen1/Gen2 x1, x2, x4, x8, x16 | 2.5 and 5.0 | Same as Standard PCS plus PIPE 2.0 interface to core | Same as Standard PCS plus PIPE 2.0 interface to core |
PCI Express Gen3 x1, x2, x4, x8, x16 | 8.0 | Phase compensation FIFO, byte serializer, encoder, scrambler, bit-slipper, gear box, channel bonding, and PIPE 3.0 interface to core, auto speed negotiation | Rate match FIFO (0-600 ppm mode), word-aligner, decoder, descrambler, phase compensation FIFO, block sync, byte deserializer, byte ordering, PIPE 3.0 interface to core, auto speed negotiation |
CPRI | 0.6144 to 9.8 | Same as Standard PCS plus deterministic latency serialization | Same as Standard PCS plus deterministic latency deserialization |
Enhanced PCS | 2.5 to 17.4 | FIFO, channel bonding, bit-slipper, and gear box | FIFO, block sync, bit-slipper, and gear box |
10GBASE-R | 10.3125 | FIFO, 64B/66B encoder, scrambler, FEC, and gear box | FIFO, 64B/66B decoder, descrambler, block sync, FEC, and gear box |
Interlaken | 4.9 to 17.4 | FIFO, channel bonding, frame generator, CRC-32 generator, scrambler, disparity generator, bit-slipper, and gear box | FIFO, CRC-32 checker, frame sync, descrambler, disparity checker, block sync, and gear box |
SFI-S/SFI-5.2 | 11.3 | FIFO, channel bonding, bit-slipper, and gear box | FIFO, bit-slipper, and gear box |
IEEE 1588 | 1.25 to 10.3125 | FIFO (fixed latency), 64B/66B encoder, scrambler, and gear box | FIFO (fixed latency), 64B/66B decoder, descrambler, block sync, and gear box |
SDI | up to 12.5 | FIFO and gear box | FIFO, bit-slipper, and gear box |
GigE | 1.25 | Same as Standard PCS plus GigE state machine | Same as Standard PCS plus GigE state machine |
PCS Direct | up to 28.3 | Custom | Custom |
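The data rates in this table are line rates, which include line-coding overhead. As a worked example, the sketch below converts two of the listed line rates to payload rates using the standard 8B/10B and 64B/66B code ratios; `payload_rate_gbps` is a hypothetical helper, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def payload_rate_gbps(line_rate_gbps, data_bits, coded_bits):
    """Payload rate after removing line-coding overhead."""
    return Fraction(line_rate_gbps) * Fraction(data_bits, coded_bits)


# GigE: 1.25 Gbps line rate with 8B/10B coding.
gige_payload = payload_rate_gbps("1.25", 8, 10)

# 10GBASE-R: 10.3125 Gbps line rate with 64B/66B coding.
ten_gbase_r_payload = payload_rate_gbps("10.3125", 64, 66)

print(f"GigE payload: {float(gige_payload)} Gbps")
print(f"10GBASE-R payload: {float(ten_gbase_r_payload)} Gbps")
```

This is why the GigE and 10GBASE-R rows list 1.25 and 10.3125 Gbps for links that carry 1 and 10 Gbps of data.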
1.9. PCI Express Gen1/Gen2/Gen3 Hard IP
Intel® Stratix® 10 devices contain embedded PCI Express hard IP designed for performance, ease-of-use, increased functionality, and designer productivity.
The PCI Express hard IP consists of the PHY, Data Link, and Transaction layers. It also supports PCI Express Gen1/Gen2/Gen3 end point and root port, in x1/x2/x4/x8/x16 lane configurations. The PCI Express hard IP is capable of operating independently from the core logic (autonomous mode). This feature allows the PCI Express link to power up and complete link training in less than 100 ms, while the rest of the device is still in the process of being configured. The hard IP also provides added functionality, which makes it easier to support emerging features such as Single Root I/O Virtualization (SR-IOV) and optional protocol extensions.
The PCI Express hard IP has improved end-to-end data path protection using Error Checking and Correction (ECC). In addition, the hard IP supports configuration of the device via protocol (CvP) across the PCI Express bus at Gen1/Gen2/Gen3 rates.
1.10. Interlaken PCS Hard IP
Intel® Stratix® 10 devices have integrated Interlaken PCS hard IP supporting rates up to 17.4 Gbps per lane.
The Interlaken PCS hard IP is based on the proven functionality of the PCS developed for Intel’s previous generation FPGAs, which has demonstrated interoperability with Interlaken ASSP vendors and third-party IP suppliers. The Interlaken PCS hard IP is present in every transceiver channel in Intel® Stratix® 10 devices.
1.11. 10G Ethernet Hard IP
Intel® Stratix® 10 devices include IEEE 802.3 10-Gbps Ethernet (10GbE) compliant 10GBASE-R PCS and PMA hard IP. The scalable 10GbE hard IP supports multiple independent 10GbE ports while using a single PLL for all the 10GBASE-R PCS instantiations, which saves on core logic resources and clock networks.
The integrated serial transceivers simplify multi-port 10GbE systems compared to 10 GbE Attachment Unit Interface (XAUI) interfaces that require an external XAUI-to-10G PHY. Furthermore, the integrated transceivers incorporate signal conditioning circuits, which enable direct connection to standard 10G XFP and SFP+ pluggable optical modules. The transceivers also support backplane Ethernet applications and include a hard 10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC) circuit that can be used for both 10G and 40G applications. The integrated 10G Ethernet hard IP and 10G transceivers save external PHY cost, board space and system power. The 10G Ethernet PCS hard IP and 10GBASE-KR FEC are present in every transceiver channel.
1.12. External Memory and General Purpose I/O
Intel® Stratix® 10 devices offer substantial external memory bandwidth, with up to ten 72-bit-wide DDR4 memory interfaces running at up to 2666 Mbps. For external memory interface and LVDS restrictions, see AN 906: Intel Stratix 10 GX 400, SX 400, and TX 400 Routing and Designing Floorplan Guidelines.
This bandwidth is provided along with the ease of design, lower power, and resource efficiencies of hardened high-performance memory controllers. The external memory interfaces can be configured up to a maximum width of 144 bits when using either hard or soft memory controllers.
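As a rough illustration of what these figures mean, the sketch below computes theoretical peak bandwidth from the interface width and per-pin rate. It ignores refresh, bus turnaround, and controller efficiency, so achievable throughput is lower. Treating 64 of the 72 bits as data and the rest as ECC is an assumption about a typical configuration, not something stated in this overview.

```python
RATE_MBPS_PER_PIN = 2666   # maximum DDR4 rate per pin
INTERFACE_WIDTH_BITS = 72  # width of each DDR4 interface
MAX_INTERFACES = 10        # "up to ten" interfaces
ASSUMED_DATA_BITS = 64     # assumption: remaining 8 bits carry ECC


def peak_mbps(width_bits, interfaces=1):
    """Theoretical peak transfer rate in Mbps across all data pins."""
    return width_bits * RATE_MBPS_PER_PIN * interfaces


raw_per_interface = peak_mbps(INTERFACE_WIDTH_BITS)
raw_total = peak_mbps(INTERFACE_WIDTH_BITS, MAX_INTERFACES)
data_mbytes_per_interface = peak_mbps(ASSUMED_DATA_BITS) // 8

print(f"Raw, one interface: {raw_per_interface / 1000:.3f} Gbps")
print(f"Raw, ten interfaces: {raw_total / 1_000_000:.2f} Tbps")
print(f"Data only, one interface: {data_mbytes_per_interface} MB/s")
```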
Each I/O bank contains 48 general purpose I/Os and a high-efficiency hard memory controller capable of supporting many different memory types, each with different performance capabilities. The hard memory controller is also capable of being bypassed and replaced by a soft controller implemented in user logic. The I/Os each have a hardened double data rate (DDR) read/write path (PHY) capable of performing key memory interface functionality such as:
- Read/write leveling
- FIFO buffering to lower latency and improve margin
- Timing calibration
- On-chip termination
The timing calibration is aided by the inclusion of hard microcontrollers based on Intel’s Nios® II technology, specifically tailored to control the calibration of multiple memory interfaces. This calibration allows the Intel® Stratix® 10 device to compensate for any changes in process, voltage, or temperature either within the Intel® Stratix® 10 device itself, or within the external memory device. The advanced calibration algorithms ensure maximum bandwidth and robust timing margin across all operating conditions.
Interface | Controller Type | Performance (maximum rate possible) |
---|---|---|
DDR4 | Hard | 2666 Mbps |
DDR3 | Hard | 2133 Mbps |
QDRII+ | Soft | 1,100 Mtps |
QDRII+ Xtreme | Soft | 1,266 Mtps |
QDRIV | Soft | 2,133 Mtps |
RLDRAM III | Soft | 2400 Mbps |
RLDRAM II | Soft | 533 Mbps |
In addition to parallel memory interfaces, Intel® Stratix® 10 devices support serial memory technologies such as the Hybrid Memory Cube (HMC). The HMC is supported by the Intel® Stratix® 10 high-speed serial transceivers, which connect up to four HMC links, with each link running at data rates of 15 Gbps (HMC short reach specification).
Intel® Stratix® 10 devices also feature general purpose I/Os capable of supporting a wide range of single-ended and differential I/O interfaces. LVDS rates up to 1.6 Gbps are supported, with each pair of pins having both a differential driver and a differential input buffer. This enables configurable direction for each LVDS pair.
1.13. Adaptive Logic Module (ALM)
Intel® Stratix® 10 devices use an adaptive logic module (ALM) similar to that of the previous generation Intel® Arria® 10 and Stratix® V FPGAs, allowing for efficient implementation of logic functions and easy conversion of IP between the devices.
The ALM block diagram shown in the following figure has eight inputs with a fracturable look-up table (LUT), two dedicated embedded adders, and four dedicated registers.
Key features and capabilities of the ALM include:
- High register count with 4 registers per 8-input fracturable LUT, operating in conjunction with the new Intel® Hyperflex™ architecture, enables Intel® Stratix® 10 devices to maximize core performance at very high core logic utilization
- Implements select 7-input logic functions, all 6-input logic functions, and two independent functions consisting of smaller LUT sizes (such as two independent 4-input LUTs) to optimize core logic utilization
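For readers less familiar with look-up tables, the sketch below models a LUT as a packed truth table. It is a generic model intended only to show why one 6-input function and two independent 4-input functions are different uses of configuration bits; it does not describe the ALM's internal circuitry, and all function names are illustrative.

```python
def build_truth_table(func, num_inputs):
    """Pack the outputs of func for every input combination into an integer."""
    table = 0
    for index in range(2 ** num_inputs):
        bits = [(index >> i) & 1 for i in range(num_inputs)]
        if func(*bits):
            table |= 1 << index
    return table


def lut_eval(table, inputs):
    """Look up one output bit; inputs[0] is the least significant index bit."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> index) & 1


# One 6-input function: needs 2**6 = 64 configuration bits.
parity6 = build_truth_table(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f, 6)

# Two independent 4-input functions: 2**4 = 16 configuration bits each.
and4 = build_truth_table(lambda a, b, c, d: a & b & c & d, 4)
or4 = build_truth_table(lambda a, b, c, d: a | b | c | d, 4)

print(lut_eval(parity6, (1, 0, 1, 1, 0, 0)))
print(lut_eval(and4, (1, 1, 1, 1)), lut_eval(or4, (0, 0, 0, 0)))
```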
The Intel® Quartus® Prime software takes advantage of the ALM logic structure to deliver the highest performance, optimal logic utilization, and lowest compile times. The Intel® Quartus® Prime software simplifies design reuse as it automatically maps legacy designs into the Intel® Stratix® 10 ALM architecture.
1.14. Core Clocking
Core clocking in Intel® Stratix® 10 devices makes use of programmable clock tree synthesis.
This technique uses dedicated clock tree routing and switching circuits, and allows the Intel® Quartus® Prime software to create the exact clock trees required for your design. Clock tree synthesis minimizes clock tree insertion delay, reduces dynamic power dissipation in the clock tree and allows greater clocking flexibility in the core while still maintaining backwards compatibility with legacy global and regional clocking schemes.
The core clock network in Intel® Stratix® 10 devices supports the new Intel® Hyperflex™ core architecture at clock rates up to 1 GHz. It also supports the hard memory controllers up to 2666 Mbps with a quarter rate transfer to the core. The core clock network is supported by dedicated clock input pins, fractional clock synthesis PLLs, and integer I/O PLLs.
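The relationship between the per-pin memory data rate and the core-side clock can be worked through as a short calculation. This is a sketch under the assumption that the interface is double data rate (two transfers per memory clock cycle) and that "quarter rate" means the core-side logic runs at one quarter of the memory clock; the function name is hypothetical.

```python
def core_clock_mhz(data_rate_mbps, rate_divisor=4):
    """Return the core-side interface clock for a double-data-rate memory.

    A DDR interface transfers two bits per pin per memory clock cycle, so the
    memory clock is half the per-pin data rate. A quarter-rate transfer then
    divides that clock by four on the way to the core (assumed here).
    """
    memory_clock_mhz = data_rate_mbps / 2
    return memory_clock_mhz / rate_divisor


print(core_clock_mhz(2666))  # 333.25
```

Under these assumptions, a 2666 Mbps interface presents data to the core at roughly 333 MHz, well within the core clock rates quoted above.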
1.15. Fractional Synthesis PLLs and I/O PLLs
Intel® Stratix® 10 devices have up to 32 fractional synthesis PLLs (fPLL) available for use with transceivers or in the core fabric.
The fPLLs are located in the 3D SiP transceiver L-tiles and H-tiles, eight per tile, adjacent to the transceiver channels. The fPLLs can be used to reduce both the number of oscillators required on the board and the number of clock pins required, by synthesizing multiple clock frequencies from a single reference clock source. In addition to synthesizing reference clock frequencies for the transceiver transmit PLLs, the fPLLs can also be used directly for transmit clocking. Each fPLL can be independently configured for conventional integer mode, or enhanced fractional synthesis mode with third-order delta-sigma modulation.
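The difference between integer mode and fractional synthesis mode can be sketched with the generic fractional-N relationship, in which the feedback multiplier has an integer part and a fractional part. The parameter names, the 32-bit fraction width, and the single output divider are assumptions for illustration and are not taken from the fPLL specification. The delta-sigma modulator is not modeled; in a real fractional-N PLL it dithers the feedback divide value so that its average equals the fractional multiplier.

```python
from fractions import Fraction


def fractional_pll_output(f_ref_hz, m_int, k_frac, frac_bits, n_div, c_div):
    """Ideal output frequency of a generic fractional-N PLL.

    f_out = f_ref * (M + K / 2**frac_bits) / (N * C)
    Setting k_frac to 0 reduces this to conventional integer mode.
    """
    multiplier = m_int + Fraction(k_frac, 2 ** frac_bits)
    return Fraction(f_ref_hz) * multiplier / (n_div * c_div)


# Hypothetical example: derive 156.25 MHz from a 100 MHz reference.
f_out = fractional_pll_output(
    f_ref_hz=100_000_000, m_int=12, k_frac=2 ** 31, frac_bits=32,
    n_div=1, c_div=8,
)
print(float(f_out))

# Integer mode with the same dividers can only reach multiples of 12.5 MHz.
print(float(fractional_pll_output(100_000_000, 12, 0, 32, 1, 8)))
```

Exact rational arithmetic is used so that the result is not affected by floating point rounding.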
In addition to the fPLLs, Intel® Stratix® 10 devices contain up to 24 integer I/O PLLs (IOPLLs) available for general purpose use in the core fabric and for simplifying the design of external memory interfaces and high-speed LVDS interfaces. One IOPLL is located in each bank of 48 general purpose I/Os, adjacent to the hard memory controllers and LVDS SerDes in that bank. This makes it easier to close timing because the IOPLLs are tightly coupled with the I/Os that need to use them. The IOPLLs can also be used for general purpose applications in the core such as clock network delay compensation and zero-delay clock buffering.
1.16. Internal Embedded Memory
Intel® Stratix® 10 devices contain two types of embedded memory blocks: M20K (20 Kb) and MLAB (640 bit).
The M20K and MLAB blocks are familiar block sizes carried over from previous Intel device families. The MLAB blocks are ideal for wide and shallow memories, while the M20K blocks are intended to support larger memory configurations and include hard ECC. Both M20K and MLAB embedded memory blocks can be configured as a single-port or dual-port RAM, FIFO, ROM, or shift register. These memory blocks are highly flexible and support a number of memory configurations as shown in Table 10.
MLAB (640 bits) | M20K (20 Kb) |
---|---|
64 x 10 (supported through emulation); 32 x 20 | 2K x 10 (or x8); 1K x 20 (or x16); 512 x 40 (or x32) |
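The configurations in the table can be checked with simple arithmetic: depth times width should equal the block capacity. The sketch below does that, and also shows how much of an M20K block is used when the narrower x8 option is chosen. The dictionary layout and function name are illustrative only.

```python
MLAB_BITS = 640
M20K_BITS = 20 * 1024  # 20 Kb

# (depth, width) pairs taken from the table of supported configurations.
CONFIGS = {
    "MLAB": (MLAB_BITS, [(64, 10), (32, 20)]),
    "M20K": (M20K_BITS, [(2048, 10), (1024, 20), (512, 40)]),
}


def utilization(depth, width, capacity_bits):
    """Fraction of the block capacity used by a depth x width configuration."""
    return depth * width / capacity_bits


for name, (capacity, shapes) in CONFIGS.items():
    for depth, width in shapes:
        print(name, f"{depth} x {width}", utilization(depth, width, capacity))

# The x8 option of the 2K-deep M20K configuration uses 80% of the block.
print(utilization(2048, 8, M20K_BITS))
```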
1.17. Variable Precision DSP Block
The Intel® Stratix® 10 DSP blocks are based upon the Variable Precision DSP Architecture used in Intel’s previous generation devices. They feature hard fixed point and IEEE 754 compliant floating point capability.
The DSP blocks can be configured to support signal processing with precision ranging from 18x19 up to 54x54. A pipeline register has been added to increase the maximum operating frequency of the DSP block and reduce power consumption.
Each DSP block can be independently configured at compile time as either dual 18x19 or a single 27x27 multiply accumulate. With a dedicated 64 bit cascade bus, multiple variable precision DSP blocks can be cascaded to implement even higher precision DSP functions efficiently.
In floating point mode, each DSP block provides one single precision floating point multiplier and adder. Floating point additions, multiplications, mult-adds and mult-accumulates are supported.
The following table shows how different precisions are accommodated within a DSP block, or by utilizing multiple blocks.
Multiplier Size | DSP Block Resources | Expected Usage |
---|---|---|
18x19 bits | 1/2 of Variable Precision DSP Block | Medium precision fixed point |
27x27 bits | 1 Variable Precision DSP Block | High precision fixed point |
19x36 bits | 1 Variable Precision DSP Block with external adder | Fixed point FFTs |
36x36 bits | 2 Variable Precision DSP Blocks with external adder | Very high precision fixed point |
54x54 bits | 4 Variable Precision DSP Blocks with external adder | Double Precision floating point |
Single Precision floating point | 1 Single Precision floating point adder, 1 Single Precision floating point multiplier | Floating point |
Complex multiplication is very common in DSP algorithms. One of the most popular applications of complex multipliers is the FFT algorithm. This algorithm has the characteristic of increasing precision requirements on only one side of the multiplier. The Variable Precision DSP block supports the FFT algorithm with proportional increase in DSP resources as the precision grows.
Complex Multiplier Size | DSP Block Resources | FFT Usage |
---|---|---|
18x19 bits | 2 Variable Precision DSP Blocks | Resource optimized FFT |
27x27 bits | 4 Variable Precision DSP Blocks | Highest precision FFT |
For FFT applications with high dynamic range requirements, the Intel FFT IP Core offers an option of single precision floating point implementation with resource usage and performance similar to high precision fixed point implementations.
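The block counts in the complex multiplier table are consistent with the textbook four-multiplier form of a complex product. The following sketch shows that form and derives the block count from the number of real multipliers per DSP block given in the precision table (two 18x19 multipliers or one 27x27 multiplier per block). It assumes the four-multiplier form is used; the function and constant names are hypothetical.

```python
import math


def complex_multiply(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) using four real multiplies."""
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    return re, im


# Real multipliers available per Variable Precision DSP block.
MULTIPLIERS_PER_BLOCK = {"18x19": 2, "27x27": 1}


def dsp_blocks_for_complex_multiply(size):
    """DSP blocks needed for one complex multiply at the given precision."""
    real_multiplies = 4
    return math.ceil(real_multiplies / MULTIPLIERS_PER_BLOCK[size])


print(complex_multiply(3, 4, 1, -2))
print(dsp_blocks_for_complex_multiply("18x19"))
print(dsp_blocks_for_complex_multiply("27x27"))
```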
Other features of the DSP block include:
- Hard 18 bit and 25 bit pre-adders
- Hard floating point multipliers and adders
- 64 bit dual accumulator (for separate I, Q product accumulations)
- Cascaded output adder chains for 18 and 27 bit FIR filters
- Embedded coefficient registers for 18 and 27 bit coefficients
- Fully independent multiplier outputs
- Inferability using HDL templates supplied by the Intel® Quartus® Prime software for most modes
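A common use of a hard pre-adder is a symmetric FIR filter, where samples that share a coefficient are added before the multiply, which roughly halves the number of multipliers. The overview does not state this use explicitly, so treat the following as an illustrative assumption; it is a behavioral sketch, not a description of the DSP block datapath, and the function names are hypothetical.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR output for one window: one multiply per tap."""
    return sum(x * c for x, c in zip(samples, coeffs))


def fir_symmetric(samples, coeffs):
    """Same output for symmetric coeffs, pre-adding mirrored samples first.

    Only about half as many multiplies are performed as in fir_direct.
    """
    taps = len(coeffs)
    half = taps // 2
    acc = 0
    for i in range(half):
        # Pre-adder: combine the two samples that share coefficient i.
        acc += (samples[i] + samples[taps - 1 - i]) * coeffs[i]
    if taps % 2:
        # Odd tap count: the center tap has no mirror partner.
        acc += samples[half] * coeffs[half]
    return acc


window = [2, -1, 4, 0, 7]
symmetric_coeffs = [1, 3, 5, 3, 1]
print(fir_direct(window, symmetric_coeffs))
print(fir_symmetric(window, symmetric_coeffs))
```

Both functions return the same value for symmetric coefficients; the second uses three multiplies for five taps instead of five.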
The Variable Precision DSP block is ideal for supporting the growing trend towards higher bit precision in high performance DSP applications. At the same time, it can efficiently support the many existing 18 bit DSP applications, such as high definition video processing and remote radio heads. With the Variable Precision DSP block architecture and hard floating point multipliers and adders, Intel® Stratix® 10 devices can efficiently support many different precision levels up to and including floating point implementations. This flexibility can result in increased system performance, reduced power consumption, and reduced architecture constraints on system algorithm designers.
1.18. Hard Processor System (HPS)
The Intel® Stratix® 10 SoC Hard Processor System (HPS) is Intel’s third generation HPS. Leveraging the performance of Intel 14 nm tri-gate technology, Intel® Stratix® 10 SoC devices more than double the performance of previous generation SoCs with an integrated quad-core 64-bit Arm* Cortex* -A53. The HPS also enables system-wide hardware virtualization capabilities by adding a system memory management unit. These architecture improvements ensure that Intel® Stratix® 10 SoCs meet the requirements of current and future embedded markets, including wireless and wireline communications, datacenter acceleration, and numerous military applications.
1.18.1. Key Features of the Intel Stratix 10 HPS
Feature |
Description |
---|---|
Quad-core Arm* Cortex* -A53 MPCore processor unit |
|
System Memory Management Unit |
|
Cache Coherency unit |
|
Cache |
|
On-Chip Memory |
|
External SDRAM and Flash Memory Interfaces for HPS |
|
Communication Interface Controllers |
|
Timers and I/O |
|
Interconnect to Logic Core |
|
1.19. Power Management
Intel® Stratix® 10 devices combine the advanced Intel 14 nm tri-gate process technology, the all new Intel® Hyperflex™ core architecture (which enables Hyper-Folding), power gating, and several optional power reduction techniques to reduce total power consumption by as much as 70% compared to previous generation high-performance Stratix® V devices.
Intel® Stratix® 10 standard power devices (-V) are SmartVID devices. The core voltage supplies (VCC and VCCP) for each SmartVID device must be driven by a PMBus voltage regulator dedicated to that Intel® Stratix® 10 device. Use of a PMBus voltage regulator for each SmartVID (-V) device is mandatory; it is not an option. A code is programmed into each SmartVID device during manufacturing that allows the PMBus voltage regulator to operate at the optimum core voltage to meet the device performance specifications.
With the new Intel® Hyperflex™ core architecture, designs can run 2X faster than previous generation FPGAs. With 2X performance and same required throughput, architects can cut the data path width in half to save power. This optimization is called Hyper-Folding. Additionally, power gating reduces static power of unused resources in the FPGA by powering them down. The Intel® Quartus® Prime software automatically powers down specific unused resource blocks such as DSP and M20K blocks, at configuration time.
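The Hyper-Folding trade-off described above is a throughput identity: throughput is data path width times clock rate, so doubling the clock allows the width to be halved. The sketch below uses hypothetical width and clock values chosen only to illustrate the arithmetic; they are not device specifications.

```python
def throughput_gbps(width_bits, clock_mhz):
    """Data path throughput in Gbps for a given width and clock rate."""
    return width_bits * clock_mhz / 1000


# Hypothetical baseline: a 512-bit data path at 250 MHz.
baseline = throughput_gbps(512, 250)

# Hyper-Folded version: half the width at twice the clock rate.
folded = throughput_gbps(256, 500)

print(baseline, folded)
```

Both configurations deliver the same throughput, while the folded data path needs half as many parallel logic and routing resources.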
The optional power reduction techniques in Intel® Stratix® 10 devices include:
- Available Low Static Power Devices— Intel® Stratix® 10 devices are available with a fixed core voltage that provides lower static power than the SmartVID standard power devices, while maintaining device performance
Furthermore, Intel® Stratix® 10 devices feature Intel’s low power transceivers and include a number of hard IP blocks that not only reduce logic resources but also deliver substantial power savings compared to soft implementations. In general, hard IP blocks consume up to 50% less power than the equivalent soft logic implementations.
1.20. Device Configuration and Secure Device Manager (SDM)
All Intel® Stratix® 10 devices contain a Secure Device Manager (SDM), which is a dedicated triple-redundant processor that serves as the point of entry into the device for all JTAG and configuration commands. The SDM also bootstraps the HPS in SoC devices, ensuring that the HPS can boot using the same security features that the FPGA devices have.
During configuration, Intel® Stratix® 10 devices are divided into logical sectors, each of which is managed by a local sector manager (LSM). The SDM passes configuration data to each of the LSMs across the on-chip configuration network. This allows the sectors to be configured independently, one at a time, or in parallel. This approach achieves simplified sector configuration and reconfiguration, as well as reduced overall configuration time due to the inherent parallelism. The same sector-based approach is used to respond to single-event upsets and security attacks.
While the sectors provide a logical separation for device configuration and reconfiguration, they overlay the normal rows and columns of FPGA logic and routing. This means there is no impact to the Intel® Quartus® Prime software place and route, and no impact to the timing of logic signals that cross the sector boundaries.
The SDM enables robust, secure, fully-authenticated device configuration. It also allows for customization of the configuration scheme, which can enhance device security. For configuration and reconfiguration, this approach offers a variety of advantages:
- Dedicated secure configuration manager
- Reduced device configuration time, because sectors are configured in parallel
- Updateable configuration process
- Reconfiguration of one or more sectors independent of all other sectors
- Zeroization of individual sectors or the complete device
The SDM also provides additional capabilities such as register state readback and writeback to support ASIC prototyping and other applications.
1.21. Device Security
Building on top of the robust security features present in the previous generation devices, Intel® Stratix® 10 FPGAs and SoCs include a number of new and innovative security enhancements. These features are also managed by the SDM, tightly coupling device configuration and reconfiguration with encryption, authentication, key storage and anti-tamper services.
Security services provided by the SDM include:
- Bitstream encryption
- Multi-factor authentication
- Hard encryption and authentication acceleration; AES-256, SHA-256/384, ECDSA-256/384
- Volatile and non-volatile encryption key storage and management
- Boot code authentication for the HPS
- Physically Unclonable Function (PUF) service
- Updateable configuration process
- Secure device maintenance and upgrade functions
- Side channel attack protection
- Scripted response to sensor inputs and security attacks, including selective sector zeroization
- Readback, JTAG and test mode disable
- Enhanced response to single-event upsets (SEU)
- Black key provisioning
- Physical anti-tamper
See the Intel® Stratix® 10 Device Security User Guide for a complete list of all security features.
The SDM and associated security services provide a robust, multi-layered security solution for your Intel® Stratix® 10 design.
Intel® Stratix® 10 Family Variant | Bitstream Authentication | Advanced Security Features |
---|---|---|
GX/SX | All devices | -AS suffix part number required |
1.22. Configuration via Protocol Using PCI Express
Configuration via protocol using PCI Express* allows the FPGA to be configured across the PCI Express* bus, simplifying the board layout and increasing system integration. Making use of the embedded PCI Express* hard IP operating in autonomous mode before the FPGA is configured, this technique allows the PCI Express* bus to be powered up and active within the 100 ms time allowed by the PCI Express* specification. Intel® Stratix® 10 devices also support partial reconfiguration across the PCI Express* bus which reduces system down time by keeping the PCI Express* link active while the device is being reconfigured.
1.23. Partial and Dynamic Reconfiguration
In addition to lowering power and cost, partial reconfiguration also increases the effective logic density by removing the necessity to place in the FPGA those functions that do not operate simultaneously. Instead, these functions can be stored in external memory and loaded as needed. This reduces the size of the required FPGA by allowing multiple applications on a single FPGA, saving board space and reducing power. The partial reconfiguration process is built on top of the proven incremental compile design flow in the Intel® Quartus® Prime design software.
Dynamic reconfiguration in Intel® Stratix® 10 devices allows transceiver data rates, protocols and analog settings to be changed dynamically on a channel-by-channel basis while maintaining data transfer on adjacent transceiver channels. Dynamic reconfiguration is ideal for applications that require on-the-fly multiprotocol or multi-rate support. Both the PMA and PCS blocks within the transceiver can be reconfigured using this technique. Dynamic reconfiguration of the transceivers can be used in conjunction with partial reconfiguration of the FPGA to enable partial reconfiguration of both core and transceivers simultaneously.
1.24. Fast Forward Compile
The innovative Fast Forward Compile feature in the Intel® Quartus® Prime software identifies performance bottlenecks in your design and provides detailed, step-by-step performance improvement recommendations that you can then implement. The Compiler reports estimates of the maximum operating frequency that can be achieved by applying the recommendations. As part of the new Hyper-Aware design flow, Fast Forward Compile maximizes the performance of your Intel® Stratix® 10 design and achieves rapid timing closure.
Previously, this type of optimization required multiple time-consuming design iterations, including full design re-compilation to determine the effectiveness of the changes. Fast Forward Compile enables you to make better decisions about where to focus your optimization efforts, and how to increase your design performance and throughput. This technique removes much of the guesswork of performance exploration, resulting in fewer design iterations and as much as 2X core performance gains for Intel® Stratix® 10 designs.
1.25. Single Event Upset (SEU) Error Detection and Correction
Intel® Stratix® 10 FPGAs and SoCs offer robust SEU error detection and correction circuitry. The detection and correction circuitry includes protection for Configuration RAM (CRAM) programming bits and user memories. The CRAM is protected by a continuously running parity checker circuit with integrated ECC that automatically corrects one or two bit errors and detects higher order multibit errors.
The physical layout of the CRAM array is optimized to make the majority of multi-bit upsets appear as independent single-bit or double-bit errors which are automatically corrected by the integrated CRAM ECC circuitry. In addition to the CRAM protection, user memories also include integrated ECC circuitry and are layout optimized for error detection and correction.
The SEU error detection and correction hardware is supported by both soft IP and the Intel® Quartus® Prime software to provide a complete SEU mitigation solution. The components of the complete solution include:
- Hard error detection and correction for CRAM and user M20K memory blocks
- Optimized physical layout of memory cells to minimize probability of SEU
- Sensitivity processing soft IP that reports if CRAM upset affects a used or unused bit
- Fault injection soft IP with the Intel® Quartus® Prime software support that changes state of CRAM bits for testing purposes
- Hierarchy tagging in the Intel® Quartus® Prime software
- Triple Mode Redundancy (TMR) used for the Secure Device Manager and critical on-chip state machines
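The redundancy technique in the last item, more commonly called triple modular redundancy, keeps three copies of a value and takes a bitwise majority vote, so that an upset in any single copy is masked. The following is a minimal behavioral sketch of such a voter; it is not a description of the SDM implementation, and the function name is hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three redundant copies of the same value."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011_0010
upset = good ^ 0b0000_1000  # one copy with a single flipped bit

# The flipped bit is outvoted by the two intact copies.
print(majority_vote(good, upset, good) == good)
```

The vote is taken independently for each bit position, so upsets in different bit positions of different copies are also masked.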
In addition to the SEU mitigation features listed above, the Intel 14 nm tri-gate process technology used for Intel® Stratix® 10 devices is based on FinFET transistors which have reduced SEU susceptibility versus conventional planar transistors.
1.26. Document Revision History for the Intel Stratix 10 GX/SX Device Overview
Document Version | Changes |
---|---|
2020.09.28 | Made the following change:
|
2020.04.30 | Made the following change:
|
2020.03.24 | Made the following changes:
|
2019.08.19 | Made the following changes:
|
2019.02.15 | Made the following changes:
|
2018.08.08 | Made the following changes:
|
2017.10.30 | Made the following changes:
|
2016.10.31 | Made the following changes:
|
2015.12.04 | Initial release. |