Intel Stratix 10 DX Device Overview
1. Intel Stratix 10 DX Device Overview
A low latency, high performance coherent interface is achieved when connecting the FPGA to selected Intel® Xeon® Scalable Processors via Intel® Ultra Path Interconnect (UPI), while the non-coherent interface takes advantage of any PCI Express* (PCIe) Gen4 capable device.
The FPGA's external memory capability now includes support for a new DDR-T soft IP memory controller, allowing interfaces to attach up to 1 TB of high-performance, persistent Intel® Optane™ PMem modules per controller, directly to the FPGA's GPIO banks.
In addition to supporting these interface protocols, the DX variant FPGAs also offer hard intellectual property blocks for 100 Gigabit Ethernet and DDR4 memory control, combined with a high-performance monolithic 14 nm FPGA fabric die, all inside a single flip-chip FBGA package. Select Intel® Stratix® 10 DX devices include an integrated quad-core 64-bit Arm* Cortex* -A53 hard processor subsystem (HPS) on the fabric die, or embedded 3D stacked High-Bandwidth (up to 512 GB/s) DRAM memory (HBM2) inside the package.
As part of the Intel® Stratix® 10 family, the DX variant devices feature other innovations such as the Intel® Hyperflex™ core architecture, variable precision DSP blocks with hardened support for both floating-point and fixed-point operation, and advanced packaging technology based on Intel® Embedded Multi-die Interconnect Bridge (EMIB).
Important innovations in Intel® Stratix® 10 DX devices include:
- Intel® Hyperflex™ core architecture delivering higher core performance compared to previous generation high-performance FPGAs
- Manufactured using Intel® high volume 14 nm tri-gate (FinFET) technology
- Intel® Embedded Multi-die Interconnect Bridge (EMIB) packaging technology
- A soft IP memory controller and PHY supporting DDR-T to directly attach Intel® Optane™ PMem modules to the FPGA, two to four controllers per FPGA, and rates up to 2400 megatransfers per second (one module per channel)
- Transceivers on separate heterogeneous tiles, supporting data rates up to 57.8 gigabits per second (Gbps) Pulse Amplitude Modulation (PAM4) and 28.9 Gbps non-return-to-zero (NRZ) for chip-to-chip, chip-to-module, and backplane driving
- Hard PCI Express* Gen4 x16 intellectual property blocks, with useful features such as endpoint and root port modes, multiple independent controllers, virtualization support for single-root I/O virtualization (SR-IOV), virtual I/O device (VIRTIO), Intel® Scalable I/O Virtualization (Intel® Scalable IOV), and transaction layer bypass mode
- Hard Intel® UPI intellectual property blocks in select devices, supporting Home Agent soft IP
- Hard 100G Ethernet MAC, 100G Reed-Solomon forward error correction (FEC), and KP-FEC blocks
- 3D stacked High-Bandwidth DRAM Memory (HBM2) in select devices
- Monolithic core fabric with up to 2.8 million logic elements (LEs)
- Hard fixed-point and IEEE 754 compliant hard floating-point variable precision digital signal processing (DSP) blocks
- Hard memory controllers and PHY supporting DDR4 rates up to 2666 megabits per second (Mbps) per pin
- Hard HBM2 memory controllers in devices that include in-package 3D stacked HBM2 DRAM memory
- M20K, 20 kilobit (Kb) internal SRAM memory blocks
- eSRAM, 47.25 megabit (Mb) internal SRAM blocks in select devices
- Quad-core 64-bit Arm* Cortex* -A53 embedded processor running up to 1.5 GHz in select devices, processor subsystem peripherals, and high bandwidth buses to and from the FPGA logic fabric
- Programmable clock tree synthesis for flexible, low power, low skew clock trees
- Dedicated Secure Device Manager (SDM) for enhanced device configuration and security, supporting AES-256, SHA-256/384, and elliptic curve digital signature algorithm (ECDSA)-256/384 encrypt/decrypt accelerators, and multi-factor authentication
- Comprehensive set of advanced power saving features
1.1. Intel Stratix 10 DX Devices
In addition to the coherent and non-coherent protocol interfaces that are required for high-performance acceleration applications, Intel® Stratix® 10 DX FPGAs deliver improved core logic performance compared to previous generation high-performance FPGAs, with densities up to 2.8 million LEs in a monolithic fabric.
The devices also feature up to 84 full-duplex transceivers on separate transceiver tiles, a subset of which are capable of supporting data rates up to 57.8 Gbps PAM4 and 28.9 Gbps NRZ for both short reach and backplane driving applications. External memory interfaces up to 2666 Mbps DDR4 are achieved using hard memory controllers, and some DX variant devices include in-package 3D stacked HBM2 DRAM memory capable of supporting 512 GByte/s memory bandwidth. Select devices contain an embedded hard processor system (HPS) based on an application-class quad-core 64-bit Arm* Cortex* -A53, running at clock rates up to 1.5 GHz, including processor peripherals and high-bandwidth buses to and from the FPGA logic fabric.
The high-performance monolithic FPGA fabric is based on the Intel® Hyperflex™ core architecture that includes additional Hyper-Registers everywhere throughout the interconnect routing and at the inputs of all functional blocks. The core fabric also contains an enhanced logic array utilizing Intel’s adaptive logic module (ALM) and a rich set of high-performance building blocks including:
- M20K, 20 Kb embedded SRAM memory blocks
- eSRAM, 47.25 Mb embedded SRAM memory blocks (in select devices)
- Variable precision DSP blocks with hard fixed point and IEEE 754 compliant hard floating-point
- General purpose IO cells with integer PLLs in every IO bank
- Hard memory controllers and PHY for external memory interfaces
- Hard memory controllers for in-package 3D stacked HBM2 DRAM memory (in select devices)
To clock these fabric building blocks, Intel® Stratix® 10 DX FPGAs use programmable clock tree synthesis, which uses dedicated clock tree routing to synthesize only those branches of the clock trees required for the application.
The high-speed serial transceivers contain both the physical medium attachment (PMA) and the physical coding sublayer (PCS) required to implement a variety of industry standard protocols. In addition to the hard PCS for each transceiver, Intel® Stratix® 10 DX devices contain hard PCI Express* IP that supports up to Gen4 x16 lane configuration, hard Intel® UPI IP in select devices that supports Home Agent soft IP, and hard 10/25/100 Gbps Ethernet MAC IP with dedicated Reed-Solomon FEC for NRZ signals (528, 514) and PAM4 signals (544, 514). These hardened intellectual property blocks free up valuable core logic resources, save power, and increase your productivity.
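The two Reed-Solomon codes quoted above use the usual (n, k) notation: n symbols per codeword, of which k carry data. As a rough illustration only, the sketch below derives the parity overhead and the number of correctable symbol errors from the standard Reed-Solomon relation t = (n - k) / 2. The function name `rs_parameters` is hypothetical and is not part of any Intel tool or IP.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Basic properties of a Reed-Solomon RS(n, k) code."""
    parity = n - k  # parity symbols appended to each codeword
    return {
        "parity_symbols": parity,
        "correctable_symbol_errors": parity // 2,  # t = (n - k) / 2
        "code_rate": k / n,  # fraction of each codeword that carries data
    }


# The two codes named in the text above.
codes = {"NRZ RS(528, 514)": (528, 514), "PAM4 RS(544, 514)": (544, 514)}
for label, (n, k) in codes.items():
    print(label, rs_parameters(n, k))
```

Under this relation the (528, 514) code can correct up to 7 symbol errors per codeword and the (544, 514) code up to 15, at the cost of a lower code rate.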
All Intel® Stratix® 10 DX devices support in-system, fine-grained partial reconfiguration of the logic array, allowing logic to be added to and removed from the system while it is operating.
1.2. Intel Stratix 10 DX Features Summary
Feature |
Description |
---|---|
Configuration |
|
Core clock networks |
|
Core process technology |
|
Embedded hard IP |
|
General purpose I/Os |
|
High performance monolithic core fabric |
|
Internal memory blocks |
|
Low power serial transceivers |
|
Packaging |
|
Phase locked loops (PLLs) |
|
Power management |
|
Software and tools |
|
Variable precision DSP blocks |
|
1.3. Intel Stratix 10 DX Block Diagram
1.4. Intel Stratix 10 DX Family Plan
Intel® Stratix® 10 DX Device Name | Logic Elements (KLE) | eSRAM Blocks | eSRAM Mbits | M20K Blocks | M20K Mbits | MLAB Counts | MLAB Mbits |
---|---|---|---|---|---|---|---|
DX 1100 | 1,325 | — | — | 5,461 | 107 | 11,556 | 7 |
DX 2100 | 2,073 | 2 | 94.5 | 6,847 | 134 | 17,856 | 11 |
DX 2800 | 2,753 | — | — | 11,721 | 229 | 23,796 | 15 |
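The Mbits columns can be cross-checked against the block counts using the block sizes quoted elsewhere in this overview (20 Kb per M20K, 640 bits per MLAB, 47.25 Mb per eSRAM). The sketch below assumes 1 Kb = 1,024 bits and 1 Mb = 1,024 Kb, and assumes the table rounds to the nearest whole Mbit; neither assumption is stated in the table itself.

```python
KBIT = 1024          # assumed bits per Kb
MBIT = 1024 * KBIT   # assumed bits per Mb


def m20k_mbits(blocks: int) -> float:
    """Total M20K capacity in Mbit (20 Kb per block)."""
    return blocks * 20 * KBIT / MBIT


def mlab_mbits(blocks: int) -> float:
    """Total MLAB capacity in Mbit (640 bits per block)."""
    return blocks * 640 / MBIT


def esram_mbits(blocks: int) -> float:
    """Total eSRAM capacity in Mbit (47.25 Mb per block)."""
    return blocks * 47.25


# (eSRAM blocks, M20K blocks, MLAB count) copied from the table above.
family = {
    "DX 1100": (0, 5461, 11556),
    "DX 2100": (2, 6847, 17856),
    "DX 2800": (0, 11721, 23796),
}

for name, (esram, m20k, mlab) in family.items():
    print(name, esram_mbits(esram), round(m20k_mbits(m20k)), round(mlab_mbits(mlab)))
```

With these assumptions the computed values round to the figures listed in the table.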
Intel® Stratix® 10 DX Device Name | 18x19 Multipliers | HPS Quad Core | Maximum GPIOs | Maximum Transceivers | External Memory Interfaces (x72 width) | I/O PLLs |
---|---|---|---|---|---|---|
DX 1100 | 5,184 | Yes | 528 | 32 | 2 | 16 |
DX 2100 | 7,920 | — | 612 | 84 | 4 | 16 |
DX 2800 | 11,520 | — | 816 | 84 | 4 | 24 |
Intel® Stratix® 10 DX Device Name | Hard IP: Config PCIe* Gen4x16, or Intel® UPI, Hard IP Blocks | Hard IP: PCIe* Gen4x16 Only, Hard IP Blocks | Hard IP: 10/25/100 GbE MACs | HBM2 Bandwidth (GByte/s) | HBM2 Density (GByte) | Tile Layout |
---|---|---|---|---|---|---|
DX 1100 | — | 1 | 4 | — | — | Figure 2 |
DX 2100 | 3 | — | 4 | 512 | 8 | Figure 3 |
DX 2800 | 3 | 1 | 2 | — | — | Figure 4 |
Intel® Stratix® 10 DX Device Name | F1760 JF43 - 32 Transceivers (42.5 mm x 42.5 mm) | F2597 TF53 - 84 Transceivers (52.5 mm x 52.5 mm) | F2912 TF55 - 84 Transceivers (55 mm x 55 mm) |
---|---|---|---|
DX 1100 | 528, 0, 264, 16, 16 | — | — |
DX 2100 | — | 612, 0, 306, 60, 24 | — |
DX 2800 | — | — | 816, 0, 408, 76, 8 |
The P-tile with 16 channels can be used for PCIe* only, not for Intel® UPI.
The P-tile with 20 channels can be used for either PCIe* , or for Intel® UPI.
1.4.1. Available Options
1.5. Intel Hyperflex Core Architecture
Intel® Stratix® 10 DX devices are based on a monolithic core fabric featuring the new Intel® Hyperflex™ core architecture. The Intel® Hyperflex™ core architecture delivers higher performance and up to 70% lower power compared to previous generation high-end FPGAs. Along with this performance breakthrough, the Intel® Hyperflex™ core architecture delivers a number of advantages including:
- Higher Throughput—Capitalizes on high core clock frequency performance to obtain throughput breakthroughs
- Improved Power Efficiency—Uses reduced IP size, enabled by Intel® Hyperflex™ , to consolidate designs which previously spanned multiple devices into a single device, thereby reducing power by up to 70% versus previous generation devices
- Greater Design Functionality—Uses faster clock frequency to reduce bus widths and reduce IP size, freeing up additional FPGA resources to add greater functionality
- Increased Designer Productivity—Boosts performance with less routing congestion and fewer design iterations using Hyper-Aware design tools, obtaining greater timing margin for more rapid timing closure
In addition to the traditional user registers found in the Adaptive Logic Modules (ALM), the Intel® Hyperflex™ core architecture introduces additional bypassable registers everywhere throughout the fabric of the FPGA. These additional registers, called Hyper-Registers, are available on every interconnect routing segment and at the inputs of all functional blocks.
The Hyper-Registers enable the following key design techniques, which together deliver a 2X increase in core performance:
- Fine grain Hyper-Retiming to eliminate critical paths
- Zero latency Hyper-Pipelining to eliminate routing delays
- Flexible Hyper-Optimization for best-in-class performance
When you implement these techniques in your design, the Hyper-Aware design tools automatically make use of the Hyper-Registers to achieve maximum core clock frequency.
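The idea behind retiming can be illustrated with a toy model: a register placed between unevenly sized delay segments limits the clock rate to the slower side, and moving it to a more balanced position raises the achievable frequency. The sketch below is only a conceptual illustration with hypothetical delay values and function names; it does not represent the algorithms used by the Hyper-Aware design tools.

```python
def stage_delays(segment_delays_ns, register_index):
    """Split a chain of delay segments into two stages at a register."""
    return (sum(segment_delays_ns[:register_index]),
            sum(segment_delays_ns[register_index:]))


def fmax_mhz(segment_delays_ns, register_index):
    """Clock rate limit set by the slower of the two stages."""
    return 1000.0 / max(stage_delays(segment_delays_ns, register_index))


def retime(segment_delays_ns):
    """Pick the register position that maximizes Fmax (a toy retiming step)."""
    positions = range(1, len(segment_delays_ns))
    return max(positions, key=lambda i: fmax_mhz(segment_delays_ns, i))


# Hypothetical delays, in ns, for four segments between two fixed registers.
segments = [0.5, 0.5, 0.5, 0.5]
before = fmax_mhz(segments, 1)  # register sits after the first segment
best = retime(segments)
after = fmax_mhz(segments, best)
print(f"before: {before:.0f} MHz, retimed at boundary {best}: {after:.0f} MHz")
```

A register available on every routing segment gives the tools many candidate positions of this kind, which is what makes fine-grained balancing possible.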
1.6. Heterogeneous 3D SiP Transceiver Tiles
Intel® Stratix® 10 DX devices feature power efficient, high bandwidth, low latency transceivers. The transceivers are implemented on heterogeneous 3D System-in-Package (SiP) transceiver tiles, each containing up to 24 full-duplex transceiver channels. In addition to providing a high-performance transceiver solution to meet current connectivity needs, this allows for future flexibility and scalability as data rates, modulation schemes, and protocol IPs evolve.
1.7. Intel Stratix 10 DX Transceivers
1.7.1. Intel P-Tile Transceivers and Hard IP
Intel® Stratix® 10 DX devices contain one or more P-tiles, each P-tile containing up to 20 full-duplex transceiver channels, along with PCIe* Gen4 x16 hard IP and Intel® UPI hard IP. If all 20 channels from the P-tile are available in the device, the P-tile can be configured to support either a PCIe* interface or an Intel® UPI interface. If only 16 channels are available, the P-tile supports PCIe* but does not support Intel® UPI, which requires all 20 channels. Support for protocols other than PCIe* or Intel® UPI is not possible with the P-tile; it is not possible to bypass the hard IP blocks and connect the P-tile transceivers directly to the FPGA fabric.
Feature |
Capability |
---|---|
PCIe* Configurations |
|
Virtualization Support |
|
Switch Support |
|
Feature |
Capability |
---|---|
Intel® UPI Configurations |
|
1.7.2. Intel E-Tile Transceivers and Hard IP
Each E-tile contains up to 24 full-duplex dual-mode transceivers, each transceiver capable of supporting both Pulse Amplitude Modulation with 4 levels (PAM4) up to 57.8 Gbps, and non-return-to-zero (NRZ) up to 28.9 Gbps. In addition to the transceivers, each E-tile contains multiple instances of 10/25/100 Gbps Ethernet MAC + FEC hard IP blocks. Both Reed-Solomon and KP FEC hard IP blocks are included, allowing complete Ethernet interfaces to be implemented, simplifying the design of complex multi-port Ethernet systems.
Intel® Stratix® 10 DX Device Name | Number of E-Tile Transceiver Channels | Available E-Tile Transceiver Channel Locations |
---|---|---|
DX 1100 | 16 | 0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23 |
DX 2100 | 24 | 0 through 23 |
DX 2800 | 8 | 0, 1, 2, 3, 12, 13, 14, 15 |
For more information about the E-tile transceivers and the E-tile Ethernet hard IP, refer to the Intel® Stratix® 10 E-Tile Transceiver PHY User Guide.
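The two maximum data rates quoted for the E-tile are related through the number of bits each symbol carries: a PAM4 symbol has four levels and so carries two bits, while an NRZ symbol has two levels and carries one. The short sketch below shows that arithmetic; the function name is illustrative only.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol rate needed to carry a bit rate with a given number of levels."""
    bits_per_symbol = math.log2(levels)  # 1 for NRZ, 2 for PAM4
    return bit_rate_gbps / bits_per_symbol


pam4 = symbol_rate_gbaud(57.8, levels=4)
nrz = symbol_rate_gbaud(28.9, levels=2)
print(f"PAM4: {pam4} GBaud, NRZ: {nrz} GBaud")
```

Both maximum rates correspond to the same symbol rate of 28.9 GBaud, which is consistent with a dual-mode transceiver that switches modulation rather than line rate.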
1.8. Heterogeneous 3D Stacked HBM2 DRAM Memory
Integrating high-density stacked HBM2 DRAM in the same package as the FPGA results in a “near memory” implementation, where the memory sits very close to the FPGA. In this configuration, the in-package memory is able to deliver up to 512 GByte/s of total aggregate bandwidth, which represents over a 10X increase in bandwidth compared to traditional “far memory” implemented in separate devices on the board. A near memory configuration also reduces system power by reducing traces between the FPGA and memory, while also reducing board area.
Select Intel® Stratix® 10 DX devices integrate two 3D HBM2 DRAM memory stacks inside the package. Each of these DRAM stacks has:
- 4 GByte density per stack, for a total density of 8 GByte per device
- 256 GByte/s bandwidth per stack, for a total aggregate bandwidth of 512 GByte/s per device
- 8 independent channels, each 128 bits wide, or 16 independent pseudo channels, each 64 bits wide (in pseudo channel mode)
- Data transfer rates up to 2 Gbps, per signal, between core fabric and HBM2 DRAM
- Half-rate transfer to core fabric
Intel® Stratix® 10 DX devices use embedded hard memory controllers to access the HBM2 DRAM.
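The per-stack and per-device bandwidth figures follow directly from the channel count, channel width, and per-signal data rate listed above. The sketch below reproduces that arithmetic; the function name is illustrative, and the result is a peak figure that ignores any protocol or refresh overhead.

```python
def hbm2_bandwidth_gbytes(channels: int, bits_per_channel: int,
                          gbps_per_signal: float, stacks: int = 1) -> float:
    """Peak HBM2 bandwidth in GByte/s."""
    total_signals = channels * bits_per_channel
    return stacks * total_signals * gbps_per_signal / 8  # 8 bits per byte


per_stack = hbm2_bandwidth_gbytes(8, 128, 2.0)           # 8 channels x 128 bits
pseudo = hbm2_bandwidth_gbytes(16, 64, 2.0)              # 16 pseudo channels x 64 bits
per_device = hbm2_bandwidth_gbytes(8, 128, 2.0, stacks=2)
print(per_stack, pseudo, per_device)
```

Pseudo channel mode splits each channel in two without changing the total number of data signals, so the peak bandwidth is the same in both modes.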
1.9. External Memory and General Purpose I/O
In addition to the bandwidth delivered by the in-package HBM2 DRAM near memory (in selected devices), all Intel® Stratix® 10 DX devices offer substantial external memory bandwidth, supporting DDR4 memory interfaces running at up to 2666 Mbps and DDR-T memory interfaces at up to 2400 megatransfers per second.
This bandwidth is provided along with the ease of design, lower power, and resource efficiencies of hardened high-performance memory controllers. The external memory interfaces can be configured up to a maximum width of 144 bits when using either hard or soft memory controllers.
Each I/O bank contains 48 general purpose I/Os and a high-efficiency hard memory controller capable of supporting many different memory types, each with different performance capabilities. The hard memory controller is also capable of being bypassed and replaced by a soft controller implemented in user logic. The I/Os each have a hardened double data rate (DDR) read/write path (PHY) capable of performing key memory interface functionality such as:
- Read/write leveling
- FIFO buffering to lower latency and improve margin
- Timing calibration
- On-chip termination
The timing calibration is aided by the inclusion of hard microcontrollers based on Intel’s Nios® II technology, specifically tailored to control the calibration of multiple memory interfaces. This calibration allows the Intel® Stratix® 10 DX device to compensate for any changes in process, voltage, or temperature either within the device itself, or within the external memory device. The advanced calibration algorithms ensure maximum bandwidth and robust timing margin across all operating conditions. For the list of features available with the Intel® DDR-T memory controller IP, see External Memory Interface and the DDR-T Memory Controller IP User Guide. For access to the DDR-T Memory Controller IP User Guide, contact My Intel support.
Interface | Controller Type | Performance (maximum rate possible) |
---|---|---|
Intel® DDR-T | Soft | 2400 megatransfers per second (one module per controller) |
DDR4 | Hard | 2666 Mbps |
DDR3 | Hard | 2133 Mbps |
QDRII+ | Soft | 1,100 Mtps |
QDRII+ Xtreme | Soft | 1,266 Mtps |
QDRIV | Soft | 2,133 Mtps |
RLDRAM III | Soft | 2400 Mbps |
RLDRAM II | Soft | 533 Mbps |
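To relate the per-pin rates in the table to interface bandwidth, the raw peak bandwidth of an interface is its width multiplied by the per-pin rate. The sketch below applies this to the x72 and 144-bit DDR4 widths mentioned in this overview. It assumes decimal units (1 GByte/s = 1,000 MByte/s) and counts every pin in the interface width, so usable data bandwidth is lower if some of those bits carry ECC; the function name is illustrative.

```python
def peak_bandwidth_gbytes(width_bits: int, mbps_per_pin: float) -> float:
    """Raw peak interface bandwidth in GByte/s (all pins counted)."""
    mbytes_per_second = width_bits * mbps_per_pin / 8  # 8 bits per byte
    return mbytes_per_second / 1000                    # assumed decimal GByte


x72 = peak_bandwidth_gbytes(72, 2666)     # one x72 DDR4 interface
x144 = peak_bandwidth_gbytes(144, 2666)   # maximum configurable width
print(f"x72: {x72:.3f} GByte/s, x144: {x144:.3f} GByte/s")
```

Even at the maximum width, a single external DDR4 interface stays well below the 512 GByte/s quoted for in-package HBM2.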
Intel® Stratix® 10 DX devices also feature general purpose I/Os capable of supporting a wide range of single-ended and differential I/O interfaces. LVDS rates up to 1.6 Gbps are supported, with each pair of pins having both a differential driver and a differential input buffer. This enables configurable direction for each LVDS pair.
1.10. Adaptive Logic Module (ALM)
Intel® Stratix® 10 DX devices use an adaptive logic module (ALM) similar to the one in the previous generation Intel® Arria® 10 and Stratix® V FPGAs, allowing for efficient implementation of logic functions and easy conversion of IP between the devices.
The ALM block diagram shown in the following figure has eight inputs with a fracturable look-up table (LUT), two dedicated embedded adders, and four dedicated registers.
Key features and capabilities of the ALM include:
- High register count with 4 registers per 8-input fracturable LUT, operating in conjunction with the new Intel® Hyperflex™ architecture, enables Intel® Stratix® 10 DX devices to maximize core performance at very high core logic utilization
- Implements select 7-input logic functions, all 6-input logic functions, and two independent functions consisting of smaller LUT sizes (such as two independent 4-input LUTs) to optimize core logic utilization
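A look-up table can be modelled in software as a truth table indexed by its inputs. The sketch below uses that generic model to show one 6-input function and two independent 4-input functions, the two usage patterns named in the list above. It is a conceptual model only, with hypothetical function names, and does not describe the actual ALM circuit or how the Intel® Quartus® Prime software packs logic.

```python
def make_mask(func, num_inputs: int) -> int:
    """Build a truth-table bitmask for a boolean function of num_inputs bits."""
    mask = 0
    for index in range(1 << num_inputs):
        bits = tuple((index >> i) & 1 for i in range(num_inputs))
        if func(*bits):
            mask |= 1 << index
    return mask


def lut_eval(mask: int, inputs: tuple) -> int:
    """Evaluate a LUT: the input bits form an index into the truth table."""
    index = 0
    for position, value in enumerate(inputs):
        index |= (value & 1) << position
    return (mask >> index) & 1


# One 6-input function (parity) needs a 64-entry truth table.
xor6 = make_mask(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f, 6)

# Two independent 4-input functions need 16 entries each and 8 inputs in total.
and4 = make_mask(lambda a, b, c, d: a & b & c & d, 4)
or4 = make_mask(lambda a, b, c, d: a | b | c | d, 4)

print(lut_eval(xor6, (1, 0, 0, 0, 0, 0)), lut_eval(and4, (1, 1, 1, 1)), lut_eval(or4, (0, 0, 0, 0)))
```

Two 4-input functions together consume eight inputs, which matches the eight inputs available to one ALM.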
The Intel® Quartus® Prime software takes advantage of the ALM logic structure to deliver the highest performance, optimal logic utilization, and lowest compile times. The Intel® Quartus® Prime software simplifies design reuse as it automatically maps legacy designs into the Intel® Stratix® 10 ALM architecture.
1.11. Core Clocking
Core clocking in Intel® Stratix® 10 DX devices makes use of programmable clock tree synthesis.
This technique uses dedicated clock tree routing and switching circuits, and allows the Intel® Quartus® Prime software to create the exact clock trees required for your design. Clock tree synthesis minimizes clock tree insertion delay, reduces dynamic power dissipation in the clock tree and allows greater clocking flexibility in the core while still maintaining backwards compatibility with legacy global and regional clocking schemes.
The core clock network in Intel® Stratix® 10 DX devices supports the high-performance Intel® Hyperflex™ core architecture and also supports the hard memory controllers at rates up to 2666 Mbps with a quarter rate transfer to the core. The core clock network is driven by either dedicated clock input pins, or integer I/O PLLs.
1.12. I/O PLLs
Intel® Stratix® 10 DX devices contain up to 24 integer I/O PLLs (IOPLLs) available for general purpose use in the core fabric and for simplifying the design of external memory interfaces and high-speed LVDS interfaces. The IOPLLs are located in each bank of 48 general purpose I/O, one per I/O bank, adjacent to the hard memory controllers and LVDS SerDes in each I/O bank. This makes it easier to close timing because the IOPLLs are tightly coupled with I/Os that need to use them. The IOPLLs can be used for general purpose applications in the core such as clock network delay compensation and zero-delay clock buffering.
1.13. Internal Embedded Memory
Intel® Stratix® 10 DX devices contain three types of embedded memory blocks: eSRAM (47.25 Mbit), M20K (20 Kb), and MLAB (640 bit). This variety of on-chip memory provides fast access times and low latency for applications such as wide and deep FIFOs and variable buffers. Combined with the in-package memory provided by the HBM2 DRAM stacks in select devices, the internal embedded memory completes the memory hierarchy in Intel® Stratix® 10 DX devices.
The eSRAM blocks are a new innovation in Intel® Stratix® 10 devices. These large embedded SRAM blocks are tightly coupled to the core fabric and are directly accessible with no need for a separate memory controller. Each eSRAM block is arranged as 8 channels, 42 banks per channel, with a total capacity of 47.25 Mbits running at clock rates up to 750 MHz. Within the eSRAM block, each channel has a bus width of 72 bit read and 72 bit write, and has one READ and one WRITE per channel. This allows each eSRAM block to support a total aggregate bandwidth (read + write) of up to 864 Gbps.
The eSRAM block is implemented as a simple dual port memory with concurrent read and write access per channel, and includes integrated hard ECC generation and checking. Compared to an off-chip SRAM solution, the eSRAM block allows you to reduce system power and save board space and cost.
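The 864 Gbps aggregate figure for one eSRAM block can be reproduced from the channel count, the read and write bus widths, and the maximum clock rate given above. The sketch below shows that calculation; the function name and default arguments are illustrative.

```python
def esram_bandwidth_gbps(channels: int = 8, read_bits: int = 72,
                         write_bits: int = 72, clock_mhz: int = 750) -> float:
    """Peak aggregate (read + write) bandwidth of one eSRAM block in Gbps."""
    bits_per_cycle = channels * (read_bits + write_bits)  # concurrent read and write
    return bits_per_cycle * clock_mhz / 1000              # Mbps to Gbps


print(esram_bandwidth_gbps())  # 8 channels x 144 bits x 750 MHz
```

The figure depends on every channel performing one read and one write in each clock cycle, which the simple dual port organization allows.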
The M20K and MLAB blocks are familiar block sizes carried over from previous Intel device families. The MLAB blocks are ideal for wide and shallow memories, while the M20K blocks are intended to support larger memory configurations and include hard ECC. Both M20K and MLAB embedded memory blocks can be configured as a single-port or dual-port RAM, FIFO, ROM, or shift register. These memory blocks are highly flexible and support a number of memory configurations as shown in the table.
MLAB (640 bits) | M20K (20 Kb) |
---|---|
64 x 10 (supported through emulation); 32 x 20 | 2K x 10 (or x8); 1K x 20 (or x16); 512 x 40 (or x32) |
1.14. Variable Precision DSP Block
The Intel® Stratix® 10 DX DSP blocks are based upon the Variable Precision DSP Architecture used in Intel’s previous generation devices. They feature hard fixed point and IEEE 754 compliant floating point capability.
The DSP blocks can be configured to support signal processing with precision ranging from 18x19 up to 54x54. A pipeline register has been added to increase the maximum operating frequency of the DSP block and reduce power consumption.
Each DSP block can be independently configured at compile time as either dual 18x19 or a single 27x27 multiply accumulate. With a dedicated 64 bit cascade bus, multiple variable precision DSP blocks can be cascaded to implement even higher precision DSP functions efficiently.
In floating point mode, each DSP block provides one single precision floating point multiplier and adder. Floating point additions, multiplications, mult-adds and mult-accumulates are supported.
The following table shows how different precisions are accommodated within a DSP block, or by utilizing multiple blocks.
Multiplier Size | DSP Block Resources | Expected Usage |
---|---|---|
18x19 bits | 1/2 of Variable Precision DSP Block | Medium precision fixed point |
27x27 bits | 1 Variable Precision DSP Block | High precision fixed point |
19x36 bits | 1 Variable Precision DSP Block with external adder | Fixed point FFTs |
36x36 bits | 2 Variable Precision DSP Blocks with external adder | Very high precision fixed point |
54x54 bits | 4 Variable Precision DSP Blocks with external adder | Double Precision floating point |
Single Precision floating point | 1 Single Precision floating point adder, 1 Single Precision floating point multiplier | Floating point |
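One way to see why wider multipliers need several narrow ones plus an adder is to split each operand into 18-bit halves and sum the shifted partial products. The sketch below does this for an unsigned 36x36 multiply. It is a generic arithmetic illustration with hypothetical names; the source does not describe how the tools actually map wide multiplies onto DSP blocks, and signed operands would need extra handling.

```python
MASK18 = (1 << 18) - 1


def mul36_from_18(a: int, b: int) -> int:
    """Unsigned 36x36 multiply built from four 18-bit partial products."""
    assert 0 <= a < (1 << 36) and 0 <= b < (1 << 36)
    a_lo, a_hi = a & MASK18, a >> 18
    b_lo, b_hi = b & MASK18, b >> 18
    # Four narrow multiplications, each small enough for an 18x19 multiplier.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi
    # Shift each partial product to its weight and add them together.
    return p_ll + ((p_lh + p_hl) << 18) + (p_hh << 36)


a, b = 0xFFFFFFFFF, 0x9ABCDEF01
print(mul36_from_18(a, b) == a * b)
```

Four partial products are consistent with the two-block row in the table, since each block can act as two 18x19 multipliers.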
Complex multiplication is very common in DSP algorithms. One of the most popular applications of complex multipliers is the FFT algorithm. This algorithm has the characteristic of increasing precision requirements on only one side of the multiplier. The Variable Precision DSP block supports the FFT algorithm with proportional increase in DSP resources as the precision grows.
Complex Multiplier Size | DSP Block Resources | FFT Usage |
---|---|---|
18x19 bits | 2 Variable Precision DSP Blocks | Resource optimized FFT |
27x27 bits | 4 Variable Precision DSP Blocks | Highest precision FFT |
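The block counts in this table are consistent with the textbook form of a complex multiplication, which uses four real multiplications. The sketch below shows that form and the resulting block count, assuming two 18x19 multipliers or one 27x27 multiplier per DSP block as described earlier. The function names are illustrative, and the actual FFT IP implementation may differ.

```python
def complex_multiply(a_re: int, a_im: int, b_re: int, b_im: int):
    """(a_re + j*a_im) * (b_re + j*b_im) using four real multiplications."""
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    return re, im


def dsp_blocks_for_complex(multipliers_per_block: int) -> int:
    """DSP blocks needed to supply the four real multipliers."""
    return 4 // multipliers_per_block


print(complex_multiply(1, 2, 3, 4))    # (1 + 2j) * (3 + 4j)
print(dsp_blocks_for_complex(2))       # dual 18x19 mode
print(dsp_blocks_for_complex(1))       # single 27x27 mode
```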
For FFT applications with high dynamic range requirements, the Intel FFT IP Core offers an option of single precision floating point implementation with resource usage and performance similar to high precision fixed point implementations.
Other features of the DSP block include:
- Hard 18 bit and 25 bit pre-adders
- Hard floating point multipliers and adders
- 64 bit dual accumulator (for separate I, Q product accumulations)
- Cascaded output adder chains for 18 and 27 bit FIR filters
- Embedded coefficient registers for 18 and 27 bit coefficients
- Fully independent multiplier outputs
- Inferability using HDL templates supplied by the Intel® Quartus® Prime software for most modes
The Variable Precision DSP block is ideal to support the growing trend towards higher bit precision in high performance DSP applications. At the same time, it can efficiently support the many existing 18 bit DSP applications, such as high definition video processing and remote radio heads. With the Variable Precision DSP block architecture and hard floating point multipliers and adders, Intel® Stratix® 10 DX devices can efficiently support many different precision levels up to and including floating point implementations. This flexibility can result in increased system performance, reduced power consumption, and reduced architecture constraints on system algorithm designers.
1.15. Hard Processor System (HPS)
The Hard Processor System (HPS) in select Intel® Stratix® 10 DX devices is Intel's third generation HPS. Leveraging the performance of Intel 14 nm tri-gate technology, the HPS provides more than double the performance of previous generation devices with an integrated quad-core 64-bit Arm* Cortex* -A53. The HPS also enables system-wide hardware virtualization capabilities by adding a system memory management unit.
1.15.1. Key Features of the Intel Stratix 10 HPS
Feature |
Description |
---|---|
Quad-core Arm* Cortex* -A53 MPCore processor unit |
|
System Memory Management Unit |
|
Cache Coherency unit |
|
Cache |
|
On-Chip Memory |
|
External SDRAM and Flash Memory Interfaces for HPS |
|
Communication Interface Controllers |
|
Timers and I/O |
|
Interconnect to Logic Core |
|
1.16. Power Management
Intel® Stratix® 10 DX devices combine the advanced Intel 14 nm tri-gate process technology, the all-new Intel® Hyperflex™ core architecture (which enables Hyper-Folding), power gating, and optional power reduction techniques to reduce total power consumption by as much as 70% compared to previous generation high-performance Stratix® V devices.
Intel® Stratix® 10 standard power devices (-V) are SmartVID devices. The core voltage supplies (VCC and VCCP) for each SmartVID device must be driven by a PMBus voltage regulator dedicated to that Intel® Stratix® 10 device. Use of a PMBus voltage regulator for each SmartVID (-V) device is mandatory; it is not an option. A code is programmed into each SmartVID device during manufacturing that allows the PMBus voltage regulator to operate at the optimum core voltage to meet the device performance specifications.
With the new Intel® Hyperflex™ core architecture, designs can run faster than previous generation FPGAs. With faster performance and the same required throughput, architects can reduce the width of the data path to save power. This optimization is called Hyper-Folding. Additionally, power gating reduces static power of unused resources in the FPGA by powering them down. The Intel® Quartus® Prime software automatically powers down specific unused resource blocks, such as DSP and M20K blocks, at configuration time.
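The trade-off behind Hyper-Folding is that throughput is the product of data path width and clock rate, so a faster clock sustains the same throughput with a narrower data path. The sketch below illustrates this with hypothetical numbers; the throughput target, the clock rates, and the function name are all assumptions chosen for the example, not device specifications.

```python
import math


def datapath_width_bits(throughput_mbps: int, clock_mhz: int) -> int:
    """Bits that must move per clock cycle to sustain the throughput."""
    return math.ceil(throughput_mbps / clock_mhz)


# Hypothetical example: a 102,400 Mbps stream at two core clock rates.
throughput = 102_400
for clock in (200, 400):
    print(f"{clock} MHz -> {datapath_width_bits(throughput, clock)} bits wide")
```

Doubling the clock rate halves the required width in this model, which is the source of the logic and power savings described above.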
Furthermore, Intel® Stratix® 10 DX devices feature Intel’s low power transceivers and include a number of hard IP blocks that not only reduce logic resources but also deliver substantial power savings compared to soft implementations. In general, hard IP blocks consume up to 50% less power than the equivalent soft logic implementations.
1.17. Device Configuration and Secure Device Manager (SDM)
All Intel® Stratix® 10 DX devices contain a Secure Device Manager (SDM), which is a dedicated triple-redundant processor that serves as the point of entry into the device for all JTAG and configuration commands. The SDM also bootstraps the HPS in SoC devices ensuring that the HPS can boot using the same security features that the FPGA devices have.
During configuration, Intel® Stratix® 10 DX devices are divided into logical sectors, each of which is managed by a local sector manager (LSM). The SDM passes configuration data to each of the LSMs across the on-chip configuration network. This allows the sectors to be configured independently, one at a time, or in parallel. This approach achieves simplified sector configuration and reconfiguration, as well as reduced overall configuration time due to the inherent parallelism. The same sector-based approach is used to respond to single-event upsets and security attacks.
While the sectors provide a logical separation for device configuration and reconfiguration, they overlay the normal rows and columns of FPGA logic and routing. This means there is no impact to the Intel® Quartus® Prime software place and route, and no impact to the timing of logic signals that cross the sector boundaries.
The SDM enables robust, secure, fully-authenticated device configuration. It also allows for customization of the configuration scheme, which can enhance device security. For configuration and reconfiguration, this approach offers a variety of advantages:
- Dedicated secure configuration manager
- Reduced device configuration time, because sectors are configured in parallel
- Updateable configuration process
- Reconfiguration of one or more sectors independent of all other sectors
- Zeroization of individual sectors or the complete device
1.18. Device Security
Building on top of the robust security features present in the previous generation devices, Intel® Stratix® 10 DX devices include a number of new and innovative security enhancements. These features are also managed by the SDM, tightly coupling device configuration and reconfiguration with encryption, authentication, key storage and anti-tamper services.
Security services provided by the SDM include:
- Bitstream encryption
- Multi-factor authentication
- Hard encryption and authentication acceleration; AES-256, SHA-256/384, ECDSA-256/384
- Volatile and non-volatile encryption key storage and management
- Physically Unclonable Function (PUF) service
- Updateable configuration process
- Secure device maintenance and upgrade functions
- Side channel attack protection
- Scripted response to sensor inputs and security attacks, including selective sector zeroization
- Readback, JTAG and test mode disable
- Enhanced response to single-event upsets (SEU)
- Black key provisioning
- Physical anti-tamper
See the Intel® Stratix® 10 Device Security User Guide for a complete list of all security features.
The SDM and associated security services provide a robust, multi-layered security solution for your Intel® Stratix® 10 DX design.
Intel® Stratix® 10 Family Variant | Bitstream Authentication | Advanced Security Features |
---|---|---|
DX | All devices | All devices |
1.19. Configuration via Protocol Using PCI Express
Configuration via protocol using PCI Express* allows the FPGA to be configured across the PCI Express* bus, simplifying the board layout and increasing system integration. Making use of the embedded PCI Express* hard IP operating in autonomous mode before the FPGA is configured, this technique allows the PCI Express* bus to be powered up and active within the 100 ms time allowed by the PCI Express* specification. Intel® Stratix® 10 DX devices also support partial reconfiguration across the PCI Express* bus which reduces system down time by keeping the PCI Express* link active while the device is being reconfigured.
1.20. Partial and Dynamic Reconfiguration
In addition to lowering power and cost, partial reconfiguration increases the effective logic density because functions that do not operate simultaneously no longer need to reside in the FPGA at the same time. Instead, these functions can be stored in external memory and loaded as needed. This reduces the size of the required FPGA by allowing multiple applications to share a single FPGA, saving board space and reducing power. The partial reconfiguration process is built on top of the proven incremental compile design flow in the Intel® Quartus® Prime design software.
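The density argument can be made concrete with a little arithmetic. The sketch below assumes a simplified model in which a design has a static region plus several functions that never run at the same time. Without partial reconfiguration all of them must fit at once; with it, only the largest one must fit. The function name `required_logic` and all resource counts are hypothetical, and the model ignores any overhead of the reconfigurable region itself.

```python
from typing import Sequence


def required_logic(static: int, functions: Sequence[int], partial_reconfig: bool) -> int:
    """Logic resources needed for a static region plus time-shared functions."""
    if partial_reconfig:
        # Only one function is loaded at a time, so size for the largest.
        return static + max(functions, default=0)
    # Every function must be resident simultaneously.
    return static + sum(functions)


# Hypothetical resource counts (arbitrary logic units).
static_region = 200_000
functions = [150_000, 120_000, 90_000]

flat = required_logic(static_region, functions, partial_reconfig=False)
shared = required_logic(static_region, functions, partial_reconfig=True)

print(flat)    # 560000
print(shared)  # 350000
```

In this made-up case the time-shared design needs 350,000 units instead of 560,000, which is the sense in which partial reconfiguration raises effective logic density.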
Dynamic reconfiguration in Intel® Stratix® 10 DX devices allows transceiver data rates, protocols and analog settings to be changed dynamically on a channel-by-channel basis while maintaining data transfer on adjacent transceiver channels. Dynamic reconfiguration is ideal for applications that require on-the-fly multiprotocol or multi-rate support. Both the PMA and PCS blocks within the transceiver can be reconfigured using this technique. Dynamic reconfiguration of the transceivers can be used in conjunction with partial reconfiguration of the FPGA to enable partial reconfiguration of both core and transceivers simultaneously.
1.21. Fast Forward Compile
The innovative Fast Forward Compile feature in the Intel® Quartus® Prime software identifies performance bottlenecks in your design and provides detailed, step-by-step performance improvement recommendations that you can then implement. The Compiler reports estimates of the maximum operating frequency that can be achieved by applying the recommendations. As part of the new Hyper-Aware design flow, Fast Forward Compile maximizes the performance of your Intel® Stratix® 10 DX design and achieves rapid timing closure.
Previously, this type of optimization required multiple time-consuming design iterations, including full design re-compilation to determine the effectiveness of the changes. Fast Forward Compile enables you to make better decisions about where to focus your optimization efforts, and how to increase your design performance and throughput. This technique removes much of the guesswork of performance exploration, resulting in fewer design iterations.
1.22. Single Event Upset (SEU) Error Detection and Correction
Intel® Stratix® 10 DX devices offer robust SEU error detection and correction circuitry. The detection and correction circuitry includes protection for Configuration RAM (CRAM) programming bits and user memories. The CRAM is protected by a continuously running parity checker circuit with integrated ECC that automatically corrects single-bit and double-bit errors and detects higher-order multi-bit errors.
The physical layout of the CRAM array is optimized to make the majority of multi-bit upsets appear as independent single-bit or double-bit errors which are automatically corrected by the integrated CRAM ECC circuitry. In addition to the CRAM protection, user memories also include integrated ECC circuitry and are layout optimized for error detection and correction.
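The effect of a layout optimized in this way can be sketched with a simple interleaving model. The actual CRAM layout is not described here, so the modulo mapping below is an assumption chosen only to show the principle: when physically adjacent cells belong to different ECC words, a burst of adjacent upsets deposits at most one flipped bit in each word. The names `word_of` and `worst_word_errors` and the array dimensions are hypothetical. The two-bit correction limit is taken from the text above.

```python
from collections import Counter
from typing import Iterable


def word_of(cell: int, num_words: int, bits_per_word: int, interleaved: bool) -> int:
    """Map a physical cell index to the ECC word that protects it."""
    if interleaved:
        # Assumed scheme: neighbouring cells rotate through the ECC words.
        return cell % num_words
    # Naive scheme: each ECC word occupies a contiguous run of cells.
    return cell // bits_per_word


def worst_word_errors(upsets: Iterable[int], num_words: int,
                      bits_per_word: int, interleaved: bool) -> int:
    """Largest number of flipped bits that land in any single ECC word."""
    counts = Counter(word_of(c, num_words, bits_per_word, interleaved) for c in upsets)
    return max(counts.values(), default=0)


NUM_WORDS, BITS_PER_WORD = 8, 32
CORRECTABLE = 2  # the ECC corrects single-bit and double-bit errors per word

# One particle strike upsets six physically adjacent cells.
burst = range(40, 46)

contiguous = worst_word_errors(burst, NUM_WORDS, BITS_PER_WORD, interleaved=False)
interleaved = worst_word_errors(burst, NUM_WORDS, BITS_PER_WORD, interleaved=True)

print(contiguous, contiguous <= CORRECTABLE)    # 6 False
print(interleaved, interleaved <= CORRECTABLE)  # 1 True
```

With the contiguous mapping all six flipped bits fall in one word and exceed the correction limit. With the interleaved mapping each affected word sees a single error, which is correctable.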
The SEU error detection and correction hardware is supported by both soft IP and the Intel® Quartus® Prime software to provide a complete SEU mitigation solution. The components of the complete solution include:
- Hard error detection and correction for CRAM and user eSRAM and M20K memory blocks
- Optimized physical layout of memory cells to minimize the probability of SEU
- Sensitivity processing soft IP that reports whether a CRAM upset affects a used or unused bit
- Fault injection soft IP, supported by the Intel® Quartus® Prime software, that changes the state of CRAM bits for testing purposes
- Hierarchy tagging in the Intel® Quartus® Prime software
- Triple Modular Redundancy (TMR) used for the Secure Device Manager and critical on-chip state machines
In addition to the SEU mitigation features listed above, the Intel 14 nm tri-gate process technology used for Intel® Stratix® 10 DX devices is based on FinFET transistors which have reduced SEU susceptibility versus conventional planar transistors.
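The TMR technique named in the list above keeps three copies of a piece of state and takes a bitwise majority vote, so an upset in any one copy is outvoted by the other two. The following is a minimal software sketch of that voter, not a description of the device's internal circuitry. The function name `majority_vote` and the register values are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


# Three redundant copies of a hypothetical 8-bit state register.
golden = 0b1011_0010

# A single-event upset flips one bit in the second copy only.
upset_copy = golden ^ 0b0000_1000

voted = majority_vote(golden, upset_copy, golden)
print(voted == golden)  # True
```

The voter masks a fault in one copy. If the same bit is upset in two copies before the state is refreshed, the vote returns the wrong value, which is why TMR is usually paired with periodic correction of the stored copies.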
1.23. Document Revision History for the Intel Stratix 10 DX Device Overview
Document Version | Changes |
---|---|
2020.09.28 | Made the following change: |
2020.03.24 | Made the following changes: |
2019.09.19 | Initial release. |