Stratix® 10 GX FPGAs and SX SoCs deliver 2X the core
performance and up to 70% lower power compared with previous generation high-performance FPGAs.
Featuring several groundbreaking innovations, including the all-new Intel Hyperflex
core architecture, this device family enables you to meet the demand for ever-increasing
bandwidth and processing performance in your most advanced applications, while meeting
your power budget.
With an embedded hard processor system (HPS) based on a quad-core 64 bit Arm Cortex-A53,
Stratix® 10 SoC devices deliver power efficient, application-class processing and allow
designers to extend hardware virtualization into the FPGA fabric.
Stratix® 10 SoC devices demonstrate Intel's commitment to high-performance
SoCs and extend Intel's leadership in programmable devices featuring an
Arm-based processor system.
Important innovations in Stratix® 10 FPGAs and SoCs include:
- All-new Intel Hyperflex core architecture delivering 2X the core performance compared to previous generation high-performance FPGAs
- Intel 14 nm tri-gate (FinFET) technology
- Heterogeneous 3D System-in-Package (SiP) technology
- Core fabric with up to 10.2 million logic elements (LEs)
- Up to 96 full duplex transceiver channels on heterogeneous 3D SiP transceiver tiles
- Transceiver data rates up to 28.3 Gbps chip-to-chip/module and backplane performance
- M20K (20 Kb) internal SRAM memory blocks
- Fractional synthesis and ultra-low jitter LC tank based transmit phase locked loops (PLLs)
- Hard PCI Express® Gen3 x16 intellectual property (IP)
- Hard 10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC) in every transceiver channel
- Hard memory controllers and PHY supporting DDR4 rates up to 2666 Mbps per pin
- Hard fixed-point and IEEE 754 compliant hard floating-point variable precision digital signal processing (DSP) blocks with up to 10 TFLOP compute performance and a power efficiency of 80 GFLOP per Watt
- Quad-core 64 bit Arm Cortex-A53 embedded processor running up to 1.5 GHz in SoC family variants
- Programmable clock tree synthesis for flexible, low power, low skew clock trees
- Dedicated secure device manager (SDM) for:
  - Enhanced device configuration and security
  - AES-256, SHA-256/384 and ECDSA-256/384 encrypt/decrypt accelerators and authentication
  - Physically Unclonable Function (PUF) service and software programmable device configuration capability
- Comprehensive set of advanced power saving features delivering up to 70% lower power compared to previous generation high-performance FPGAs
- Non-destructive register state readback and writeback, to support ASIC prototyping and other applications
With these capabilities, Stratix® 10 FPGAs and SoCs are ideally
suited for the most demanding applications in diverse markets such as:
- Compute and Storage: for custom servers, cloud computing and datacenter acceleration
- Networking: for Terabit, 400G and multi-100G bridging, aggregation, packet processing and traffic management
- Optical Transport Networks: for OTU4, 2xOTU4, 4xOTU4
- Broadcast: for high-end studio distribution, head end encoding/decoding, edge quadrature amplitude modulation (QAM)
- Military: for radar, electronic warfare, and secure communications
- Medical: for diagnostic scanners and diagnostic imaging
- Test and Measurement: for protocol and application testers
- Wireless: for next-generation 5G networks
- ASIC Prototyping: for designs that require the FPGA fabric with the highest I/O count
1.1. Intel Stratix 10 GX/SX Family Variants
Stratix® 10 devices are available in FPGA (GX) and SoC (SX) variants.
Stratix® 10 GX devices deliver up to 1 GHz core fabric performance and contain up to
10.2 million LEs.
They also feature up to 96 general purpose transceivers on separate transceiver
tiles, and 2666 Mbps DDR4 external memory interface performance. The
transceivers are capable of up to 28.3 Gbps short reach and across the
backplane. These devices are optimized for FPGA applications that require the
highest transceiver bandwidth and core fabric performance, with the power
efficiency of Intel's 14 nm tri-gate process technology.
Stratix® 10 SX devices have a feature set that is identical to
Stratix® 10 GX devices, with the addition of an embedded quad-core
64 bit Arm Cortex-A53 hard processor system.
Common to all Stratix® 10 family variants is a
high-performance fabric based on the new Intel Hyperflex
core architecture that includes additional Hyper-Registers throughout the
interconnect routing and at the inputs of all functional blocks. The core fabric
also contains an enhanced logic array utilizing
Intel's adaptive logic module (ALM) and a rich set of high
performance building blocks including:
- M20K (20 Kb) embedded memory blocks
- Variable precision DSP blocks with hard IEEE 754 compliant floating point
- Fractional synthesis and integer PLLs
- Hard memory controllers and PHY for external memory interfaces
- General purpose IO cells
To clock these building blocks,
Stratix® 10 devices use
programmable clock tree synthesis, which uses dedicated clock tree routing to
synthesize only those branches of the clock trees required for the application. All
devices support in-system, fine-grained partial reconfiguration of the logic array,
allowing logic to be added and subtracted from the system while it is operating.
All family variants also contain high speed serial transceivers, containing
both the physical medium attachment (PMA) and the physical coding sublayer (PCS),
which can be used to implement a variety of industry standard and proprietary
protocols. In addition to the hard PCS, Stratix® 10
devices contain multiple instantiations of PCI Express
hard IP that supports Gen1/Gen2/Gen3 rates in x1/x2/x4/x8/x16 lane configurations,
and hard 10GBASE-KR/40GBASE-KR4 FEC for every transceiver. The hard PCS, FEC, and
PCI Express IP free up valuable core logic resources, save power, and increase your
productivity.
1.1.1. Available Options
Figure 1. Sample Ordering Code and Available Options for Stratix® 10 Devices
Global, quadrant and regional clocks supported by fractional-synthesis fPLLs
Programmable clock tree synthesis supported by fractional synthesis fPLLs and integer IO PLLs
Register state readback and writeback
Non-destructive register state readback and writeback for ASIC prototyping and other applications
These innovations result in the following improvements:
- Logic Performance: The Intel Hyperflex core architecture, combined with Intel's 14 nm tri-gate technology, allows Stratix® 10 devices to achieve 2X the core performance compared to the previous generation
- Power Efficiency: Stratix® 10 devices use up to 70% lower power compared to the previous generation, enabled by the Intel Hyperflex core architecture and optional power saving features built into the architecture
- Density: Stratix® 10 devices offer three times the level of integration, with up to 10.2 million logic elements, M20K embedded memory blocks, and 11,520 18x19 multipliers
- Embedded Processing: Stratix® 10 SoCs feature a quad-core 64 bit Arm Cortex-A53 processor optimized for power efficiency and software compatible with previous generation SoCs
- Transceiver Performance: With up to 96 transceiver channels implemented in heterogeneous 3D SiP transceiver tiles, Stratix® 10 GX and SX devices support data rates up to 28.3 Gbps chip-to-chip and 28.3 Gbps across the backplane, with signal conditioning circuits capable of equalizing over 30 dB of system loss
- DSP Performance: The variable precision DSP block in Stratix® 10 devices features hard fixed and floating point capability, with up to 10 TFLOP IEEE 754 single-precision floating point performance
- Additional Hard IP: Stratix® 10 devices include many more hard IP blocks than previous generation devices, with a hard memory controller included in each bank of 48 general purpose IOs, a hard PCI Express Gen3 x16 full protocol stack in each transceiver tile, and a hard 10GBASE-KR/40GBASE-KR4 FEC in every transceiver channel
- Enhanced Core Clocking: Stratix® 10 devices feature programmable clock tree synthesis; clock trees are only synthesized where needed, increasing the flexibility and reducing the power dissipation of the clocking solution
- Additional Core PLLs: The core fabric in Stratix® 10 devices is supported by both integer IO PLLs and fractional synthesis fPLLs, resulting in a greater total number of PLLs available than the previous generation
1.3. FPGA and SoC Features Summary
Stratix® 10 FPGA and SoC Common Device Features
Intel 14 nm tri-gate (FinFET) process technology
voltage, standard power
0.85-V fixed core voltage, low static power devices
Low power serial transceivers
- Up to 96 total transceivers available
- Continuous operating range of 1 Gbps to 28.3 Gbps for Stratix® 10 GX/SX devices
- Backplane support up to 28.3 Gbps for Stratix® 10 GX/SX devices
- Data rates down to 125 Mbps with oversampling
- ATX transmit PLLs with user-configurable fractional synthesis capability
- XFP, SFP+, QSFP/QSFP28, CFP/CFP2/CFP4 optical module support
- Linear and decision feedback equalization
- Transmit pre-emphasis and de-emphasis
- Reconfiguration of individual transceiver channels
- On-die instrumentation (Eye Viewer non-intrusive data eye monitoring)
General purpose I/Os
total GPIO available
- 1.6 Gbps LVDS: every pair can be configured as an input or output
- 1333 MHz/2666 Mbps DDR4 external memory interface
- 1067 MHz/2133 Mbps DDR3 external memory interface
- 1.2 V to 3.0 V single-ended LVCMOS/LVTTL interfacing
Embedded hard IP
- PCIe Gen1/Gen2/Gen3 complete protocol stack, x1/x2/x4/x8/x16 end point and root port
- Hard memory controller (RLDRAM3/QDR II+/QDR IV using soft memory controller)
2 The number of 27x27 multipliers is one-half
the number of 18x19 multipliers.
3 All packages are ball
grid arrays with 1.0 mm pitch.
4 High-Voltage I/O pins are used for 3 V and 2.5 V interfacing.
5 Each LVDS pair can be configured as either a
differential input or a differential output.
6 High-Voltage I/O pins and LVDS pairs are included in the
General Purpose I/O count. Transceivers are counted separately.
7 Each package column offers pin migration
(common circuit board footprint) for all devices in the column.
For more information about vertical migration and the common circuit board
footprint, see Vertical Device Migration in the Stratix® 10 Device Design Guidelines.
Stratix® 10 GX devices are pin migratable with Stratix® 10 SX
devices in the same package.
The Stratix® 10 SX/GX 400 device has a level shifter, and
this imposes some restrictions on the number of LVDS pairs and
I/O banks available (see "Stratix® 10 SX/GX 400 Device Level Shifter Details").
1.6. Intel Hyperflex Core Architecture
Stratix® 10 FPGAs and SoCs are based on a core fabric featuring the new Intel Hyperflex
core architecture. The Intel Hyperflex core architecture delivers 2X the clock
frequency performance and up to 70% lower power compared to previous generation high-end
FPGAs. Along with this performance breakthrough, the Intel Hyperflex
core architecture delivers a number of advantages including:
- Higher Throughput: Capitalizes on 2X core clock frequency performance to obtain throughput breakthroughs
- Improved Power Efficiency: Uses reduced IP size, enabled by the Intel Hyperflex core architecture, to consolidate designs which previously spanned multiple devices into a single device, thereby reducing power by up to 70% versus previous generation devices
- Greater Design Functionality: Uses faster clock frequency to reduce bus widths and reduce IP size, freeing up additional FPGA resources to add greater functionality
- Increased Designer Productivity: Boosts performance with less routing congestion and fewer design iterations using Hyper-Aware design tools, obtaining greater timing margin for more rapid timing closure
In addition to the traditional user registers found in the Adaptive Logic
Modules (ALM), the Intel Hyperflex core architecture introduces
additional bypassable registers throughout the fabric of the FPGA. These additional
registers, called Hyper-Registers, are available on every interconnect routing segment and at
the inputs of all functional blocks.
Figure 4. Bypassable Hyper-Register
The Hyper-Registers enable the following key design techniques to achieve the
2X core performance increases:
Fine grain Hyper-Retiming to eliminate critical paths
Zero latency Hyper-Pipelining to eliminate routing delays
Flexible Hyper-Optimization for best-in-class performance
When you implement these techniques in your design, the Hyper-Aware design tools
automatically make use of the Hyper-Registers to achieve maximum core clock frequency.
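The effect of enabling bypassable registers along a routing path can be illustrated with a toy model. The sketch below is not Intel's retiming algorithm; it only assumes that a path is a chain of hops with known delays and that a register may be enabled after any hop, then searches for the placement that minimizes the slowest segment. The function names `place_registers` and `fmax_mhz` and the delay values are hypothetical.

```python
from itertools import accumulate


def place_registers(hop_delays_ns, num_registers):
    """Toy model: split a chain of hops into num_registers + 1 contiguous
    segments so that the slowest segment is as short as possible.

    Returns (critical_delay_ns, cut_positions). A cut position i means
    "a register is enabled after hop i" (hops are numbered from 1).
    """
    n = len(hop_delays_ns)
    segments = num_registers + 1
    if n == 0 or not 1 <= segments <= n:
        raise ValueError("need between 0 and len(hop_delays_ns) - 1 registers")

    prefix = [0] + list(accumulate(hop_delays_ns))
    inf = float("inf")
    # best[k][i]: smallest achievable worst-segment delay when the first
    # i hops are covered by exactly k segments.
    best = [[inf] * (n + 1) for _ in range(segments + 1)]
    last_cut = [[0] * (n + 1) for _ in range(segments + 1)]
    best[0][0] = 0

    for k in range(1, segments + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    last_cut[k][i] = j

    # Walk the recorded cuts backwards to recover the register positions.
    cuts, i = [], n
    for k in range(segments, 1, -1):
        i = last_cut[k][i]
        cuts.append(i)
    return best[segments][n], cuts[::-1]


def fmax_mhz(critical_delay_ns):
    """Maximum clock frequency implied by the slowest register-to-register segment."""
    return 1000.0 / critical_delay_ns
```

For example, a five-hop path with delays of 1, 2, 3, 4 and 5 ns has a 15 ns critical delay with no intermediate register, 9 ns with one register after hop 3, and 6 ns with registers after hops 3 and 4.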
1.7. Heterogeneous 3D SiP Transceiver Tiles
Stratix® 10 FPGAs and
SoCs feature power efficient, high bandwidth, low latency transceivers. The transceivers are
implemented on heterogeneous 3D System-in-Package (SiP) transceiver tiles, each containing 24
full-duplex transceiver channels. In addition to providing a high-performance transceiver
solution to meet current connectivity needs, this allows for future flexibility and
scalability as data rates, modulation schemes, and protocol IPs evolve.
Figure 6. Monolithic Core Fabric and Heterogeneous 3D SiP Transceiver
Figure 7. Dual Core Fabric and Heterogeneous 3D SiP Transceiver Tiles (for the
Stratix® 10 GX 10M Variant Only)
Each transceiver tile contains:
- 24 full-duplex transceiver channels (PMA and PCS)
- Reference clock distribution network
- High-speed clocking and bonding networks
- One instance of PCI Express hard IP
Figure 8. Heterogeneous 3D SiP Transceiver Tile Architecture
full-duplex transceiver channels for the
Stratix® 10 GX 10M
1.8. Intel Stratix 10 Transceivers
Stratix® 10 devices offer up to 96 total full-duplex
transceiver channels. These channels provide continuous data rates from 1 Gbps to
28.3 Gbps for chip-to-chip, chip-to-module, and backplane applications. In each
device, two thirds of the transceivers can be configured up to the maximum data rate
of 28.3 Gbps to drive 100G interfaces and C form-factor pluggable CFP2/CFP4 optical
modules. For longer-reach backplane driving applications, advanced adaptive
equalization circuits are used to equalize over 30 dB of system loss.
All transceiver channels feature a dedicated Physical Medium
Attachment (PMA) and a hardened Physical Coding Sublayer (PCS).
The PMA provides primary
interfacing capabilities to physical channels.
The PCS typically handles
encoding/decoding, word alignment, and other pre-processing functions before
transferring data to the FPGA core fabric.
Within each transceiver tile, the transceivers are arranged in four banks of
six PMA-PCS groups. A wide variety of bonded and non-bonded data rate configurations
are possible within each bank, and within each tile, using a highly configurable
clock distribution network.
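The tile and bank arithmetic above can be sketched in a few lines. The counts (24 channels per tile, four banks of six per tile, up to 96 channels, two thirds of them at the maximum rate) come from the text; the zero-based flat channel numbering and the function names are assumptions made only for illustration and do not reflect Intel's pin or channel naming.

```python
CHANNELS_PER_TILE = 24   # from the text: each tile has 24 full-duplex channels
BANKS_PER_TILE = 4       # from the text: four banks per tile
CHANNELS_PER_BANK = 6    # from the text: six PMA-PCS groups per bank


def locate_channel(index, total_channels=96):
    """Map a hypothetical zero-based flat channel index to (tile, bank, channel_in_bank)."""
    if not 0 <= index < total_channels:
        raise ValueError("channel index out of range")
    tile, within_tile = divmod(index, CHANNELS_PER_TILE)
    bank, channel = divmod(within_tile, CHANNELS_PER_BANK)
    return tile, bank, channel


def tiles_needed(channels):
    """Number of 24-channel tiles needed to provide the requested channel count."""
    return -(-channels // CHANNELS_PER_TILE)  # ceiling division


def max_rate_channels(total_channels):
    """Channels configurable at the maximum data rate (two thirds of the total)."""
    return total_channels * 2 // 3
```

Under these assumptions, a 96-channel device uses four tiles, and 64 of its channels can run at up to 28.3 Gbps.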
1.8.1. PMA Features
PMA channels are comprised of transmitter (TX), receiver (RX),
and high speed clocking resources.
Stratix® 10 device features provide
exceptional signal integrity at data rates up to 28.3 Gbps. Clocking options include
ultra-low jitter LC tank-based (ATX) PLLs with optional fractional synthesis
capability, channel PLLs operating as clock multiplier units (CMUs), and fractional
synthesis PLLs (fPLLs).
- ATX PLL: Can be configured in integer mode or, optionally, in a new fractional synthesis mode. Each ATX PLL spans the full frequency range of the supported data rate range, providing a stable, flexible clock source with the lowest jitter.
- CMU PLL: When not being used as a transceiver, select PMA channels can be configured as channel PLLs operating as CMUs to provide an additional master clock source within the transceiver bank.
- fPLL: Dedicated fPLLs are available with precision frequency synthesis capabilities. fPLLs can be used to synthesize multiple clock frequencies from a single reference clock source and replace multiple reference oscillators for multi-protocol and multi-rate applications.
On the receiver side, each PMA has an independent channel PLL that allows
analog tracking for clock-data recovery. Each PMA also has advanced equalization
circuits that compensate for transmission losses across a wide frequency spectrum.
Variable Gain Amplifier
(VGA)—to optimize the receiver's dynamic range
Continuous Time Linear Equalizer (CTLE)—to compensate for channel losses
with lowest power dissipation
Decision Feedback Equalizer (DFE)—to provide additional equalization
capability on backplanes even in the presence of crosstalk and reflections
On-Die Instrumentation (ODI)—to provide on-chip eye monitoring
capabilities (Eye Viewer). This capability helps to optimize link equalization
parameters during board bring-up and supports in-system link diagnostics and
equalization margin testing
Stratix® 10 Receiver
All link equalization parameters feature automatic adaptation using the new
Advanced Digital Adaptive Parametric Tuning (ADAPT) circuit. This circuit is used to
dynamically set DFE tap weights, adjust CTLE parameters, and optimize VGA gain and
threshold voltage. Finally, optimal and consistent signal integrity is ensured by
using the new hardened Precision Signal Integrity Calibration Engine (PreSICE) to
automatically calibrate all transceiver circuit blocks on power-up. This gives the
most link margin and ensures robust, reliable, and error-free operation.
- Transmit pre-emphasis and de-emphasis: 5-tap transmit pre-emphasis and de-emphasis to compensate for system channel loss
- Continuous Time Linear Equalizer (CTLE): Dual mode, high-gain, and high-data rate linear receive equalization to compensate for system channel loss
- Decision Feedback Equalizer (DFE): 15 fixed tap DFE to equalize backplane channel loss in the presence of crosstalk and noisy environments
- Advanced Digital Adaptive Parametric Tuning (ADAPT): Fully digital adaptation engine to automatically adjust all link equalization parameters, including CTLE, DFE, and VGA blocks, that provides optimal link margin without intervention from user logic
- Precision Signal Integrity Calibration Engine (PreSICE): Hardened calibration controller to quickly calibrate all transceiver control parameters on power-up, which provides the optimal signal integrity and jitter performance
- ATX Transmit PLLs: Low jitter ATX (inductor-capacitor) transmit PLLs with continuous tuning range to cover a wide range of standard and proprietary protocols, with optional fractional frequency synthesis
- Fractional PLLs: On-chip fractional frequency synthesizers to replace on-board crystal oscillators and reduce system cost
- Digitally Assisted Analog CDR: Superior jitter tolerance with fast lock time
- On-Die Instrumentation (Eye Viewer and Jitter Margin Tool): Simplify board bring-up, debug, and diagnostics with non-intrusive, high-resolution eye monitoring (Eye Viewer). Also inject jitter from the transmitter to test link margin in system.
- Channel reconfiguration: Allows independent control of each transceiver channel through an Avalon memory-mapped interface for the most transceiver flexibility
- Multiple PCS-PMA and PCS-Core to FPGA fabric interface widths: 8, 10, 16, 20, 32, 40, or 64 bit interface widths for flexibility of deserialization width, encoding, and reduced latency
11 Stratix® 10 transceivers
can support data rates below 1 Gbps with oversampling.
1.8.2. PCS Features
PMA channels interface with core logic through a configurable and bypassable PCS.
The PCS contains multiple gearbox implementations to decouple the PMA
and PCS interface widths. This feature provides the flexibility to implement a wide
range of applications with 8, 10, 16, 20, 32, 40, or 64 bit interface width between
each transceiver and the core logic.
The PCS also contains hard IP to support a variety of
standard and proprietary protocols across a wide range of data rates and encoding
schemes. The Standard PCS mode provides support for 8B/10B encoded applications up
to 12.5 Gbps. The Enhanced PCS mode supports 64B/66B and 64B/67B encoded
applications up to 17.4 Gbps. The enhanced PCS mode also includes an integrated
10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC) circuit. For highly customized
implementations, a PCS Direct mode provides an interface up to 64 bits wide to allow
for custom encoding and support for data rates up to 28.3 Gbps.
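The relationship between line rate, encoding overhead and parallel interface width can be worked through numerically. The sketch below uses the line rates and widths quoted above; the payload formula (line rate times the code ratio) and the parallel clock formula (line rate divided by interface width) are standard serial-link arithmetic rather than figures taken from this document, and the function names are hypothetical.

```python
from fractions import Fraction

# Payload bits per transmitted bit for each line code.
ENCODING_EFFICIENCY = {
    "8B/10B": Fraction(8, 10),
    "64B/66B": Fraction(64, 66),
    "64B/67B": Fraction(64, 67),
}


def payload_gbps(line_rate_gbps, encoding):
    """Usable data rate after removing line-code overhead."""
    return float(Fraction(str(line_rate_gbps)) * ENCODING_EFFICIENCY[encoding])


def parallel_clock_mhz(line_rate_gbps, width_bits):
    """Parallel clock needed to carry a serial rate over a width_bits-wide interface."""
    return line_rate_gbps * 1000 / width_bits
```

With these assumptions, a 12.5 Gbps 8B/10B link carries 10 Gbps of payload, a 17.4 Gbps 64B/66B link carries roughly 16.87 Gbps, and a 28.3 Gbps PCS Direct link on a 64 bit interface implies a parallel clock of about 442 MHz.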
For more information about the PCS-Core interface or the double rate
transfer mode, refer to the
Stratix® 10 L- and H-Tile Transceiver PHY User Guide, and the
Stratix® 10 E-Tile
Transceiver PHY User Guide.
Stratix® 10 devices contain embedded PCI Express
hard IP designed for performance, ease-of-use, increased functionality, and designer
productivity.
The PCI Express hard IP consists of the PHY, Data Link, and Transaction layers.
It also supports PCI Express Gen1/Gen2/Gen3 end point and root port, in
x1/x2/x4/x8/x16 lane configurations. The PCI Express hard IP is capable of operating
independently from the core logic (autonomous mode). This feature allows the PCI
Express link to power up and complete link training in less than 100 ms, while the
rest of the device is still in the process of being configured. The hard IP also
provides added functionality, which makes it easier to support emerging features
such as Single Root I/O Virtualization (SR-IOV) and optional protocol extensions.
The PCI Express hard IP has improved end-to-end data path protection using Error
Checking and Correction (ECC). In addition, the hard IP supports configuration of
the device via protocol (CvP) across the PCI Express bus at Gen1/Gen2/Gen3 speeds.
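As a rough guide to what the supported generations and lane widths mean in bandwidth terms, the sketch below computes the raw per-direction link bandwidth. The per-lane transfer rates and line codes (2.5 GT/s and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b) come from the PCI Express specification, not from this document, and the result excludes packet and protocol overhead. The function name is hypothetical.

```python
# (transfer rate in GT/s per lane, line-code efficiency) per PCI Express generation
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}

SUPPORTED_LANE_WIDTHS = (1, 2, 4, 8, 16)


def pcie_raw_gbytes_per_s(generation, lanes):
    """Raw per-direction bandwidth in GB/s, before packet and protocol overhead."""
    if lanes not in SUPPORTED_LANE_WIDTHS:
        raise ValueError("lane width must be one of x1/x2/x4/x8/x16")
    rate_gtps, efficiency = PCIE_GENERATIONS[generation]
    return rate_gtps * efficiency * lanes / 8
```

Under these assumptions a Gen3 x16 link provides about 15.75 GB/s in each direction, compared with 0.25 GB/s for Gen1 x1.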
1.10. Interlaken PCS Hard IP
Stratix® 10 devices have integrated Interlaken
PCS hard IP supporting rates up to 17.4 Gbps per lane.
The Interlaken PCS hard IP is based on the proven functionality
of the PCS developed for Intel’s previous
generation FPGAs, which has demonstrated interoperability with Interlaken ASSP
vendors and third-party IP suppliers. The Interlaken PCS hard IP is present in every
transceiver channel in Stratix® 10 devices.
1.11. 10G Ethernet Hard IP
Stratix® 10 devices include IEEE 802.3 10-Gbps Ethernet
(10GbE) compliant 10GBASE-R PCS and PMA hard IP. The scalable 10GbE hard IP
supports multiple independent 10GbE ports while using a single PLL for all the
10GBASE-R PCS instantiations, which saves on core logic resources and clock networks.
The integrated serial transceivers simplify multi-port 10GbE systems compared to
10 GbE Attachment Unit Interface (XAUI) interfaces that require an external
XAUI-to-10G PHY. Furthermore, the integrated transceivers incorporate signal
conditioning circuits, which enable direct connection to standard 10G XFP and SFP+
pluggable optical modules. The transceivers also support backplane Ethernet
applications and include a hard 10GBASE-KR/40GBASE-KR4 Forward Error Correction
(FEC) circuit that can be used for both 10G and 40G applications. The integrated 10G
Ethernet hard IP and 10G transceivers save external PHY cost, board space and system
power. The 10G Ethernet PCS hard IP and 10GBASE-KR FEC are present in every transceiver channel.
This bandwidth is provided along with the ease of design, lower
power, and resource efficiencies of hardened high-performance memory controllers.
The external memory interfaces can be configured up to a maximum width of 144 bits
when using either hard or soft memory controllers.
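A quick calculation shows what the quoted pin rate and interface widths imply for peak bandwidth. This sketch assumes the simple relationship peak bandwidth = per-pin rate times data width, ignoring refresh, bus turnaround and other protocol overhead; the function name is hypothetical.

```python
def peak_bandwidth_gbytes_per_s(rate_mbps_per_pin, data_width_bits):
    """Theoretical peak bandwidth in GB/s for a parallel memory interface."""
    return rate_mbps_per_pin * data_width_bits / 8 / 1000
```

For DDR4 at 2666 Mbps per pin, a 64 bit data bus peaks at about 21.3 GB/s, and the maximum 144 bit width at about 48 GB/s. If some of those bits carry ECC, the usable data bandwidth is correspondingly lower.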
Figure 10. Hard Memory Controller
Each I/O bank contains 48 general purpose I/Os and a high-efficiency hard
memory controller capable of supporting many different memory types, each with
different performance capabilities. The hard memory controller is also capable of
being bypassed and replaced by a soft controller implemented in
user logic. The I/Os each have a hardened double data rate (DDR) read/write path
(PHY) capable of performing key memory interface functionality such as:
FIFO buffering to lower latency and improve margin
The timing calibration is aided by the inclusion of
hard microcontrollers based on Intel’s Nios® II technology, specifically tailored to
control the calibration of multiple memory interfaces. This calibration allows the
Stratix® 10 device to compensate for any
changes in process, voltage, or temperature either within the
Stratix® 10 device itself, or within the external
memory device. The advanced calibration algorithms ensure maximum bandwidth and
robust timing margin across all operating conditions.
Table 9. External Memory Interface Performance
The listed speeds are for the 1-rank case.
(maximum rate possible)
In addition to parallel memory interfaces,
Stratix® 10 devices support serial memory
technologies such as the Hybrid Memory Cube (HMC). The HMC is supported by the
Stratix® 10 high-speed serial transceivers, which
connect up to four HMC links, with each link running at data rates of 15 Gbps (HMC
short reach specification).
Stratix® 10 devices also feature general purpose I/Os capable of supporting a wide range of
single-ended and differential I/O interfaces. LVDS rates up to 1.6 Gbps are
supported, with each pair of pins having both a differential driver and a
differential input buffer. This enables configurable direction for each LVDS pair.
1.13. Adaptive Logic Module (ALM)
Stratix® 10 devices use an adaptive logic module (ALM) similar to that of the previous generation
Arria® 10 and Stratix® V FPGAs, allowing for efficient implementation of logic functions
and easy conversion of IP between the devices.
The ALM block diagram shown in the following figure has eight inputs
with a fracturable look-up table (LUT), two dedicated embedded adders, and four
dedicated registers.
- Four registers per 8-input fracturable LUT, operating in conjunction with the new Intel Hyperflex core architecture, enable Stratix® 10 devices to maximize core performance at very high core logic utilization
- Support for select 7-input logic functions, all 6-input logic functions, and two independent functions consisting of smaller LUT sizes (such as two independent 4-input LUTs) to optimize core logic utilization
The Quartus® Prime software takes advantage of the ALM logic structure to deliver the highest
performance, optimal logic utilization, and lowest compile times. The
Quartus® Prime software simplifies design reuse as it
automatically maps legacy designs into the
Stratix® 10 ALM architecture.
1.14. Core Clocking
Core clocking in Stratix® 10 devices makes use of programmable clock tree synthesis.
This technique uses dedicated clock tree routing and switching
circuits, and allows the
Quartus® Prime software to
create the exact clock trees required for your design. Clock tree synthesis
minimizes clock tree insertion delay, reduces dynamic power dissipation in the clock
tree and allows greater clocking flexibility in the core while still maintaining
backwards compatibility with legacy global and regional clocking schemes.
The core clock network in Stratix® 10 devices supports the new Intel Hyperflex
core architecture at clock rates up to 1 GHz. It also supports the hard memory controllers up to
2666 Mbps with a quarter rate transfer
to the core. The core clock network is supported by dedicated clock input pins,
fractional clock synthesis PLLs, and integer I/O PLLs.
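The quarter rate transfer can be made concrete with a small calculation. The sketch assumes the usual meaning of the term: the memory clock is half the per-pin data rate (double data rate), the core-side clock is one quarter of the memory clock, and the core-side bus is therefore eight times as wide as the memory bus so that bandwidth is preserved. The function name and the 72 bit example width are illustrative.

```python
def quarter_rate_core_interface(rate_mbps_per_pin, interface_width_bits):
    """Return (memory_clock_mhz, core_clock_mhz, core_bus_width_bits).

    Assumes DDR signalling (two transfers per memory clock) and a core clock
    running at one quarter of the memory clock.
    """
    memory_clock_mhz = rate_mbps_per_pin / 2
    core_clock_mhz = memory_clock_mhz / 4
    # Two transfers per memory clock times four memory clocks per core clock.
    core_bus_width_bits = interface_width_bits * 8
    return memory_clock_mhz, core_clock_mhz, core_bus_width_bits
```

For a 2666 Mbps DDR4 interface this gives a 1333 MHz memory clock and a 333.25 MHz core-side clock, well below the 1 GHz fabric limit; a 72 bit interface would present 576 bits per core clock.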
1.15. Fractional Synthesis PLLs and I/O PLLs
Stratix® 10 devices have up
to 32 fractional synthesis PLLs (fPLL) available for use with transceivers or in the
core fabric. The fPLLs are located in the 3D SiP transceiver
L-tiles and H-tiles, eight per tile, adjacent to the transceiver
channels. The fPLLs can be used to reduce both the number of oscillators required on
the board and the number of clock pins required, by synthesizing multiple clock
frequencies from a single reference clock source. In addition to synthesizing
reference clock frequencies for the transceiver transmit PLLs, the fPLLs can also be
used directly for transmit clocking. Each fPLL can be independently configured for
conventional integer mode, or enhanced fractional synthesis mode with third-order
delta-sigma modulation.
In addition to the fPLLs,
Stratix® 10 devices contain
integer I/O PLLs (IOPLLs) available for general purpose use in the core fabric and
for simplifying the design of external memory interfaces and high-speed LVDS
interfaces. The IOPLLs are located in each bank of 48 general purpose I/O, 1 per I/O
bank, adjacent to the hard memory controllers and LVDS SerDes in each I/O bank. This
makes it easier to close timing because the IOPLLs are tightly coupled with the I/Os
that need to use them. The IOPLLs can be used for general purpose applications in
the core such as clock network delay compensation and zero-delay clock buffering.
1.16. Internal Embedded Memory
Stratix® 10 devices
contain two types of embedded memory blocks: M20K (20 Kb) and MLAB (640 bit).
The M20K and MLAB blocks are familiar block sizes
carried over from previous Intel device families. The MLAB blocks are ideal for wide
and shallow memories, while the M20K blocks are intended to support larger memory
configurations and include hard ECC. Both M20K and MLAB embedded memory blocks can
be configured as a single-port or dual-port RAM, FIFO, ROM, or shift register. These
memory blocks are highly flexible and support a number of memory configurations as
shown in Table 10.
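For a first-order sizing estimate, the block capacities above give a lower bound on how many blocks a memory needs. This sketch assumes 20 Kb means 20 x 1024 bits and divides total bits by block capacity; the actual count depends on the supported depth and width configurations of each block, so a real design may need more. The function name is hypothetical.

```python
import math

M20K_BITS = 20 * 1024  # assumption: 20 Kb interpreted as 20,480 bits
MLAB_BITS = 640


def min_blocks(depth_words, width_bits, block_bits):
    """Lower bound on the number of memory blocks for a depth x width memory."""
    return math.ceil(depth_words * width_bits / block_bits)
```

A 2048 x 32 memory needs at least four M20K blocks under this estimate, while a 32 x 20 memory fits in a single MLAB.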
Stratix® 10 DSP blocks are based upon the Variable Precision DSP Architecture used in Intel's
previous generation devices. They feature hard fixed point and IEEE 754
compliant floating point capability.
The DSP blocks can be configured to support signal processing with
precision ranging from 18x19 up to 54x54. A pipeline register has been added to
increase the maximum operating frequency of the DSP block and reduce power
consumption.
Figure 12. DSP Block: Standard Precision Fixed Point Mode
Figure 13. DSP Block: High Precision Fixed Point Mode
Figure 14. DSP Block: Single Precision Floating Point Mode
Each DSP block can be independently configured at compile time as
either dual 18x19 or a single 27x27 multiply accumulate. With a dedicated 64 bit
cascade bus, multiple variable precision DSP blocks can be cascaded to implement
even higher precision DSP functions efficiently.
In floating point mode, each DSP block provides one single precision
floating point multiplier and adder. Floating point additions, multiplications,
mult-adds and mult-accumulates are supported.
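The block counts and the headline floating point figure can be related with simple arithmetic. This sketch assumes that each DSP block corresponds to two 18x19 multipliers, and that in floating point mode a block completes one multiply and one add per clock, counted as 2 FLOPs. The clock frequency it derives is an inference from those assumptions, not a number stated in this document, and the function names are hypothetical.

```python
def dsp_block_count(multipliers_18x19):
    """Each variable precision DSP block provides two 18x19 multipliers."""
    return multipliers_18x19 // 2


def peak_gflops(dsp_blocks, clock_mhz):
    """Peak single-precision throughput, counting one multiply plus one add per block per clock."""
    return dsp_blocks * 2 * clock_mhz / 1000


def clock_mhz_for_target(dsp_blocks, target_gflops):
    """DSP clock needed to reach a target peak throughput under the same counting."""
    return target_gflops * 1000 / (dsp_blocks * 2)
```

With 11,520 18x19 multipliers there are 5,760 DSP blocks, and reaching 10 TFLOP under these assumptions would require the blocks to run at roughly 868 MHz.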
The following table shows how different precisions are accommodated
within a DSP block, or by utilizing multiple blocks.
1 Variable Precision DSP Block with external
Fixed point FFTs
2 Variable Precision DSP Blocks with external
Very high precision fixed point
4 Variable Precision DSP Blocks with external
Double Precision floating point
Single Precision floating point
1 Single Precision floating point adder, 1 Single
Precision floating point multiplier
Complex multiplication is very common in DSP algorithms. One of the
most popular applications of complex multipliers is the FFT algorithm. This
algorithm has the characteristic of increasing precision requirements on only one
side of the multiplier. The Variable Precision DSP block supports the FFT algorithm
with proportional increase in DSP resources as the precision grows.
Table 12. Complex Multiplication With Variable Precision DSP Block
Complex Multiplier Size
DSP Block Resources
2 Variable Precision DSP Blocks
Resource optimized FFT
4 Variable Precision DSP Blocks
Highest precision FFT
For FFT applications with high dynamic range requirements, the
Intel FFT IP Core offers an option of
single precision floating point implementation with resource usage and performance
similar to high precision fixed point implementations.
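The DSP block counts for complex multiplication follow from the arithmetic itself. A complex multiply in its direct form uses four real multiplies; with two 18x19 multipliers or one 27x27 multiplier per DSP block, that gives two or four blocks respectively. The sketch below shows the direct form and the resulting block count. The function names and the mode labels are illustrative, and three-multiplier variants of complex multiplication are not considered.

```python
import math

# Real multipliers available per DSP block in each fixed point mode.
MULTIPLIERS_PER_BLOCK = {"18x19": 2, "27x27": 1}

REAL_MULTIPLIES_PER_COMPLEX_MULTIPLY = 4


def complex_multiply(a_re, a_im, b_re, b_im):
    """Direct-form complex product (a_re + j*a_im) * (b_re + j*b_im)."""
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


def blocks_for_complex_multiply(mode):
    """DSP blocks needed for one direct-form complex multiplier in the given mode."""
    return math.ceil(REAL_MULTIPLIES_PER_COMPLEX_MULTIPLY / MULTIPLIERS_PER_BLOCK[mode])
```

For instance, `complex_multiply(1, 2, 3, 4)` yields `(-5, 10)`, and the block counts come out as 2 for the 18x19 mode and 4 for the 27x27 mode.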
Other features of the DSP block include:
- Hard 18 bit and 25 bit pre-adders
- Hard floating point multipliers and adders
- 64 bit dual accumulator (for separate I, Q product accumulations)
- Cascaded output adder chains for 18 and 27 bit FIR filters
- Coefficient registers for 18 and 27 bit coefficients
- Inferability using HDL templates supplied by the Quartus® Prime software for most modes
The Variable Precision DSP block is ideal to support
the growing trend towards higher bit precision in high performance DSP applications.
At the same time, it can efficiently support the many existing 18 bit DSP applications, such as high definition video
processing and remote radio heads. With the Variable Precision DSP block
architecture and hard floating point multipliers and adders,
Stratix® 10 devices can efficiently support many
different precision levels up to and including floating point implementations. This
flexibility can result in increased system performance, reduced power consumption,
and reduced architecture constraints on system algorithm designers.
1.18. Hard Processor System (HPS)
The Stratix® 10 SoC Hard Processor
System (HPS) is Intel's third generation
HPS. Leveraging the performance of Intel 14 nm tri-gate technology,
Stratix® 10 SoC devices more than double the performance of
previous generation SoCs with an integrated quad-core 64-bit
Arm Cortex-A53. The HPS also enables system-wide
hardware virtualization capabilities by adding a system memory management unit.
These architecture improvements ensure that Stratix® 10
SoCs meet the requirements of current and future embedded markets, including
wireless and wireline communications, datacenter acceleration, and numerous military
applications.
Figure 15. HPS Block Diagram
1.18.1. Key Features of the Intel Stratix 10 HPS
Table 13. Key Features of the
Stratix® 10 GX/SX HPS
Quad-core Arm Cortex-A53 MPCore processor unit
2.3 MIPS/MHz instruction efficiency
CPU frequency up to 1.5 GHz
At 1.5 GHz total performance of 13,800 MIPS
Runs 64 bit and 32 bit
bit and 32
bit Thumb instructions for 30% reduction
in memory footprint
Jazelle* RCT execution architecture with 8 bit Java byte codes
Superscalar, variable length, out-of-order
pipeline with dynamic branch prediction
Neon* media processing engine
Single- and double-precision floating-point
CoreSight* debug and trace technology
System Memory Management Unit
Enables a unified memory model and extends hardware virtualization into peripherals implemented in the FPGA fabric
Cache Coherency unit
Changes in shared data stored in cache are propagated throughout the system providing bi-directional coherency for co-processing elements.
32 KB of instruction cache w/
32 KB of L1 data cache w/ ECC
8-way set associative
SEU Protection with parity on TAG RAM and ECC on data RAM
Cache lockdown support
256 KB of scratch on-chip RAM
External SDRAM and Flash Memory Interfaces for HPS
Hard memory controller with support for DDR4, DDR3
40 bit (32 bit + 8 bit ECC) with select packages supporting 72 bit (64 bit + 8 bit ECC)
Support for up to 2666 Mbps DDR4 and 2166 Mbps DDR3 frequencies
Error correction code (ECC) support including calculation, error correction, write-back correction, and error counters
Software Configurable Priority Scheduling on individual SDRAM bursts
Fully programmable timing parameter support for all JEDEC-specified timing parameters
Multiport front-end (MPFE) scheduler interface to the hard memory controller, which supports the AXI® Quality of Service (QoS) for interface to the FPGA fabric
NAND flash controller
Integrated descriptor based with
Programmable hardware ECC support
bit Flash devices
Secure Digital SD/SDIO/MMC controller
Integrated descriptor based DMA
CE-ATA digital commands supported
50 MHz operating frequency
Direct memory access (DMA) controller
Supports up to 32 peripheral
Communication Interface Controllers
Three 10/100/1000 Ethernet media access controls (MAC) with
Supports RGMII and RMII external
Option to support other PHY interfaces through FPGA logic
RMII (requires MII to RMII
RGMII (requires GMII to RGMII
SGMII (requires GMII to SGMII
Supports IEEE 1588-2002 and IEEE 1588-2008 standards for precision networked clock
Supports IEEE 802.1Q VLAN tag detection for reception frames
Supports Ethernet AVB standard
Two USB On-the-Go (OTG) controllers with DMA
Dual-Role Device (device and host
High-speed (480 Mbps)
Full-speed (12 Mbps)
Low-speed (1.5 Mbps)
Supports USB 1.1 (full-speed and
Support for external ULPI PHY
Up to 16 bidirectional endpoints, including control endpoint
Up to 16 host channels
Supports generic root hub
Configurable to OTG 1.3 and OTG 2.0
Five I2C controllers (three can be used by EMAC for MIO to external PHY)
Support both 100 Kbps and 400 Kbps
Support both 7 bit and 10 bit addressing modes
Support Master and Slave operating
Two UART 16550 compatible
Programmable baud rate up
Four serial peripheral
interfaces (SPI) (2
Full and Half duplex
Timers and I/O
4 general-purpose timers
4 watchdog timers
HPS direct I/O allow HPS peripherals to connect directly to
Up to three IO48 banks may be assigned to HPS for HPS DDR access
Interconnect to Logic Core
Allows IP bus masters in the FPGA fabric to access HPS bus slaves
bit AMBA AXI interface
Allows HPS bus masters to access bus slaves in FPGA fabric
AMBA AXI interface allows high-bandwidth HPS master transactions to FPGA fabric
HPS-to-SDM and SDM-to-HPS Bridges
Allows the HPS to reach the SDM block and the SDM to bootstrap the HPS
Light Weight HPS-to-FPGA Bridge
Light weight 32 bit AXI interface suitable for low-latency register accesses from HPS to soft peripherals in FPGA fabric
FPGA-to-HPS SDRAM Bridge
Up to three AMBA AXI interfaces supporting 32, 64, or 128 bit data paths
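The headline processor performance figure in Table 13 can be cross-checked from the other two numbers in the same table. The short Python sketch below assumes that the 13,800 figure is the aggregate across all four cores, which the table does not state explicitly but which the arithmetic supports. The function name is illustrative.

```python
def aggregate_mips(mips_per_mhz, clock_mhz, cores):
    """Return the combined MIPS of all cores, rounded to the nearest integer.

    Rounding avoids binary floating-point noise from values such as 2.3.
    """
    return round(mips_per_mhz * clock_mhz * cores)


if __name__ == "__main__":
    # 2.3 MIPS/MHz, 1.5 GHz (1500 MHz), quad-core
    print(aggregate_mips(2.3, 1500, 4))
```

With the table values this gives 2.3 x 1500 x 4 = 13,800, matching the quoted total.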
1.19. Power Management
Stratix® 10 devices combine
the advanced Intel 14 nm tri-gate process technology, the all new
core architecture (which enables Hyper-Folding),
power gating, and several optional power reduction techniques to
reduce total power consumption by as much as 70% compared to previous generation
Stratix® V devices.
Stratix® 10 standard power devices (-V)
are SmartVID devices. The core voltage supplies (VCC and VCCP) for each SmartVID
device must be driven by a PMBus voltage regulator dedicated to that
Stratix® 10 device. A PMBus voltage regulator for each
SmartVID (-V) device is mandatory. A code is programmed into
each SmartVID device during manufacturing that allows the PMBus voltage regulator to
operate at the optimum core voltage to meet the device performance specifications.
With the new core architecture, designs can run 2X
faster than previous generation FPGAs. With 2X performance and the same required
throughput, architects can cut the data path width in half to save power. This
optimization is called Hyper-Folding. Additionally, power gating reduces static
power of unused resources in the FPGA by powering them down. The
Quartus® Prime software automatically powers down specific
unused resource blocks, such as DSP and M20K blocks, at configuration time.
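The Hyper-Folding trade-off described above comes down to simple arithmetic: throughput is data path width multiplied by clock rate, so doubling the clock lets the width be halved at constant throughput. The Python sketch below illustrates this with hypothetical numbers (a 512 bit path at 250 MHz). It does not model the resulting power savings, which depend on the design.

```python
def throughput_gbps(width_bits, clock_mhz):
    """Throughput of a data path that moves `width_bits` every clock cycle."""
    return width_bits * clock_mhz / 1000.0


def fold(width_bits, clock_mhz, factor=2):
    """Narrow the data path by `factor` and raise the clock by the same factor."""
    if width_bits % factor:
        raise ValueError("width must be divisible by the folding factor")
    return width_bits // factor, clock_mhz * factor


if __name__ == "__main__":
    wide = (512, 250)      # hypothetical original data path
    folded = fold(*wide)   # half the width, twice the clock
    print(wide, throughput_gbps(*wide))
    print(folded, throughput_gbps(*folded))
```

Both configurations deliver the same throughput, while the folded one uses half as many data path bits.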
The optional power reduction techniques in
Stratix® 10 devices include:
Available low static power devices: Stratix® 10 devices are
available with a fixed core voltage that provides lower static power than the
SmartVID standard power devices, while maintaining device performance.
Stratix® 10 devices feature Intel’s low
power transceivers and include a number of hard IP blocks that not only reduce logic
resources but also deliver substantial power savings compared to soft
implementations. In general, hard IP blocks consume up to 50% less power than the
equivalent soft logic implementations.
1.20. Device Configuration and Secure Device Manager (SDM)
Stratix® 10 devices contain a Secure Device Manager (SDM), which is a dedicated triple-redundant
processor that serves as the point of entry into the device for all JTAG and configuration
commands. The SDM also bootstraps the HPS in SoC devices, ensuring that the HPS can boot using
the same security features that the FPGA devices have.
Figure 16. SDM Block Diagram
Stratix® 10 devices are divided into logical sectors, each of which is managed
by a local sector manager (LSM). The SDM passes configuration data to each of the LSMs across
the on-chip configuration network. This allows the sectors to be configured independently, one
at a time, or in parallel. This approach achieves simplified sector configuration and
reconfiguration, as well as reduced overall configuration time due to the inherent
parallelism. The same sector-based approach is used to respond to single-event upsets and
While the sectors provide a logical separation for device configuration and
reconfiguration, they overlay the normal rows and columns of FPGA logic and routing. This
means there is no impact to the
Quartus® Prime software place
and route, and no impact to the timing of logic signals that cross the sector boundaries.
The SDM enables robust, secure, fully-authenticated device configuration. It
also allows for customization of the configuration scheme, which can enhance device security.
For configuration and reconfiguration, this approach offers a variety of advantages:
Dedicated secure configuration manager
Reduced device configuration time, because sectors are configured in parallel
Updateable configuration process
Reconfiguration of one or more sectors independent of all other sectors
Zeroization of individual sectors or the complete device
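To see why configuring sectors in parallel shortens overall configuration time, consider the following idealized Python model. It compares configuring sectors one at a time against dispatching them to a number of concurrent slots. The per-sector times, the `max_concurrent` parameter, and the assumption that the configuration network is never the bottleneck are all hypothetical; real configuration times depend on the device and bitstream.

```python
import heapq


def sequential_time(sector_times):
    """Total time when sectors are configured one after another."""
    return sum(sector_times)


def parallel_time(sector_times, max_concurrent=None):
    """Total time when up to `max_concurrent` sectors configure at once.

    Each sector is handed to whichever slot becomes free first.
    With max_concurrent=None, every sector gets its own slot.
    """
    times = list(sector_times)
    if not times:
        return 0
    slots = max_concurrent or len(times)
    finish = [0] * min(slots, len(times))
    heapq.heapify(finish)
    for t in times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)


if __name__ == "__main__":
    sectors = [4, 4, 4, 4]  # hypothetical per-sector times, arbitrary units
    print(sequential_time(sectors))
    print(parallel_time(sectors))
    print(parallel_time(sectors, max_concurrent=2))
```

In this model the fully parallel case is bounded by the slowest sector rather than by the sum of all sectors.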
The SDM also provides additional
capabilities such as register state readback and writeback to support ASIC prototyping and
1.21. Device Security
Building on top of the robust security features present in the
previous generation devices,
Stratix® 10 FPGAs and SoCs
include a number of new and innovative security enhancements. These features are also managed
by the SDM, tightly coupling device configuration and reconfiguration with encryption,
authentication, key storage and anti-tamper services.
Security services provided by the SDM include:
Hard encryption and authentication acceleration; AES-256, SHA-256/384,
Volatile and non-volatile encryption key storage and management
Boot code authentication for the HPS
Physically Unclonable Function (PUF) service
Updateable configuration process
Secure device maintenance and upgrade functions
Side channel attack protection
Scripted response to sensor inputs and security attacks, including
selective sector zeroization
Readback, JTAG and test mode disable
Enhanced response to single-event upsets (SEU)
Black key provisioning
Refer to the Stratix® 10 Device Security User Guide for a complete list
of all security features.
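As a loose software analogy for one of the services listed above, the Python sketch below checks an image against an expected SHA-384 digest using the standard `hashlib` and `hmac` modules. It only illustrates a digest comparison with one of the named hash functions. It is not the SDM's authentication flow, which runs on hard accelerators and is described in the security user guide; the function names and the sample image bytes are assumptions.

```python
import hashlib
import hmac


def sha384_digest(image: bytes) -> bytes:
    """Return the 48-byte SHA-384 digest of an image."""
    return hashlib.sha384(image).digest()


def image_matches(image: bytes, expected_digest: bytes) -> bool:
    """Compare digests in constant time to avoid leaking match position."""
    return hmac.compare_digest(sha384_digest(image), expected_digest)


if __name__ == "__main__":
    image = b"example configuration image"  # placeholder data
    reference = sha384_digest(image)
    print(image_matches(image, reference))
    print(image_matches(image + b"\x00", reference))
```

A real authentication scheme would also verify a signature over the digest; this sketch stops at the integrity check.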
The SDM and associated security services provide a robust,
multi-layered security solution for your Stratix® 10 design.
Contact My Intel Support for additional information.
1.22. Configuration via Protocol Using PCI Express
Configuration via protocol using PCI Express
allows the FPGA to be configured across the PCI Express
bus, simplifying the board layout and increasing system integration. Making use of the
PCI Express hard IP operating in autonomous mode before the FPGA is configured, this technique
allows the PCI Express bus to be powered up and active within the 100 ms time allowed by the
PCI Express specification. Stratix® 10 devices also
support partial reconfiguration across the PCI Express
bus, which reduces system down time by keeping the PCI Express
link active while the device is being reconfigured.
1.23. Partial and Dynamic Reconfiguration
Partial reconfiguration allows you to reconfigure part of the
FPGA while other sections continue running.
This capability is required in systems where uptime is
critical, because it allows you to make updates or adjust functionality without
disrupting service.
In addition to lowering power and cost, partial reconfiguration also
increases the effective logic density, because functions that do not operate
simultaneously no longer need to be placed in the FPGA at the same time. Instead,
these functions can be stored in external memory and loaded as needed. This reduces
the size of the required FPGA by allowing multiple applications on a single FPGA,
saving board space and reducing power. The partial reconfiguration process is built
on top of the proven incremental compile design flow in the
Quartus® Prime design software.
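The effective logic density argument can be made concrete with a simple sizing estimate. In the Python sketch below, functions that never run at the same time share one reconfigurable region, so the device only needs room for the largest of them rather than for all of them. The logic element counts are hypothetical, and the model ignores any overhead that a partial reconfiguration region or its interface logic may add.

```python
def required_logic(static_les, function_les, partial_reconfig):
    """Estimate the logic elements needed for a design.

    static_les:       logic that is always present
    function_les:     sizes of functions that never operate simultaneously
    partial_reconfig: if True, the functions time-share one region
    """
    if partial_reconfig:
        return static_les + max(function_les, default=0)
    return static_les + sum(function_les)


if __name__ == "__main__":
    static = 100_000                       # hypothetical always-on logic
    functions = [40_000, 60_000, 50_000]   # hypothetical swappable functions
    print(required_logic(static, functions, partial_reconfig=False))
    print(required_logic(static, functions, partial_reconfig=True))
```

With these example numbers the requirement drops from 250,000 to 160,000 logic elements when the three functions share a region.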
Dynamic reconfiguration in
Stratix® 10 devices allows transceiver data rates, protocols and
analog settings to be changed dynamically on a channel-by-channel basis while
maintaining data transfer on adjacent transceiver channels. Dynamic reconfiguration
is ideal for applications that require on-the-fly multiprotocol or multi-rate
support. Both the PMA and PCS blocks within the transceiver can be reconfigured
using this technique. Dynamic reconfiguration of the transceivers can be used in
conjunction with partial reconfiguration of the FPGA to enable partial
reconfiguration of both core and transceivers simultaneously.
1.24. Fast Forward Compile
The innovative Fast Forward Compile feature in the
Quartus® Prime software identifies performance bottlenecks in your
design and provides detailed, step-by-step performance improvement recommendations that you
can then implement. The Compiler reports estimates of the maximum operating frequency that can
be achieved by applying the recommendations. As part of the new Hyper-Aware design flow, Fast
Forward Compile maximizes the performance of your
Stratix® 10 design and achieves rapid timing closure.
Previously, this type of optimization required multiple
time-consuming design iterations, including full design re-compilation to determine the
effectiveness of the changes. Fast Forward Compile enables you to make better decisions about
where to focus your optimization efforts, and how to increase your design performance and
throughput. This technique removes much of the guesswork of performance exploration, resulting
in fewer design iterations and as much as 2X core performance gains for
Stratix® 10 designs.
1.25. Single Event Upset (SEU) Error Detection and Correction
Stratix® 10 FPGAs and SoCs offer robust SEU error detection and correction circuitry. The
detection and correction circuitry includes protection for Configuration RAM (CRAM)
programming bits and user memories. The CRAM is protected by a continuously running
parity checker circuit with integrated ECC that automatically corrects one- or two-bit
errors and detects higher order multibit errors.
The physical layout of the CRAM array is optimized to make the majority of
multi-bit upsets appear as independent single-bit or double-bit errors which are
automatically corrected by the integrated CRAM ECC circuitry. In addition to the
CRAM protection, user memories also include integrated ECC circuitry and are layout
optimized for error detection and correction.
The SEU error detection and correction hardware is supported by both
soft IP and the Quartus® Prime software to provide a
complete SEU mitigation solution. The components of the complete solution include:
Hard error detection and correction for CRAM and user M20K memory blocks
Optimized physical layout of memory cells to minimize probability of SEU
Sensitivity processing soft IP that reports if CRAM upset affects a used or unused bit
Fault injection soft IP with the Quartus® Prime software support that changes state of CRAM bits for
Hierarchy tagging in the Quartus® Prime software
Triple Mode Redundancy (TMR) used for the Secure Device Manager and critical on-chip state machines
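The triple-redundancy idea mentioned in the list above can be sketched as a bitwise majority vote over three copies of the same value. The Python model below is a generic illustration of the technique, not a description of how the SDM or the on-chip state machines implement it; the function names are chosen for this example.

```python
def majority_vote(a, b, c):
    """Return the bitwise majority of three replicas.

    Each output bit is 1 when at least two of the three input bits are 1,
    so a single upset replica cannot change the voted result.
    """
    return (a & b) | (a & c) | (b & c)


def faulty_replicas(a, b, c):
    """Return the indices (0, 1, 2) of replicas that disagree with the vote."""
    voted = majority_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


if __name__ == "__main__":
    good = 0b1010
    upset = good ^ 0b1100  # replica with two flipped bits
    print(bin(majority_vote(good, good, upset)))
    print(faulty_replicas(good, good, upset))
```

The vote masks an error in any one replica; reporting which replica disagreed lets the faulty copy be repaired before a second upset occurs.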
In addition to the SEU mitigation features listed
above, the Intel 14 nm tri-gate process technology used for
Stratix® 10 devices is based on FinFET transistors, which have reduced SEU
susceptibility versus conventional planar transistors.
1.26. Document Revision History for the Intel Stratix 10 GX/SX Device Overview
Made the following change:
In the "Stratix® 10 GX/SX FPGA and SoC Family Package Plan" table in Stratix® 10 FPGA and SoC Family Plan, added:
For more information about vertical migration and the common circuit board footprint, see Vertical Device Migration in Stratix® 10 Device Design Guidelines.
Made the following change:
Added black key provisioning (-BK) devices. See the "Sample Ordering Code" figure in Available Options.
Made the following change:
Added the GX 10M variant.
Made the following changes:
Added advanced security (-AS) devices.
Added level shifter details for the Stratix® 10 SX/GX 400 device.
Made the following changes:
Added composition details for the leaded and lead-free contact device options.
Updated the I/O PLL counts.
Made the following changes:
Changed the number of included logic elements globally.
Removed logic density 450, logic density 550, and package code 48 from the "Sample Ordering Code and Available Options for Intel Stratix 10 Devices" figure.
Updated description of the higher density in the "Innovations in Intel Stratix 10 FPGAs and SoCs" section.
Updated description of the general purpose I/Os in the "Intel Stratix 10 FPGA and SoC Common Device Features" table.
Removed support for LPDDR3 globally.
Updated the "Intel Stratix 10 FPGA and SoC Architecture Block Diagram" figure.
Updated the "Intel Stratix 10 GX/SX FPGA and SoC Family Plan-FPGA Core (part 1)" table.
Updated the "Intel Stratix 10 GX/SX FPGA and SoC Family Plan-Interconnects, PLLs and Hard IP (part 2)" table.
Updated and merged the "Intel Stratix 10 GX/SX FPGA and SoC Family Package Plan" tables.
Made the following changes:
Changed the specs for QDRII+ and QDRII+ Xtreme and added specs for QDRIV in the "External Memory Interface Performance" table.
Updated description of the power options in the "Sample Ordering Code and Available Options for Stratix® 10 Devices" figure.
Changed the description of the technology and power management features in the "Stratix® 10 FPGA and SoC Common Device Features" table.
Changed the description of SmartVID in the "Power Management" section.
Changed the direction arrow from the coefficient registers block in the "DSP Block: High Precision Fixed Point Mode" figure.
Made the following changes:
Removed the embedded eSRAM feature globally.
Removed the Low Power (VID) and Military operating temperature options, and package code 53 from the "Sample Ordering Code and Available Options for Stratix 10 Devices" figure.
Changed the Maximum transceiver data rate (chip-to-chip) specification for L-Tile devices in the "Key Features of Stratix® 10 Devices Compared to Stratix V Devices" table.
Made the following changes:
Changed the number of available transceivers to 96, globally.
Changed the single-precision floating point performance to 10 TFLOP, globally.
Changed the maximum datarate to 28.3 Gbps, globally.
Changed some of the features listed in the "Stratix 10 GX/SX Device Overview" section.
Changed descriptions for the GX and SX devices in the "Stratix 10 Family Variants" section.
Changed the "Sample Ordering Code and Available Options for Stratix 10 Devices" figure.
Changed the features listed in the "Key Features of Stratix 10 Devices Compared to Stratix V Devices" table.
Changed the descriptions of the following areas of the "Stratix 10 FPGA and SoC Common Device Features" table:
Transceiver hard IP
Internal memory blocks
Core clock networks
Reorganized and updated all tables in the "Stratix 10 FPGA and SoC Family Plan" section.
Removed the "Migration Between Arria 10 FPGAs and Stratix 10 FPGAs" section.
Removed footnotes from the "Transceiver PCS Features" table.
Changed the HMC description in the "External Memory and General Purpose I/O" section.
Changed the number of fPLLs in the "Fractional Synthesis PLLs and I/O PLLs" section.
Clarified HMC data width support in the "Key Features of the Stratix 10 HPS" table.
Changed the description in the "Internal Embedded Memory" section.
Changed the datarate for the Standard PCS and SDI PCS features in the "Transceiver PCS Features" table.
Added a note to the "PCI Express Gen1/Gen2/Gen3 Hard IP" section.
Updated the "Key Features of the Stratix 10 HPS" table.
Changed the description for the Cache coherency unit in the "Key Features of the Stratix 10 HPS" table.
Changed the description for the external SDRAM and Flash memory interfaces for HPS in the "Key Features of the Stratix 10 HPS" table.