AN 831: Intel® FPGA SDK for OpenCL™: Host Pipelined Multithread

ID 683013
Date 11/20/2017
Public

1.4.3. Fine Tuning the Framework Design

In this section, you learn how to optimize threads and fine-tune your framework design to build an efficient system.

As discussed earlier in the Pipelining Framework for High Throughput Design topic, an efficient system requires a pipeline in which all steps of the data compression algorithm have similar throughput.

  • The Deflate and CRC thread executes on the FPGA with a throughput higher than 3 GB/s. Therefore, ensure that all other threads are faster than this step so that the system throughput stays close to the device throughput.
  • The metadata thread is optimized through multithreading and by processing larger chunks of data. This step also achieves a throughput higher than 3 GB/s.
  • The Huffman calculation thread, which generates the Huffman code, must calculate the frequency table for each input file. This is a time-consuming task, especially for very large files, because the work is on the order of N, growing with the size of the input. To accelerate this process, use the optimizations discussed in Optimization Techniques for CPU Tasks.
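
The balancing rule in the list above follows from the fact that a pipeline in steady state runs no faster than its slowest step. The following is a minimal sketch of that check, not code from the reference design. The `Stage` struct and the `bottleneck` function are hypothetical names introduced here, and the throughput values are assumed to come from your own measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: one pipeline step and its measured throughput in GB/s.
struct Stage {
    std::string name;
    double gbps;
};

// Returns the slowest stage. In steady state, the throughput of the whole
// pipeline cannot exceed the throughput of this stage.
const Stage& bottleneck(const std::vector<Stage>& stages) {
    assert(!stages.empty());
    return *std::min_element(
        stages.begin(), stages.end(),
        [](const Stage& a, const Stage& b) { return a.gbps < b.gbps; });
}
```

If the measured Huffman calculation throughput falls below that of the Deflate and CRC step, `bottleneck` returns the Huffman stage, which is then the step to optimize first.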

For more details about how optimization techniques are applied to Huffman frequency table calculation, refer to Huffman Code Generation.
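
As an illustration of how multithreading can shorten the frequency table step, here is a minimal sketch using C++11 `std::thread`. It is not the implementation from the reference design. It assumes the symbols are 8-bit bytes (a 256-entry table), and the names `FreqTable` and `byte_frequencies` are hypothetical. Each worker counts its own slice of the input into a private table, and the partial tables are summed at the end, so no locking is needed in the counting loop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: one counter per possible byte value.
using FreqTable = std::array<std::uint64_t, 256>;

// Counts how often each byte value occurs in data[0, size),
// splitting the input into contiguous slices, one per worker thread.
FreqTable byte_frequencies(const std::uint8_t* data, std::size_t size,
                           unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<FreqTable> partial(num_threads, FreqTable{});
    std::vector<std::thread> workers;
    const std::size_t chunk = (size + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(size, t * chunk);
        const std::size_t end = std::min(size, begin + chunk);
        workers.emplace_back([&partial, data, begin, end, t] {
            FreqTable local{};  // private table, avoids sharing while counting
            for (std::size_t i = begin; i < end; ++i) {
                ++local[data[i]];
            }
            partial[t] = local;  // each worker writes only its own slot
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // Merge the per-thread tables into the final frequency table.
    FreqTable total{};
    for (const auto& p : partial) {
        for (std::size_t s = 0; s < total.size(); ++s) {
            total[s] += p[s];
        }
    }
    return total;
}
```

The total work is still proportional to the input size, but it is divided across the workers. With GCC or Clang, build with `-pthread`.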