AN 831: Intel® FPGA SDK for OpenCL™: Host Pipelined Multithread

ID 683013
Date 11/20/2017
Public

1.4.2. Pipelining Framework Details

This section provides a detailed explanation of the data compression tasks in the pipelined framework architecture.
As illustrated in the following figure, the framework creates several threads to run the data compression algorithm. The main thread handles the input files, enqueues them into the input queue of the first task (thread), and starts the remaining threads. Three more threads follow the same producer-consumer model and communicate through dedicated queues. Each queue is protected by a lock and provides efficient push and pop functions for every access, as explained in Pipelining Framework for High Throughput Design:
Figure 7. Data Compression Tasks in the Pipelined Framework Architecture
  • The Huffman calculation thread calculates the frequency table of the input file and generates the Huffman code, which must be passed to the Deflate and CRC thread.
  • The Deflate and CRC thread launches the Deflate and CRC kernels on the device and feeds them the required input, including the input file and the Huffman code. At the same time, it reads back all the outputs generated by the device. Host-to-device communication for this task is split across two threads that use the host channel streaming feature: one thread streams the input file and the Huffman code into the device, and the other reads back the compressed file and the CRC value. The received results are passed to the next thread through the connecting queue.
  • The Metadata thread runs on the CPU. It pops the compressed data from its input queue, processes the data to support the RFC 1951 standard, and generates the final compressed file.
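
The lock-protected queues that connect these threads can be sketched as follows. This is a minimal illustration in standard C++, not the queue implementation from the reference design: the class name `LockedQueue`, the `close()` method, and the blocking behavior of `pop()` are assumptions made for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical lock-protected queue; the name and interface are assumptions,
// not taken from the reference design.
template <typename T>
class LockedQueue {
public:
    // Producer side: enqueue one item and wake one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Producer side: signal that no further items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

    // Consumer side: block until an item is available or the queue is closed.
    // Returns false only when the queue is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;                   // guards items_ and closed_
    std::condition_variable not_empty_;  // signaled on push and on close
    std::queue<T> items_;
    bool closed_ = false;
};
```

In this sketch the producer calls `close()` after its last `push()`, so a consumer loop of the form `while (q.pop(job)) { ... }` ends cleanly once the queue has drained.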

Although each of these tasks depends on the data generated by the previous thread, the tasks can still run concurrently on different input files, as illustrated in the following figure.

Figure 8. Multiple Input File Process in the Designed Pipelining Framework

Each thread pops an input job (item) from its input queue, processes it to generate the data required by the next thread, and pushes the output into its output queue. Therefore, while the Huffman thread is calculating the Huffman code for the third input file, the Deflate and CRC thread is processing the second input file and the Metadata thread is preparing the output for the first input file.
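
The flow of several input files through the three stages can be sketched as below. This is a simplified model under stated assumptions: `JobQueue`, `run_stage`, and `run_pipeline` are hypothetical names, each stage only appends a tag to a string in place of the real Huffman, Deflate and CRC, and Metadata work, and no OpenCL device is involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal lock-protected queue (hypothetical helper for this sketch).
template <typename T>
class JobQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// One pipeline stage: pop a job, apply placeholder work (append a tag),
// push the result downstream, and close the output when the input ends.
void run_stage(JobQueue<std::string>& in, JobQueue<std::string>& out,
               const std::string& tag) {
    std::string job;
    while (in.pop(job)) out.push(job + tag);
    out.close();
}

// Main-thread role: start the stage threads, enqueue the input files,
// then collect the finished jobs in arrival order.
std::vector<std::string> run_pipeline(const std::vector<std::string>& files) {
    JobQueue<std::string> q_in, q_huffman, q_deflate, q_done;
    std::thread huffman([&] { run_stage(q_in, q_huffman, ">huffman"); });
    std::thread deflate([&] { run_stage(q_huffman, q_deflate, ">deflate_crc"); });
    std::thread metadata([&] { run_stage(q_deflate, q_done, ">metadata"); });

    for (const std::string& f : files) q_in.push(f);
    q_in.close();  // no more input files

    std::vector<std::string> results;
    std::string done;
    while (q_done.pop(done)) results.push_back(done);

    huffman.join();
    deflate.join();
    metadata.join();
    assert(results.size() == files.size());
    return results;
}
```

Calling `run_pipeline({"a.txt", "b.txt", "c.txt"})` in this sketch returns one string per file, each carrying the tags of all three stages. Because every stage is a single thread reading from a FIFO queue, jobs leave the pipeline in the order they entered, while different files occupy different stages at the same time.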