What is an FPGA and why is it popular in large data streaming?
FPGA stands for Field Programmable Gate Array: a chip that the customer can program after manufacturing. FPGAs contain programmable logic blocks capable of performing combinational logic.
Since high-performance business analytics means operating on very large data sets, traditional warehouse systems struggle to move data with low latency from disk over the network. The Netezza appliance uses FPGAs to filter out extraneous data at the source, so that it does not have to be moved away from the disk. This approach frees the CPU, memory and network from processing data that the query does not need, boosting performance 10 to 100 times compared to a traditional system. The key building blocks of a Netezza appliance include:
- Netezza host - A Linux SMP server that presents standard tools and configuration to the user. The host software compiles SQL queries into executable code snippets, creates optimized query plans and submits those snippets to the MPP nodes for execution.
- S-blades - These are high-performance blade servers with multi-core CPUs and FPGAs. The programmable software called the FAST engine resides on the FPGA and does the magic. As the picture on the right shows, the FAST engine uses direct memory access to read compressed data, uncompresses it and passes it to the project and restrict engines, which filter out columns and rows respectively based on the SELECT and WHERE clauses of the SQL query. The filtered rows are a very small percentage of the original data. This data is then handed back to memory for processing by the CPU cores.
Query results are sent over a customized fast network and aggregated by the host for presentation to users.
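To make the data flow concrete, here is a minimal software sketch of the stages described above: decompress a block, restrict rows by the WHERE condition, project the SELECT columns, and let the host gather the results. It is only an illustration, not Netezza code. The table contents, the names `compress_block`, `fast_engine` and `run_query`, the use of zlib and JSON as the storage format, and the choice to apply restrict before project are all assumptions made for this example.

```python
import json
import zlib

# Hypothetical table; in a real appliance this would live on disk.
ROWS = [
    {"id": 1, "region": "EU", "amount": 120, "note": "a"},
    {"id": 2, "region": "US", "amount": 80, "note": "b"},
    {"id": 3, "region": "EU", "amount": 45, "note": "c"},
    {"id": 4, "region": "EU", "amount": 300, "note": "d"},
]


def compress_block(rows):
    """Store a group of rows as one compressed block (assumed format)."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def fast_engine(block, columns, predicate):
    """Software stand-in for the FPGA stages on one block."""
    # Stage 1: uncompress the block read from disk.
    rows = json.loads(zlib.decompress(block).decode("utf-8"))
    for row in rows:
        # Stage 2 (restrict): drop rows that fail the WHERE condition.
        if predicate(row):
            # Stage 3 (project): keep only the SELECT columns.
            yield {name: row[name] for name in columns}


def run_query(blocks, columns, predicate):
    """Host-side step: gather the filtered rows from every block."""
    result = []
    for block in blocks:
        result.extend(fast_engine(block, columns, predicate))
    return result


# Two blocks, as if the table were spread over two disk slices.
blocks = [compress_block(ROWS[:2]), compress_block(ROWS[2:])]

# Roughly: SELECT id, amount FROM t WHERE region = 'EU' AND amount > 100
result = run_query(
    blocks,
    ["id", "amount"],
    lambda r: r["region"] == "EU" and r["amount"] > 100,
)
print(result)
```

Only two of the four rows, and two of the four columns, reach the final result. The point of doing this filtering on the FPGA is that the CPU cores and the network only ever see that reduced data.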
Hi,
Nicely written article. Could you throw some more light on how exactly the FPGA helps in improving the performance?