Fusion Active Storage for Write-intensive Big Data Applications
Many high-performance computing applications have become highly data intensive because of the substantial increase in both simulation data generated by scientific computing models and instrument data collected from increasingly large-scale sensors and instruments. These applications transfer large amounts of data between compute nodes and storage nodes, a costly and bandwidth-consuming process, and this data movement often dominates their run time. In this research, we propose a Fusion Active Storage System (FASS) to address the data movement bottleneck for write-intensive big data applications. The idea behind FASS is to identify write-intensive operations, offload them to the storage nodes, and carry them out there. FASS thus enables a paradigm in which write-intensive computations run on the storage nodes, where the data is generated and written in place to the storage devices. Moving computation to the data in this way avoids the data movement bottleneck on the data path. We have performed theoretical modeling to study the potential of FASS, and we are carrying out experiments on a prototype to evaluate its bandwidth savings and performance improvements compared with traditional solutions. The proposed FASS has the significant advantage of minimizing data movement for write-intensive big data applications and can have a profound impact on high-performance computing, a strategic tool for scientific discovery and innovation.
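
To make the offloading trade-off concrete, the following is a minimal back-of-the-envelope sketch in Python. It is not the theoretical model referred to above. Every name in it (Workload, Cluster, traditional_time, offloaded_time, should_offload) and the cost formulas themselves are illustrative assumptions: the phases are assumed to run one after another with no overlap, and a storage node is assumed to compute more slowly than a compute node by a fixed factor.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """A write-intensive operation (hypothetical description)."""
    output_bytes: float     # bytes the operation generates and writes
    compute_seconds: float  # time to generate the data on a compute node


@dataclass
class Cluster:
    """Assumed system parameters; none of these come from the paper."""
    network_bw: float        # bytes/s between compute and storage nodes
    disk_bw: float           # bytes/s write bandwidth of the storage device
    storage_slowdown: float  # storage-node compute time / compute-node time (>= 1)


def traditional_time(w: Workload, c: Cluster) -> float:
    # Compute on the compute node, ship the output over the network, then write it.
    return w.compute_seconds + w.output_bytes / c.network_bw + w.output_bytes / c.disk_bw


def offloaded_time(w: Workload, c: Cluster) -> float:
    # Compute on the (assumed slower) storage node and write in place: no network transfer.
    return w.compute_seconds * c.storage_slowdown + w.output_bytes / c.disk_bw


def should_offload(w: Workload, c: Cluster) -> bool:
    # Offload only when the avoided transfer outweighs the slower computation.
    return offloaded_time(w, c) < traditional_time(w, c)


if __name__ == "__main__":
    cluster = Cluster(network_bw=1e9, disk_bw=2e9, storage_slowdown=2.0)
    write_heavy = Workload(output_bytes=100e9, compute_seconds=60.0)
    compute_heavy = Workload(output_bytes=1e9, compute_seconds=600.0)
    for name, w in [("write_heavy", write_heavy), ("compute_heavy", compute_heavy)]:
        print(name, traditional_time(w, cluster), offloaded_time(w, cluster),
              should_offload(w, cluster))
```

Under these assumptions, offloading pays off when the output is large relative to the computation that produces it, which is the write-intensive case the abstract targets. A compute-heavy operation with little output is better left on the compute node.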