Texas Tech University, Department of Computer Science

Mose Gumble


Multi-Order Data Deduplication with GPGPUs for Data-Intensive Computing


Data deduplication is widely recognized as a critical technique for reducing the volume of data transferred over the interconnect and stored on storage devices in data-intensive high-performance computing, Cloud computing, and Big Data computing. Current data deduplication solutions, however, suffer from costly byte-by-byte comparisons when hash collisions occur. In this research, we propose Multi-Order Data Deduplication with GPGPUs (MODD for short) to address this costly comparison issue in conventional data deduplication solutions. The idea is to reduce hash collisions by leveraging multiple fingerprinting algorithms, and thus reduce the need for and the cost of byte-by-byte comparisons. MODD also leverages GPGPUs to compute the multiple hashing algorithms in a highly concurrent, massively parallel fashion, avoiding the delay that multi-order deduplication would otherwise introduce. We have investigated the desired properties of complementary hashing algorithms and evaluated the resulting hash collision reductions and byte-by-byte comparison savings. We are also in the process of prototyping and evaluating the use of GPGPUs to speed up the hash computations. The proposed MODD method considerably reduces the costly byte-by-byte comparisons in data deduplication and further trades computation capability for data access capability (further reducing data movement), similar to the idea behind the original data deduplication technique. It holds promise and can be widely applicable to high-performance computing, Cloud computing, and Big Data computing.
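
The multi-order idea can be illustrated with a small CPU-only sketch. The abstract does not name the fingerprinting algorithms MODD uses, so the two toy fingerprints below (`byte_sum` and `xor_fold`) are assumptions chosen only because they collide easily; the class name `MultiOrderDedupStore` and the helper `run_demo` are likewise hypothetical. The sketch indexes chunks by the tuple of all fingerprints and falls back to a byte-by-byte comparison only when every fingerprint matches. It does not model the GPGPU hashing stage.

```python
from functools import reduce
from operator import xor
from typing import Callable, Dict, List, Sequence, Tuple

Fingerprint = Callable[[bytes], int]


def byte_sum(chunk: bytes) -> int:
    """Toy fingerprint (assumption): sum of bytes modulo 256."""
    return sum(chunk) & 0xFF


def xor_fold(chunk: bytes) -> int:
    """Toy fingerprint (assumption): XOR of all bytes."""
    return reduce(xor, chunk, 0)


class MultiOrderDedupStore:
    """Hypothetical chunk store keyed by one or more fingerprints."""

    def __init__(self, fingerprints: Sequence[Fingerprint]) -> None:
        if not fingerprints:
            raise ValueError("at least one fingerprint function is required")
        self.fingerprints = list(fingerprints)
        self.index: Dict[Tuple[int, ...], List[bytes]] = {}
        self.byte_comparisons = 0  # number of full byte-by-byte comparisons

    def add(self, chunk: bytes) -> bool:
        """Store a chunk; return True if it was new, False if a duplicate."""
        key = tuple(fp(chunk) for fp in self.fingerprints)
        bucket = self.index.setdefault(key, [])
        for stored in bucket:
            # All fingerprints matched, so only a full comparison can
            # tell a true duplicate from a collision.
            self.byte_comparisons += 1
            if stored == chunk:
                return False
        bucket.append(chunk)
        return True


def run_demo() -> Tuple[int, int]:
    """Return byte-comparison counts for (single-order, two-order) stores."""
    # b"ad" and b"bc" share the same byte sum but differ in their XOR fold.
    chunks = [b"ad", b"bc", b"ad"]
    single = MultiOrderDedupStore([byte_sum])
    multi = MultiOrderDedupStore([byte_sum, xor_fold])
    for chunk in chunks:
        single.add(chunk)
        multi.add(chunk)
    return single.byte_comparisons, multi.byte_comparisons


if __name__ == "__main__":
    print(run_demo())
```

In this toy input, the single-order store performs a wasted comparison when `b"bc"` collides with `b"ad"`, while the two-order store separates them by the second fingerprint and compares bytes only for the true duplicate. The fingerprints are independent of one another, which is what would make them natural candidates for parallel evaluation.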