Dr. Yong Chen Awarded $1M NSF Grant to Develop a New Data Model to Combat Data Movement Complications: Compute on Data Path Methodology
By Amanda Miller and Dr. Yong Chen
Perhaps one of the most exciting themes of twenty-first-century digital technology is "Big Data." GPS programs on our phones, smart grids that monitor electricity usage in our homes, and digital health records in our hospitals are a few of the many sources generating data in amounts that we can no longer sort, store, and analyze as we have in the past. This proliferation of data has led to great strides in scientific discovery, especially through computer simulations and data analysis used to build models, simulate, observe, and discover. Areas including, but not limited to, computational biology and chemistry, high-energy physics, climate science, information retrieval, and data mining have adopted these methods with much success in research. Simulations are also commonly used for weather prediction and for virtual training of pilots and surgeons. However, there is a desire for finer and more accurate models, and a need for improved processing of the very large data sets that these models would create.
Previously, it was common to move data to computing resources in order to sort through and analyze different sets of information. Moving data remained fairly simple as long as the data volume was relatively small. With larger data sets, however, a critical problem emerges: data movement has proven to be the dominant bottleneck for computational models and scientific discovery, precluding the development of finer, more accurate models. In response to this increasingly critical problem, Dr. Yong Chen, an assistant professor of computer science at Texas Tech, is conducting research to explore a new paradigm in data processing: conducting data synthesis and computation simultaneously along the data path, a "compute on data path" concept, to combat data movement complications. In contrast to the data-movement techniques of the past, Chen will explore the effects of moving computations to the "location" of data.
Chen's research will consist of developing an entirely new data model to facilitate moving computation to data, rather than data to computation. The model's development will include programming to solve the simulation problems of the past. Chen will also focus on data storage, working to develop a storage system designed for this particular computation method. This process will ultimately attempt to minimize data movement and therefore reduce the time, cost, resources, and energy necessary for scientific discovery made via elaborate computation models. The result could be greater productivity than ever before in the scientific world.
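To make the idea of moving computation to data concrete, here is a minimal Python sketch. It is a toy illustration only and does not represent Dr. Chen's actual data model or software: the `StorageNode` class, both helper functions, and the assumption of 8 bytes per value are hypothetical, invented for this example. It computes the same average two ways and counts how many bytes would have to cross the network in each case.

```python
# Toy illustration only: "move the data" vs. "move the computation".
# StorageNode, the helper functions, and the byte sizes are hypothetical
# names and assumptions for this sketch, not part of the funded project.
from dataclasses import dataclass
from typing import List, Tuple

BYTES_PER_VALUE = 8  # assumption: every value (or partial result) is 8 bytes


@dataclass
class StorageNode:
    """A storage location holding one chunk of a larger data set."""
    values: List[float]

    def local_sum_and_count(self) -> Tuple[float, int]:
        # The computation runs where the data lives;
        # only two numbers ever leave the node.
        return sum(self.values), len(self.values)


def mean_by_moving_data(nodes: List[StorageNode]) -> Tuple[float, int]:
    """Traditional approach: ship every value to a compute node, then reduce."""
    moved: List[float] = []
    for node in nodes:
        moved.extend(node.values)  # every value crosses the network
    bytes_moved = len(moved) * BYTES_PER_VALUE
    return sum(moved) / len(moved), bytes_moved


def mean_by_moving_compute(nodes: List[StorageNode]) -> Tuple[float, int]:
    """Data-centric approach: reduce on each node, ship only partial results."""
    total, count = 0.0, 0
    for node in nodes:
        partial_sum, partial_count = node.local_sum_and_count()
        total += partial_sum
        count += partial_count
    # One sum and one count per node cross the network.
    bytes_moved = len(nodes) * 2 * BYTES_PER_VALUE
    return total / count, bytes_moved


# Four nodes, each holding 1,000 consecutive values (0..3999 overall).
nodes = [
    StorageNode([float(i) for i in range(k * 1000, (k + 1) * 1000)])
    for k in range(4)
]
mean_a, bytes_a = mean_by_moving_data(nodes)
mean_b, bytes_b = mean_by_moving_compute(nodes)
print(f"move data:    mean={mean_a}, bytes moved={bytes_a}")
print(f"move compute: mean={mean_b}, bytes moved={bytes_b}")
```

Both approaches produce the same answer, but in this toy setup the second moves only a few dozen bytes instead of tens of thousands. Real scientific data sets are many orders of magnitude larger, which is why the cost of data movement comes to dominate.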
If successful, this research will revolutionize computing methods. Instead of the computation-reliant research of the past, Chen's work could lead to a future of predominantly "data-centric" scientific discovery. The research could also have a major impact on how future supercomputers are developed, yielding better productivity for scientific discovery and innovation. Chen's research will contribute greatly to the strength of the Department of Computer Science at Texas Tech and its command in the realm of Big Data. Lastly, through this research Chen hopes to enhance the computing curriculum by integrating research findings into student learning and the classroom.
This three-year project is supported by the National Science Foundation (NSF) through the Software and Hardware Foundations (SHF) core program in the Directorate for Computer and Information Science and Engineering (CISE). Backed by a $1M grant, Chen plans to work with graduate students and to collaborate with the University of Houston and Northwestern University throughout the project. The collaboration will offer the research additional drive and expertise, and will give participating students an opportunity to interact with top researchers in this field from other institutions.
SHF: Medium: Compute on Data Path: Combating Data Movement in High Performance Computing