REU: Projects Overview
Each project will begin with an in-depth study of the scientific concepts that underlie it. Students will also have the opportunity to observe the more theoretical activities of their graduate mentors and to see the results of their work in the broader context of the research objectives. Each student will also be required to work with graduate students on publishing the results of their work.
Big Data Design, Access, and Analysis Issues for Cybersecurity Applications
Monitoring for cyberattacks is an example of a contemporary software application that requires analysis of large amounts of data. Most existing cybersecurity software tools focus on detecting violations after they occur and are incapable of handling large amounts of data. This research investigates big data technology for the development of more advanced cybersecurity monitoring tools. The long-term objective is to develop systems capable of monitoring and integrating multiple streams of data so that attacks can be prevented or detected as early as possible. Technologies such as Hadoop and NoSQL systems are being used to investigate big data storage design alternatives for cybersecurity data and to enhance the integration of cybersecurity data with data mining and analysis tools for early prevention and detection of cyberattacks.
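As a minimal illustration of monitoring a stream of security events (a toy sketch, not a tool produced by this project), the code below flags a source address that accumulates too many failed logins inside a sliding time window. The class name, threshold, and window length are illustrative assumptions:

```python
from collections import defaultdict, deque

class FailedLoginMonitor:
    """Flag source IPs with too many failed logins in a sliding window.

    Illustrative sketch: window length and threshold are arbitrary choices.
    """

    def __init__(self, window_seconds=60, threshold=5):
        self.window = window_seconds
        self.threshold = threshold
        self.failures = defaultdict(deque)  # ip -> timestamps of failed logins

    def observe(self, timestamp, ip, success):
        """Record one login event; return True if ip now looks suspicious."""
        if success:
            return False
        q = self.failures[ip]
        q.append(timestamp)
        # Expire failures that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

A real system would feed such a monitor from multiple integrated log streams rather than a single event source.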
Software Specifications for Cybersecurity
The Descartes specification language research effort is one part of an overall software engineering research program in software requirements, specification, process, and environments. The effort builds on a solid foundation of research that has produced several completed graduate degrees and has involved three REU site project students and two REU supplement students. The research will focus on automated generation of software specifications from expected inputs and their corresponding outputs, a direct extension of earlier research on automated test data generation from Descartes specifications. Extensions to the language for real-time, object-oriented, and intelligent agent software development will support an effort on executable software specifications as formal methods for information assurance, investigated in the context of the MRI and SFS grants mentioned above for Susan Urban. Dr. J. Urban is the P.I. of the NSF SFS project.
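The Descartes notation itself is not reproduced here, but the underlying idea of generating test data from a specification of expected inputs and outputs can be sketched in miniature. In this invented example, each specification clause pairs an input domain with the expected input/output relation; the `spec` structure and the absolute-value example are illustrative assumptions, not Descartes syntax:

```python
import random

# Toy specification for an absolute-value function, split into input
# classes. Each class has an input generator and an expected relation.
spec = {
    "non_negative": {
        "gen": lambda rng: rng.randint(0, 100),
        "expect": lambda x, y: y == x,    # abs(x) == x when x >= 0
    },
    "negative": {
        "gen": lambda rng: rng.randint(-100, -1),
        "expect": lambda x, y: y == -x,   # abs(x) == -x when x < 0
    },
}

def run_generated_tests(spec, implementation, cases_per_class=20, seed=0):
    """Generate test inputs from each specification class and check the
    implementation's outputs against the specified relation."""
    rng = random.Random(seed)
    failures = []
    for name, clause in spec.items():
        for _ in range(cases_per_class):
            x = clause["gen"](rng)
            if not clause["expect"](x, implementation(x)):
                failures.append((name, x))
    return failures
```

A correct implementation yields no failures, while a buggy one (e.g., the identity function) fails on the negative input class.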
Data Deduplication in High-Performance Computing
Data deduplication is widely recognized as a critical technique for reducing the volume of data to be stored on storage systems and is used primarily for backup storage in existing high-performance computing (HPC), cloud computing, and big data computing systems. The data explosion of scientific applications, however, poses a significant challenge for the I/O (input/output) subsystems and primary storage of existing HPC systems: the movement of huge volumes of data has become the bottleneck of computing. The goal of this project is to investigate innovative data deduplication methods that reduce data movement for I/O operations with limited overhead. We expect to design an I/O deduplication framework, based on existing work, that supports read/write operations, and to investigate its overhead. The primary summer work is to study and run open source single-node deduplication file systems, including lessfs and Opendedup SDFS, analyze their designs, and evaluate their performance.
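The core deduplication idea can be sketched minimally, assuming fixed-size chunking and SHA-256 fingerprints (real systems such as lessfs and Opendedup SDFS use more sophisticated designs, often with content-defined chunking):

```python
import hashlib

def deduplicate(data, chunk_size=4096):
    """Split a byte stream into fixed-size chunks and store each unique
    chunk only once, keyed by its SHA-256 fingerprint."""
    store = {}    # fingerprint -> chunk bytes (unique chunks only)
    recipe = []   # ordered fingerprints needed to rebuild the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep first copy, skip duplicates
        recipe.append(fp)
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)
```

A stream with repeated chunks is stored once per unique chunk; the recipe preserves the ordering needed for reads, which is where read-path overhead arises.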
Fast Data Analysis for Big Data Applications
Many scientific computing and high-performance computing applications have become increasingly data intensive. Recent studies have begun to use indexing, subsetting, and data reorganization to manage these increasingly large datasets. In this project, we intend to build an innovative fast data analysis framework for scientific big data applications. We have recently studied a Fast Analysis with Statistical Metadata (FASM) approach that subsets the data and integrates a small amount of statistics into the original datasets. The added statistical information characterizes the shape and distribution of the data, so the original scientific libraries can use this statistical metadata to perform fast queries and analyses. We expect to further study subsetting, indexing, segmented analysis, and pre-analysis to reduce data movement and enable fast data analysis for scientific big data applications. These concepts and ideas can potentially lead to new data analytics methodologies and can have an impact on scientific discovery productivity.
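The idea of attaching lightweight statistics to data can be illustrated with a toy example: per-block min/max metadata lets a range query skip blocks that provably contain no matches, reducing the data that must be moved and scanned. The block size and the choice of min/max as the statistics are illustrative assumptions, not the published FASM design:

```python
def build_metadata(values, block_size=4):
    """Precompute per-block (start, min, max) statistics for a dataset."""
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        blocks.append((i, min(block), max(block)))
    return blocks

def range_query(values, blocks, lo, hi, block_size=4):
    """Return values in [lo, hi], scanning only blocks whose min/max
    statistics overlap the query range; also report blocks scanned."""
    hits, scanned = [], 0
    for start, bmin, bmax in blocks:
        if bmax < lo or bmin > hi:
            continue  # metadata proves this block has no matches
        scanned += 1
        hits.extend(v for v in values[start:start + block_size]
                    if lo <= v <= hi)
    return hits, scanned
```

For a selective query, most blocks are skipped using only the metadata, which is the effect the statistical-metadata approach aims for at much larger scale.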
Data Acquisition and Analysis for Precision Agriculture
Precision agriculture is the intelligent use of farm inputs, such as water, fertilizer, and pesticide, to promote environmental sustainability and simultaneously improve crop yield. One of its most important components is the use of imagery for analyzing and managing inter- and intra-farm variability in a manner that improves crop yield while preserving precious natural resources. Much like the healthcare and defense industries, the sources of this imagery are quite varied and continue to expand. They include multi- and hyper-spectral aerial and satellite imagery, as well as vehicle-mounted sensors that record various characteristics of the crop, soil, and air. Given the large spatial scale of farms and the long temporal scale of crop development, the analysis of the data gathered from these sources is impossible for humans to undertake and thus must be automated. In relative terms, this component of precision agriculture is in its infancy, with many daunting scientific and technological challenges still remaining. Given the vast cotton fields in West Texas and the importance of precision agriculture to the community and to TTU researchers, exciting opportunities exist to conduct applied research in the general areas of image processing, computer vision, and machine learning.
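As one concrete example of the kind of multi-spectral image analysis involved (a standard technique, not a specific deliverable of this project), the Normalized Difference Vegetation Index (NDVI) estimates vegetation density from the red and near-infrared bands; the 0.3 threshold below is an illustrative assumption:

```python
def ndvi(red, nir, eps=1e-9):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Healthy vegetation reflects strongly in near-infrared and absorbs
    red light, so dense vegetation pushes NDVI toward 1. The small eps
    guards against division by zero on dark pixels.
    """
    return [(n - r) / (n + r + eps) for r, n in zip(red, nir)]

def vegetation_mask(red, nir, threshold=0.3):
    """Flag likely-vegetation pixels (NDVI above a chosen threshold)."""
    return [v > threshold for v in ndvi(red, nir)]
```

Maps of such per-pixel indices over a whole field, tracked across a growing season, are one input that automated analysis can turn into irrigation or fertilization decisions.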