Texas Tech University

How to Write the Data Management Plan

Data management became part of the landscape of national grant agencies beginning with the National Science Foundation in January of 2011. Since then it has spread as a requirement to nearly every funding source, with varied emphasis across disciplines. For example, the Department of Energy requires an information management plan, the National Institutes of Health requests a Resource Sharing Plan, NASA has a Data and Information Policy which "promotes the full and open sharing of all data...", and the National Oceanic and Atmospheric Administration (NOAA) emphasizes that "information collected under federal sponsorship be submitted to and archived by designated national data centers." The humanities are not exempt from these data sharing standards either, as the National Endowment for the Humanities: Office of Digital Humanities now requires data management for its new grant program. All of these institutions, and more not listed here, expect researchers and principal investigators (PIs) to fulfill their requirements in the writing of a data management plan, and such plans increasingly influence whether a grant is accepted. The rest of this article summarizes what a data management plan should include and gives a brief outline of what PIs should be aware of as they begin their proposals.

Most data management plans are limited to two pages in length. This may seem inadequate for proposals which span dozens of pages, but the material is easier to consolidate than you might think. A plan should begin with a short summary of the project and what it hopes to accomplish. Limit this to one or two paragraphs that convey the overall objective effectively without descending into the smallest details. The next section should detail what types of data will be collected during the entire lifecycle of the proposal, what format they will be in, and any applicable standards that apply to them. Data collection is unique to every proposal, and it is recommended that PIs identify what it will include before writing this section. Anything from analytical programs, specimens, data sets, simulation results, surveys, project summaries, and lab reports should be considered here. The format of the data is also important, as it may influence how you present the data years after the award period. Are files in program-specific formats such as .docx, .xlsx, and .PlotXML, or in more universal formats such as .txt, .csv, and .xml? Metadata standards may also apply, serving as a description of your data. This could be a simple readme file describing all aspects of the collected folders, a unique description per file, or a specific metadata schema designed for a discipline, e.g. Dublin Core for social science and Darwin Core for biological diversity.
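As an illustration, a minimal Dublin Core record for a hypothetical survey data set might look like the following sketch (the element names come from the Dublin Core standard; the project, creator, and values are invented for the example):

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Hypothetical Dublin Core description of one deposited data file -->
  <dc:title>Regional Water Quality Survey Responses, 2024</dc:title>
  <dc:creator>Doe, Jane (Texas Tech University)</dc:creator>
  <dc:date>2024-06-30</dc:date>
  <dc:format>text/csv</dc:format>
  <dc:description>Anonymized survey responses collected under the
    hypothetical watershed-survey project; one row per participant.</dc:description>
  <dc:rights>CC BY 4.0</dc:rights>
</metadata>
```

Even a short record like this tells a future user what the file is, who made it, when, in what format, and under what terms it can be reused.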

Data storage and preservation are the next consideration. Where will you store the data during and after the award period? It is no longer adequate or acceptable to say "my computer" when discussing the storage of proposal data. While not yet in a repository, data should follow a rule of three: keep copies in three geographically diverse places to prevent total loss of information in a catastrophic event. Cloud services such as OneDrive, Amazon, Google Drive, and DropBox are well situated to take small amounts of data and store them in secure locations. However, should a larger repository be needed for larger data sets, recognize that data management costs should be included in the budget. After the award period, storage is usually handled by a repository approved by the relevant agency (NSF, NIH, DoE, etc.) for that specific type of information. These repositories may impose requirements before submission, such as removing all identifying information from personal surveys. Preservation is not just about where you store the data; storage is relatively easy. It speaks more to the ability of other researchers, years down the road, to interpret what you did and replicate it without having to find you for an explanation. Clear spreadsheet labels, file names that make sense, and descriptive metadata that translates your collected statistics into meaningful results all make a difference when others look at your work.
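As a sketch of what sensible file naming and a descriptive readme can look like, a plain-text file at the top of a data folder might record the convention (the project name, sites, and column names here are invented for illustration):

```text
README.txt -- watershed-survey project (hypothetical example)

File naming convention: <site>_<instrument>_<YYYY-MM-DD>.csv
  e.g. lubbock01_ph-probe_2024-06-30.csv

Columns in each CSV file:
  timestamp  -- date and time of reading, ISO 8601, UTC
  ph         -- pH reading, unitless
  temp_c     -- water temperature, degrees Celsius
```

A reader who finds this folder a decade later can interpret every file without contacting the original team, which is the practical test of good preservation.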

Finally, consider responsibilities at the end of the award cycle. Who will take care of this data for the retention period the granting agency requires? For the NSF, the standard retention period is three years after the award period, which may mean paying for a repository for three years out of the grant budget. Many repositories will manage the data after it is deposited; be sure to identify all PIs and auxiliary personnel who may be responsible for the data prior to the repository handoff.

Following these simple guidelines and checking your granting agency's policies will go a long way toward developing effective data management plans. Should you require further assistance, the TTU Libraries Data Management Team is here to help with resources and individual consultations; contact libraries.datamanagement@ttu.edu.