Texas Tech University

Start the New Year Off with These Data Management Resolutions

The year 2016 is finally upon us, and with it the University Libraries' Data Management Team is highlighting some New Year's resolutions to consider when writing your data management plan. These resolutions address common pitfalls and offer tips for avoiding them and for strengthening existing plans. So sit back, relax, and enjoy these New Year's Resolutions for Data Management Plans.

Our first resolution is to become better at file naming and labeling. This may seem trivial in the grand scheme of things, but being able to tell which .pdf or .txt file refers to a specific data set is vital. File names should be easy to understand while still carrying all the identifying information required. If files come with pre-generated names that don't make much sense, consider placing them all in a folder labeled "program output files" or something similar. Ideally, a general file name combines the subject, date, and creator (if applicable), separated by underscores or hyphens (e.g., biodiversity_narrative_smith_2016).

In addition to the file names themselves, be mindful of how files organized in a way that makes sense to the creator will look to an outsider. Spreadsheets are typically the gravest offenders when it comes to labeling within a file. If the data fields are jargon, unexplained acronyms, arbitrary numbers, or machine codes, the spreadsheet becomes much harder to read. A group of geologists may know the terminology behind the 100 fields they have labeled on their spreadsheets, but an assistant professor reviewing the outcomes of the project may not. Consider creating a legend on a separate sheet, relabeling a copy of the file for ease of access, or writing an instruction text file to accompany the difficult file(s). For federally funded grants, the information made available to the public should be properly labeled and named so that those who want to access it can also decipher it. True preservation of your work means someone else can read, understand, and replicate it decades later without having to come back to the source.
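If you generate many files by script, the naming convention can be applied automatically. The short Python sketch below is only an illustration of the subject-creator-date pattern described above; the function name and example values are hypothetical, not part of any required template.

from datetime import date

def build_file_name(subject, creator, extension, when=None):
    # Assemble subject_creator_date parts, replacing spaces so the name
    # stays readable on any operating system.
    when = when or date.today()
    parts = [subject.replace(" ", "-"), creator.replace(" ", "-"), when.isoformat()]
    return "_".join(parts) + "." + extension.lstrip(".")

print(build_file_name("biodiversity narrative", "smith", "txt", date(2016, 1, 4)))
# prints: biodiversity-narrative_smith_2016-01-04.txt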

The second resolution is to identify all formats and types of files that you may use during the award period of the grant. It's easy to state, "We will save all documentation associated with this project." However, most data management plans require specific file formats to ensure that the data remain preserved in an accessible form. If the project creates very specific file formats, list them in the plan along with how you will keep track of them. Also, be aware of persistent file formats: those that do not depend on a particular program to open and read. The most common are .txt, .csv, .xml, .pdf, .mp3, and .tiff. Maintaining important documents in these formats ensures that access to your data won't be blocked by the unavailability of a specific program or piece of software. While most of your documentation may be written in .docx, conversion between different software packages can break formatting and compromise how the documents are interpreted. Make sure you identify what you will be converting to the preservation file formats. It's recommended that everything generated be considered, but if that's impossible for certain program outputs, explain in the plan why that is the case.
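To keep track of what still needs converting, a short script can inventory a project folder and flag files that are not yet in one of these formats. The sketch below is a minimal example in Python; the folder name my_project is hypothetical, and the set of formats should be adjusted to your own plan.

from pathlib import Path

# Persistent formats mentioned above; adjust the set for your own project.
PERSISTENT = {".txt", ".csv", ".xml", ".pdf", ".mp3", ".tiff"}

def flag_nonpersistent(project_dir):
    # Walk the project folder and print files not yet in a persistent format.
    for path in Path(project_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() not in PERSISTENT:
            print("Consider converting or documenting:", path)

flag_nonpersistent("my_project")  # hypothetical project folder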

Our third resolution for data management plans is to embrace metadata. Metadata, as defined in most scientific or descriptive settings, is simply "data about data." It doesn't have to be a complex system you develop from scratch; it can be as simple as placing a text file in your main project folder that explains all the relevant information about what you have put together: the creator(s) of the project, the subject(s), the title, a description of the file structure, the file formats involved, and perhaps even how a third party would go about deciphering it all. Ideally, a metadata file would be created for each important part of the project, especially when specialized software was used (so the version can be recorded) or very specific calibrations were applied during development and testing. For the more advanced user, there are metadata standards available for nearly every discipline. Dublin Core is a great one for social science and general description. Some of the more specialized standards, like Darwin Core for biodiversity, combine discipline-specific needs with all-purpose description.
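As a starting point, that simple text file can even be generated by a few lines of code. The Python sketch below writes a handful of Dublin Core-inspired fields to metadata.txt; all of the project details shown are made up and should be replaced with your own.

from pathlib import Path

# Hypothetical project details; swap in your own values.
metadata = {
    "Title": "Biodiversity Narrative Study",
    "Creator": "Smith, J. (Texas Tech University)",
    "Subject": "biodiversity; field survey",
    "Description": "raw/ holds instrument output (.csv); docs/ holds narratives (.pdf)",
    "Date": "2016-01-04",
    "Format": ".csv, .pdf, .txt",
}

lines = [field + ": " + value for field, value in metadata.items()]
Path("metadata.txt").write_text("\n".join(lines) + "\n")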

As 2016 arrives with new funding opportunities, deadlines, and restrictions, the Data Management Team hopes that these resolutions will be helpful when you sit down to write your plan. If you have any questions about the process or would like to consult with us directly, please feel free to email us at libraries.datamanagement@ttu.edu.