Confessions and Enlightenment of a Reluctant Student of Statistics
By Marianne Evola
For the next couple of issues of Scholarly Messenger, I plan to spend a bit of time discussing responsible conduct issues regarding research data management and application of statistics. These are challenging issues to address because readers will markedly differ with regard to the amount of training that they have received on how to manage data and how to determine which statistical tools are appropriate for a particular research design. However, it is because researchers markedly differ with regard to training that these issues have become critical in the discussion of responsible research conduct. These brief discussions will by no means provide strong training in data management or statistics. That would be an impossible task. However, I am hoping to raise reader awareness that data management systems and statistical analyses need to evolve with a research program. To do so, researchers need to commit time and effort toward designing and maintaining data management systems as well as incorporating the study of statistical methods and available software to ensure that they are utilizing the most appropriate stats for their evolving research program.
At a recent meeting of the Association for Practical & Professional Ethics, I sat in on a presentation by Kathy Partin, a professor of biomedical sciences and director of research integrity at Colorado State University. Dr. Partin asserted that responsible conduct of research (RCR) education often overlooks critical issues related to data management and appropriate application of statistical analyses. She asserted that it is a common misconception that most researchers are well trained in data management and statistics. Furthermore, because of their limited knowledge base, many mentors lack the education or skills to effectively train the next generation of researchers. The presentation concerned many of the RCR educators in the room for a couple of reasons. Some educators had no training in statistics or data management themselves, and thus did not feel qualified to provide the appropriate instruction. Others, myself included, had a good deal of experience managing data and a lot of training in statistics from courses and lab work. However, our concern was how one can teach years of coursework and career experience in just a few hours of seminar training on responsible research. As such, my challenge with this series of articles is addressing these complex issues in two to three pages of text when there are libraries of published works on data management and statistics. I will begin this series by addressing the importance of researchers incorporating the study of statistics as a critical component of career development.
Once again, I find myself in an amusing position of confession as I address responsible use of statistics, because I have never been an enthusiastic student of statistics. Rather, I have always been a reluctant student of statistics. By reluctant student, I do not mean that I believe that inappropriate use of statistics is acceptable or that researchers should limit their research so that they can utilize familiar statistics. Rather, I have always approached the study of statistics as a quest to understand a necessary tool, so that I could appropriately analyze data and understand the results produced by that tool. For clarification purposes, this is also how I would describe my interest in computers, as tools to accomplish specific tasks. In contrast, I would guess that statisticians or computer programmers find the study of statistics and computers inherently interesting and rewarding. However, I will admit that after forcing myself to study and understand a particular statistic (or Excel formula), finding the appropriate tool provides me with a feeling of accomplishment and momentary joy.
Interestingly, even as a reluctant student, through degree requirements and a bit of random “luck” I’ve had a considerable amount of training and mentoring in statistics. I earned my doctorate through a psychology department, and psychology has a long history of extensive stats training, relative to many other disciplines. In retrospect, I cannot say that my stats coursework was difficult, but because I hated studying for my stats exams, I did not really learn statistics until I started struggling to analyze my own data sets. Luckily, I also had a mentor who encouraged her graduate students to develop independent research questions and experiments, and as such, we generally also had to investigate how to analyze our data. Actually, my graduate mentor once admitted to me that her interest in stats notably evolved when she had to study a statistical analysis proposed by one of her first grad students. She needed to understand the analysis so that she could provide guidance and instruction on her student’s dissertation as well as ensure that the analysis was appropriate for the research questions. Since that event, she had encouraged students to develop an interest in stats and their application, and to look at data with “new eyes” to assess whether there were better questions to ask of a data set and better ways to analyze data, beyond the existing standards. Finally, she continued to examine the rapidly evolving stats software for better tools to address statistical challenges.
However, the evolution of software and computers has not always solved problems with statistics. In fact, powerful software combined with limited study of stats has created a whole new problem in research. Specifically, it became easy for researchers to conduct multiple analyses on a single data set with the push of a button. It also became common practice for researchers to run all possible stats, search for a significant effect and then rationalize the use of a particular statistic, rather than thoughtfully applying statistics to a particular research question. This is the equivalent of a statistical fishing expedition for a significant result: the results drive the question, rather than intellectual questions driving research design and analyses. When I was an undergraduate research assistant, computers were rapidly being integrated into research settings, and productive researchers were assessing how to utilize computers to maximize production. Besides data entry, the typical job assigned to undergrad assistants, I was required to run analyses, print out the results and submit the printed results to the graduate students. It sounded impressive to me at the time. However, in retrospect, all I was really doing was utilizing statistical software (SPSS) to run all possible stats on a data set. I merely typed Run Stats = All, and then printed. There was no instruction on the appropriate application of statistics, nor was there any discussion of what happened to these massive printouts of results once they were provided to the graduate students. There was a lot of production with little instruction. A couple of years later, as I reluctantly attended my graduate stats course, my professor emphatically described the widespread misuse of statistics made possible by evolving computer power.
I thought about all of those massive printouts of all possible stats and wondered if the results were appropriately utilized or whether they had fished for any possible significant effect. As an undergraduate assistant, I was not sufficiently knowledgeable to question the instructions given to me, but my ongoing coursework in statistics, albeit reluctant, provided me with greater insight on appropriate and possibly inappropriate use of statistics.
Another vivid memory of poor training in statistics was when an undergraduate assistant came to me as her mentor to ask questions about a group project she was working on for one of her research methods courses. The students had been required to design an experiment, collect data and then analyze the results. The experimental design had to be approved by the graduate teaching assistant (GTA) prior to the data collection. Most groups had designed a simple experiment comparing two treatment groups, which was the model proposed by the GTA. However, my student’s group, as is typical of honors students, expanded their project beyond a simple two-group experimental design and proposed a research project with multiple treatment groups and with multiple time points for data collection. The experimental design had been approved by their GTA, and her group had conducted their experiment and collected their data. However, when it came time to analyze the results, my student realized that the statistical tools that had been provided by the course instruction (Means, standard deviation and T-tests, which are appropriate for the simple two-group comparisons that most groups proposed) did not make sense when analyzing multiple treatment groups. I give the student a great deal of credit for making this realization. However, her group also made this realization on a Friday, and their project was due on Monday, which is why she sought me out for help over the weekend, rather than seeking out her GTA. Regardless, we sat down on a Saturday morning, and she got her first lesson in repeated measures ANOVA. I provided her with a few book chapters to read and access to software to conduct her analyses. She learned a lot and was proud of her accomplishment. However, when the students took the analyses to their GTA, they were told that the ANOVA would not be accepted, and they had to conduct multiple T-Tests to make all possible comparisons.
My student was rightfully upset after all of her hard work, and when she informed me about her conversation with her GTA, I was astonished that the teaching assistant would force the students to use inappropriate analyses on their data. This was poor instruction, but worse, it taught the students that appropriate selection of analyses was unimportant. To this day, I continue to wonder if the teaching assistant required the students to utilize inappropriate analyses because the GTA did not understand how to appropriately analyze data consisting of more than two treatment groups. And if so, why did the GTA even approve the research project? Or was the GTA unwilling to study and explore alternate analyses when faced with a unique data set? Had the GTA never been mentored to consider the appropriate use of statistics? Regardless, it would have been better for the teaching assistant to force the group to simplify their experiment prior to collecting data, rather than instructing (mentoring) students to use inappropriate analyses. Furthermore, it is this very practice that concerns instructors of responsible research: that one would mentor a disregard for investigating and conducting appropriate analyses.
Finally, because SM readers vary in their training, I thought that I would explain why the analysis mandated by the GTA was inappropriate, which will hopefully reinforce the need to incorporate the study of statistics throughout a research career. This is slightly technical, so feel free to skim this paragraph if it is beyond your current level of training. A T-Test is a statistical test for comparing only two groups of data. When an experiment expands beyond two groups, the appropriate analysis is ANOVA. Conducting multiple T-Tests is inappropriate because each test carries its own chance of a false significant effect, so the likelihood of finding at least one false significant effect increases with each additional T-Test. ANOVA avoids this inflation by evaluating all of the groups in a single test, which accounts for the additional “chance” variability contributed by additional groups of data. Chance variability is variability that cannot be attributed to experimental manipulations. This is one of the reasons that selection of appropriate statistical analyses is critical to the validity of experimental results.
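For readers who would like to see rough numbers behind this point, here is a minimal sketch in Python. It rests on assumptions that are mine rather than part of the story above: a conventional significance level of 0.05 for each T-Test, and the simplification that the tests are independent of one another. Pairwise comparisons that share groups are not truly independent, so the figures are an approximation and not an exact result. The function names are hypothetical and chosen only for illustration.

```python
from math import comb


def pairwise_comparisons(groups: int) -> int:
    """Number of two-group T-Tests needed to compare every pair of groups."""
    return comb(groups, 2)


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false significant effect.

    Assumes each test uses the same alpha and that the tests are
    independent, which is a simplification for pairwise comparisons.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for groups in (2, 3, 4, 5):
        tests = pairwise_comparisons(groups)
        rate = familywise_error_rate(tests)
        print(f"{groups} groups: {tests} T-Tests, "
              f"chance of at least one false positive is about {rate:.2f}")
```

Under these assumptions, two groups require one T-Test and keep the intended 5% chance of a false significant effect. Three groups require three T-Tests and raise that chance to roughly 14%, four groups require six T-Tests (roughly 26%), and five groups require ten T-Tests (roughly 40%).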
Unfortunately, as experiments grow beyond two groups, some researchers merely choose to conduct multiple T-Tests. Even more unfortunately, many disciplines have a historical precedent of conducting multiple T-Tests, and thus the practice is interpreted as standard. However, many historical research practices have been deemed inappropriate as science has evolved, and the use of inappropriate statistics needs to be abandoned along with the other inappropriate practices of our research ancestors.
The research world is facing a crisis of results that cannot be replicated, and inappropriate statistical analyses may be contributing to this problem. Mentors do an exceptional job instructing their students on experimental design, experimental techniques and creativity in their research. However, many mentors fall short with regard to mentoring the incorporation of statistical training in a research program. If a reluctant student of statistics, such as myself, can comprehend the importance of utilizing appropriate statistical methodology and develop a means to study these methods, anyone can figure out a way to incorporate this training in their research environment. For research programs that do not require a great deal of training in statistics, there are departments and resources on the TTU campus that are well-versed in statistics. As such, the mentor does not need to know everything about statistics, but rather, they need to encourage their students to think about how their statistics may become inappropriate as their research project evolves and then push students to seek out instruction and guidance on appropriate statistical methods. Just as students seek out experimental methods and procedures that will address their experimental questions, students need to learn to seek out guidance and instruction on the appropriate statistical tools, once they become aware that their current tool is inefficient. Or as my mentor put it: hire talented people, provide them with guidance and encouragement, and then stay out of the way.
Marianne Evola is senior administrator in the Responsible Research area of the Office of the Vice President for Research. She is a monthly contributor to Scholarly Messenger. Alice Young, associate vice president for research/research integrity, is a contributing author/editor.