Best Practices in Data Cleaning

Professionally, I write software used for data collection in biomedical research projects. My current project extends from data collection into the field of interpretation. Combining these demands requires that I learn something about data cleaning. Thus, this book’s title caught my eye. It is an academic book, not a popular book. I usually write in-depth book reviews on what I read, so I am writing one here.

This book reads as though it began as an advanced graduate-level course. Osborne, a statistics professor and university administrator by profession, teaches how to use reasoning to enhance statistical data analysis. For instance, the textbook covers topics such as extreme data points, missing data points, and different variations of data. In all, it treats 12 subjects, each in its own chapter. He concludes with a short chapter containing 12 “best practices” that summarize the prior chapters. Each chapter also presents topics for further reflection or discussion, depending on the context.

Graduate-level education is meant to foster quibbling, and I did quibble with his discounting of the practice of dichotomizing continuous data. Dichotomization, particularly in healthcare, plays an important role in disseminating what the data show. The practice provides quick-and-easy, evidence-based heuristics for healthcare professionals to use in patient care without having to carefully and repeatedly reason through the statistics. This economizes time and makes the data’s message easier to remember. I would suggest a modification to his principle: continuous data should be analyzed in its native form first, and dichotomization should occur only afterward, to communicate a message consistent with the continuous analysis.
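To make the quibble concrete, here is a minimal sketch of my own (it is not taken from Osborne's book) of a common statistical objection to dichotomizing: a median split discards information, so the measured association with an outcome tends to weaken. The data are synthetic and made up for illustration, and the names `pearson` and `median_split` are hypothetical helpers I wrote for this example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def median_split(xs):
    """Dichotomize: 1 if the value is above the sample median, else 0."""
    cutoff = statistics.median(xs)
    return [1 if x > cutoff else 0 for x in xs]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic (made-up) data: a continuous predictor and a noisy linear outcome.
predictor = [rng.gauss(0, 1) for _ in range(2000)]
outcome = [0.5 * x + rng.gauss(0, 1) for x in predictor]

# Step 1: analyze the predictor in its native, continuous form.
r_continuous = pearson(predictor, outcome)

# Step 2: dichotomize only afterward, and compare against step 1.
r_dichotomized = pearson(median_split(predictor), outcome)

print(f"continuous predictor:   r = {r_continuous:.3f}")
print(f"median-split predictor: r = {r_dichotomized:.3f}")
```

In this sketch the median-split correlation comes out smaller than the continuous one, though it keeps the same sign. That is the check I have in mind: the dichotomized summary is acceptable for communication only if it tells the same story as the continuous analysis.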

Overall, Osborne’s observations are helpful in dealing with data at a low level, consistent with his expertise in statistics. He does not deal with data at a higher level – say, at the level of an epidemiologist. Those interested in the high-level impact of studies might not find his detailed discussions interesting or relevant. Nonetheless, for those who deal with data at a low level – its richness and its proper interpretation – subjects like data cleaning are definitely relevant and interesting. To practitioners of statistical analyses, take it from a fellow practitioner: these suggestions can help you up your research game.

Best Practices in Data Cleaning
By Jason W. Osborne
Copyright (c) 2012, 2019
Independently Published
ISBN13 9781090350435
Page Count: 264
Genre: Statistics
www.amazon.com