Why is data management so important?
It’s important because data is everywhere. We get data from social media accounts; we have data from clinical trials, finance, and many other places. But what can we do with just raw data? Nothing.
We need data in a usable format to answer a specific question or tell a story. Please read my post on data storytelling for more information.
Not only do we need the correct format to understand what the data is telling us, but mismanaging data can also lead us to wrong answers with serious consequences.
One recent example of the consequences of data mismanagement comes from studies conducted by the Duke Cancer Center in the mid-2000s. The Duke Cancer Center developed gene-based tests to improve the detection and treatment of cancer. However, because the data was mismanaged, researchers developed genetic tests based on the wrong traits of cancerous tumors.
Now, I don’t mean to scare you away from conducting your data analysis or deter you from trusting research. I only want to convey the importance of data management. As researchers, we spend about 90% of our time on data cleaning and management.
So, when do we start working on data management?
The planning should begin before the study. There are many ways to manage data, and the right one depends on your study. There is no one-size-fits-all approach to data management. But there are two important steps you can take to make sure you are on the right path.
Just as with any major project, you need to involve the right expertise in your group.
Here are some potential key players you may want to include in your team:
Now that you have the key players in place, you can start your data management plan. You can take several approaches to developing a successful plan. I like to use the “public health model” to design my data management plans. The public health model has three stages: prevention, detection, and treatment.
At the prevention stage, you want to make sure the forms or surveys you design for data collection are accurate and easy to read. You will need to determine whether you want to collect data on paper or electronically. If you are going with electronic data collection, you must troubleshoot your database before going live. The earlier you can anticipate problems and have a plan in place, the better your data collection process will go.
At the detection stage, you want to catch errors at the point of data collection and fix them. Often, we do data audits while we are collecting data as a quality control measure. It is really important to ensure that the data collected is valid.
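To make the idea of a data audit more concrete, here is a minimal sketch in Python of checking records as they come in. The field names (`participant_id`, `age`, `visit_date`) and the rules themselves are hypothetical; your own study will have its own forms and its own definition of a valid value.

```python
from datetime import date


# Hypothetical validation rules for an illustrative intake form.
def audit_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not record.get("participant_id"):
        problems.append("missing participant_id")

    age = record.get("age")
    if age is None:
        problems.append("missing age")
    elif not isinstance(age, int) or not 0 <= age <= 120:
        problems.append(f"age out of range: {age!r}")

    visit = record.get("visit_date")
    if visit is None:
        problems.append("missing visit_date")
    elif visit > date.today():
        problems.append(f"visit_date in the future: {visit.isoformat()}")
    return problems


def audit_batch(records):
    """Map each record's position to its problems, skipping clean records."""
    report = {}
    for index, record in enumerate(records):
        problems = audit_record(record)
        if problems:
            report[index] = problems
    return report


# Made-up example records, not real study data.
records = [
    {"participant_id": "P001", "age": 34, "visit_date": date(2020, 1, 15)},
    {"participant_id": "", "age": 212, "visit_date": date(2020, 1, 16)},
]
report = audit_batch(records)
for index, problems in report.items():
    print(f"record {index}: {'; '.join(problems)}")
```

Running a check like this while collection is still under way means a flagged record can go back to the person who entered it, which is much harder once the study has closed.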
At the treatment stage, you have already finished data collection. What you can do at this stage is limited; however, it is still important to correct whatever errors you can. During this stage, we assess missing values and, if possible, fill them in using statistical techniques. We also declutter the data, keeping only the information needed to answer our question.
To see examples of data management plans, please visit this data management portal.
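As a rough illustration of the treatment stage, the sketch below counts missing values, fills them in, and trims the data down to the fields of interest. I use median imputation here only because it is simple to show; it is one of many techniques and not necessarily the right one for your data. The function names and the example rows are made up for this illustration.

```python
from statistics import median


def missing_summary(rows, fields):
    """Count missing (None or absent) values for each field."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


def impute_median(rows, field):
    """Return copies of the rows with missing `field` values set to the observed median."""
    observed = [r[field] for r in rows if r.get(field) is not None]
    if not observed:
        # Nothing to base an estimate on, so leave the values missing.
        return [dict(r) for r in rows]
    fill = median(observed)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]


def keep_fields(rows, fields):
    """Declutter: keep only the fields needed to answer the question."""
    return [{f: r.get(f) for f in fields} for r in rows]


# Made-up example rows, not real study data.
rows = [
    {"site": "A", "age": 30, "notes": "first visit"},
    {"site": "B", "age": None, "notes": ""},
    {"site": "A", "age": 50, "notes": "rescheduled"},
]

summary = missing_summary(rows, ["site", "age"])
cleaned = keep_fields(impute_median(rows, "age"), ["site", "age"])
print(summary)
print(cleaned)
```

Note that the original rows are left untouched and new ones are returned, so you can always compare the cleaned data against what was actually collected.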
So, next time you are thinking of conducting a research study, be sure to involve the right players and have a solid data management plan in place.