You have probably heard the term garbage in, garbage out. Well, that’s the truth about working with data. If you use garbage data, you get garbage results. And no one wants to make decisions based on trashy data. This is why data cleaning is an extremely important step in data analytics.

Data quality issues are very common in data that has been collected through surveys, or imported from other formats, for example databases or Microsoft Excel worksheets. In this video, I will point you to the data quality issues you need to look out for, and how you can fix them using SPSS.

Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted. With that definition, you should already have an idea about what is involved in data cleaning. That’s what I am going to be showing you in this video.

Starting with data quality issue number 1: Irrelevant data

Data can be irrelevant if it is not of interest to the analysis you are trying to do. So this could be either irrelevant variables or irrelevant cases. I have here this data from a fictitious survey I conducted using a data collection platform called KoBo Toolbox.

Already you can see that we have variables that were automatically generated by the platform, for example the start and end time of the interview here. You will also notice that we have more of these, for example the id, uuid, submission time, validation status and so on.

Dealing with irrelevant variables is simple. Select the variable and delete it, as I do here with the start time variable. Let’s do the same with the end time variable as well. We can also delete multiple variables at the same time. There are several ways you can do that. But the easiest and most consistent way is to click the first variable you want to remove, then press the Control key on the keyboard and hold it. While the Control key is still pressed, go ahead and click all the other columns you want to delete. I left the index variable deliberately because we may need it to uniquely identify each case in the data set.

While we are talking about irrelevant data, let’s look at how we may remove irrelevant cases. Let’s suppose that we are only interested in analyzing data for those who completed tertiary education. There are multiple ways to do this, but the easiest is to simply sort the data by that variable, and delete the cases that are not of interest to your analysis. I must say though that it’s wise to back up the data before you delete anything from your dataset.
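If you ever need to repeat the irrelevant-data steps outside SPSS, here is a minimal sketch in plain Python. It is only an illustration, not the SPSS procedure from the video: the rows, the column names (`start`, `end`, `_uuid`, `_index`, `education`, `age`) and the value `"tertiary"` are all assumptions made up for the example. It backs up the data first, drops the platform-generated variables while keeping the index, and then keeps only the cases of interest.

```python
import copy

# Hypothetical rows standing in for the survey export. The column names and
# values are assumptions made for this sketch, not taken from the real file.
rows = [
    {"start": "09:00", "end": "09:20", "_uuid": "a1", "_index": 1,
     "education": "tertiary", "age": 34},
    {"start": "09:30", "end": "09:45", "_uuid": "b2", "_index": 2,
     "education": "secondary", "age": 27},
    {"start": "10:00", "end": "10:25", "_uuid": "c3", "_index": 3,
     "education": "tertiary", "age": 41},
]

# Back up the data before deleting anything.
backup = copy.deepcopy(rows)


def drop_variables(data, names):
    """Return new rows without the listed variables (columns)."""
    unwanted = set(names)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in data]


def keep_cases(data, variable, value):
    """Return only the cases (rows) where `variable` equals `value`."""
    return [row for row in data if row.get(variable) == value]


# Remove platform-generated variables, but keep _index so that each case
# can still be uniquely identified.
cleaned = drop_variables(rows, ["start", "end", "_uuid"])

# Keep only respondents who completed tertiary education.
cleaned = keep_cases(cleaned, "education", "tertiary")

for row in cleaned:
    print(row)
```

Both helper functions return new lists instead of changing `rows` in place, so the original data and the backup stay intact if a step turns out to be wrong.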