Cleaning your data with Google Sheets
Getting your data ready for analysis and data visualization
In this article, we’re focusing on how you can use the built-in features within Google Sheets to clean up data and prepare it for analysis and data visualization. The main purpose of this process is to ensure that the data is valid, consistent, complete, and uniform. Let’s explore the Google Sheets data cleanup features — Cleanup Suggestions, Remove Duplicates, and Trim Whitespace.
While we will touch on why cleaning data is an important step, this is not meant to be an overview of the subject. For that, we’d recommend checking out the sources below from Tableau, Toward Data Science, and ThoughtSpot.
The Ultimate Guide to Data Cleaning | by Omar Elgabry | Towards Data Science
What is Data Cleaning? 3 Examples of How to Clean Data (thoughtspot.com)
These insightful pieces are a great primer on the topic.
Cleanup Suggestions — What is it and what does it do?
Often, when you receive or download a dataset, it’s not in the format you need it to be in. For example, information may be in the incorrect column, missing values, spelling mistakes, inconsistencies in how items are named/labeled, different formats for dates and numbers, information irrelevant to your analysis, etc.
While it may be easy to spot these issues when you’re working with a relatively small dataset, handling a large dataset is a whole different matter. Luckily, the Cleanup Suggestions feature in Google Sheets can help with this.
What is Cleanup Suggestions?
Think of Cleanup Suggestions like your personal data detective. When you import your data, the Cleanup Suggestions feature goes to work, spotting common mistakes for you. It can help you tidy up stuff like getting rid of extra spaces, kicking out those pesky duplicates, giving numbers a neat format, pointing out weird stuff, and even sorting out data that's all over the place. Here are some examples of Cleanup Suggestions:
Unused spaces: If you have extra spaces before a word, Google will suggest removing them.
Inconsistent formatting: If you have a list of prices but some cells are missing the dollar sign, Google will suggest adding it.
Duplicate rows or columns: If you have unintentionally duplicated a row or column, Google Sheets will identify and offer to remove the duplicates
The first image below is a sample of the Cleanup Suggestions that has detected a duplicate entry. The second image shows a summary table of entries in column G “IMF economy classification” using the “Review Column Stats” feature where you can quickly see emerging is misspelled.
Where to find Cleanup Suggestions
In Google Sheets go to Data ➡️ Data Cleanup ➡️ Cleanup Suggestions. This will reveal Cleanup Suggestion in the sidebar. When the sidebar is open, click on each column to view detected suggestions. Note that when you scroll over a suggestion, the related cells are highlighted in your spreadsheet. You can easily distinguish between errors/anomalies which appear in red and recommendations which appear in green.
If you imported data into a sheet and suggestions are detected, a Data Cleanup notification will appear on the bottom right. Click See All, to see what Google Sheets recommends.
Remove duplicates
If you know your data has duplicate entries, then you may want to go straight to using remove duplicates. Select the cells you want to check for duplicates and then go to Data ➡️ Data Cleanup ➡️ Remove Duplicates. If your data has headers, then you’ll want to select that in the popup. Next, confirm the columns you want to review and select Remove Duplicates. You’ll get a message that will let you know how many duplicates were found and removed, and the number of unique rows that remain within your dataset.
Trim Whitespace
Similar to “Remove Duplicates” if you know your data contains whitespaces, you can jump straight to using Trim Whitespace. Select the range of cells you’d like to remove leading, trailing or extra spaces and then to go to Data ➡️ Data Cleanup ➡️ Trim Whitespace. You’ll get a message letting you know the number of cells that had white space removed. Note that non-break spaces such as   will not be removed.