Excel Data Cleaning Techniques

Updated: Oct 15, 2023

Data cleaning is an essential step when working with datasets in Excel.

Here are some techniques to help.


Removing Duplicate Rows

Select the desired range or table.

Navigate to: Data > Remove Duplicates.
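For readers who also script their cleaning outside Excel, the same idea can be sketched in Python. This is only an illustration, assuming each row is a list of cell values, that rows must match exactly to count as duplicates, and that the first occurrence is the one kept. The function name remove_duplicate_rows is hypothetical and not part of Excel.

```python
def remove_duplicate_rows(rows):
    """Return rows with later exact duplicates dropped, keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [["apple", 3], ["banana", 5], ["apple", 3]]
print(remove_duplicate_rows(rows))  # [['apple', 3], ['banana', 5]]
```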


Finding and Replacing Text

Use shortcut: Ctrl + H.

For instance, with Find: apple and Replace with: orange, clicking Replace All replaces every occurrence of "apple" with "orange".


Changing the Case of Text

To convert to uppercase: =UPPER("apple") → Result: APPLE.

To convert to lowercase: =LOWER("APPLE") → Result: apple.

For proper case: =PROPER("APPLE tree") → Result: Apple Tree.


Removing Spaces and Non-Printing Characters

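To remove unnecessary spaces: =TRIM(" apple ") → Result: apple. TRIM strips leading and trailing spaces and reduces repeated spaces between words to a single space.

To remove non-printing characters from the text in A1: =CLEAN(A1).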
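The behaviour of these two functions can be approximated in Python as a rough sketch. It assumes TRIM only deals with the ordinary space character and that CLEAN drops character codes 0 to 31; the names excel_trim and excel_clean are made up for this illustration.

```python
import re


def excel_trim(text):
    """Collapse runs of spaces to one space, then strip spaces from both ends."""
    return re.sub(r" +", " ", text).strip(" ")


def excel_clean(text):
    """Drop control characters (codes 0-31), keeping everything else."""
    return "".join(ch for ch in text if ord(ch) >= 32)


print(excel_trim("  apple   pie "))  # apple pie
print(excel_clean("app\x07le\n"))    # apple
```

As in Excel, the two are often combined, for example excel_trim(excel_clean(value)).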


Fixing Numbers and Number Signs

Convert text to number: If A1 has "123", then =VALUE(A1) or =A1*1 will yield 123.

To get the absolute value: If A1 has -5, =ABS(A1) will yield 5.


Fixing Dates and Times

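Convert text to date: If A1 contains the text "January 1, 2023", =DATEVALUE(A1) returns the date serial number 44927 (in the default 1900 date system), which displays as a date once you apply a date format.

Convert text to time: If A1 contains the text "12:30", =TIMEVALUE(A1) returns the time serial 0.520833 (the fraction of a day that has elapsed), which displays as a time once you apply a time format.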
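To see where these serial numbers come from, here is a small Python sketch. It assumes the default 1900 date system and is only valid for dates on or after March 1, 1900; the function names and the format strings are choices made for this example, not something Excel exposes.

```python
from datetime import date, datetime

# Day zero chosen so that results match Excel's 1900 date system
# for dates on or after 1900-03-01.
EXCEL_EPOCH = date(1899, 12, 30)


def date_serial(text, fmt="%B %d, %Y"):
    """Parse a text date and return the number of days since the epoch."""
    parsed = datetime.strptime(text, fmt).date()
    return (parsed - EXCEL_EPOCH).days


def time_serial(text, fmt="%H:%M"):
    """Parse a text time and return it as a fraction of a 24-hour day."""
    parsed = datetime.strptime(text, fmt).time()
    seconds = parsed.hour * 3600 + parsed.minute * 60 + parsed.second
    return seconds / 86400


print(date_serial("January 1, 2023"))   # 44927
print(round(time_serial("12:30"), 6))   # 0.520833
```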


Spell Checking

Navigate to: Review > Spelling.

Example: If a cell has "excell", it might suggest "excel".


Merging Text in Columns

Merging: If A1 contains "John" and B1 contains "Doe", then =A1 & " " & B1 or =CONCATENATE(A1, " ", B1) will yield "John Doe".


Splitting

If A1 has "John Doe", using the menu option: Data > Text to Columns with space as a delimiter will yield two separate columns: "John" and "Doe".
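The merge and split operations above are simple string operations, which a short Python sketch can make explicit. The helper names merge_text and split_text are hypothetical, and the space delimiter is just the one used in the example.

```python
def merge_text(first, last, separator=" "):
    """Join two cell values with a separator, like =A1 & " " & B1."""
    return f"{first}{separator}{last}"


def split_text(text, delimiter=" "):
    """Split one cell value into parts, like Text to Columns with a delimiter."""
    return text.split(delimiter)


print(merge_text("John", "Doe"))  # John Doe
print(split_text("John Doe"))     # ['John', 'Doe']
```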



Transforming and Rearranging Columns and Rows

Moving

Select the column or row by clicking its header, then hold Shift and drag the border of the selection to the new position.


Transpose

If A1:A3 holds values "apple", "banana", and "cherry", copying them and using 'Paste Special' > 'Transpose' will display them horizontally.
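Transposing simply swaps rows and columns. As a minimal Python sketch, assuming the range is stored as a list of rows (the transpose helper is written for this example):

```python
def transpose(grid):
    """Swap rows and columns of a rectangular list of rows."""
    return [list(column) for column in zip(*grid)]


vertical = [["apple"], ["banana"], ["cherry"]]  # three rows, one column (A1:A3)
print(transpose(vertical))  # [['apple', 'banana', 'cherry']]
```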


Flash Fill

Flash Fill automatically fills in values based on a pattern it detects from your input.

For example, if you have a column of names in the format "First Last" and you start typing in the adjacent column just the first names, Excel might detect the pattern and auto-suggest to fill the rest of the column with first names.

To trigger Flash Fill, start typing the pattern and press Enter when Excel shows a suggestion. You can also start it manually via Data > Flash Fill (shortcut: Ctrl + E).


Reconciling Table Data by Joining or Matching

VLOOKUP

Given Sheet1!A1 contains "apple" and Sheet2 has a table with "apple" in A1 and "fruit" in B1, the formula =VLOOKUP(Sheet1!A1, Sheet2!A1:B10, 2, FALSE) will return "fruit".


XLOOKUP (in newer Excel versions)

Using the same example, =XLOOKUP(Sheet1!A1, Sheet2!A1:A10, Sheet2!B1:B10) will return "fruit".
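Both formulas perform an exact-match lookup: find the first row whose key equals the search value, then return a value from another column of that row. The Python sketch below shows that logic under a few assumptions: the table is a list of rows, the key is in the first column, and the comparison is case-sensitive (Excel's text lookups are not). The function name exact_lookup is invented for this example.

```python
def exact_lookup(value, table, col_index):
    """Return the col_index-th value (1-based) of the first row whose first
    column equals value, or None if no row matches."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here


sheet2 = [["apple", "fruit"], ["carrot", "vegetable"]]
print(exact_lookup("apple", sheet2, 2))  # fruit
print(exact_lookup("pear", sheet2, 2))   # None
```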



How converting a range to a table helps clean data

Converting a range of cells to a table in Excel offers various advantages that can assist in the data cleaning process.


Let’s first look at how to convert a range to a table:


1. Select the Range

Click on a cell within the range of data you want to convert to a table. If your data is contiguous (i.e., no blank rows or columns within the data set), Excel will automatically detect the entire range when you start the process to create a table.


2. Convert to Table

With the range selected or with an active cell inside the desired range, go to the Insert tab on the Ribbon.

Click on the Table button. Alternatively, you can use the shortcut Ctrl + T.


3. Specify Table Settings

A "Create Table" dialog box will appear. Check that it lists the correct range.

If your data has headers (such as column titles), make sure the "My table has headers" checkbox is ticked. If it has none, leave the box unticked and Excel will add default headers (Column1, Column2, and so on), which you can rename later. Click OK.


4. Adjust Table Design (Optional)

Once your data is converted into a table, a Table Design tab (called Table Tools > Design in older versions) appears on the Ribbon whenever a cell in the table is selected.

From here, you can rename your table, choose from different table styles for formatting, and adjust other table-specific settings.


5. Naming the Table (Optional, but Recommended)

It's a good practice to give your table a meaningful name, especially if you'll be creating multiple tables or using formulas that reference the table.


Under the Table Tools Design tab on the Ribbon, you'll see a field for "Table Name" on the left side. Simply type in a new name there. Remember, table names cannot contain spaces and must be unique within the workbook.


Your range is now converted to a table, and you can start leveraging the advantages that tables offer in Excel!


How tables can help clean data

Structured References

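Tables use structured references, which means formulas refer to table columns by name rather than by cell address. For a table named Sales with an Amount column (names chosen here purely for illustration), you would write =SUM(Sales[Amount]) instead of something like =SUM(B2:B100). This makes your formulas more readable and reduces the chance of errors when rows are added or deleted.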


Auto Expansion

When you add new data below a table, the table automatically expands to include this data. This ensures that any calculations or charts based on the table will automatically include the new data.


Automatic Filtering

Excel automatically adds drop-down arrows to the column headers of a table. This feature allows you to sort and filter data quickly, helping you to spot anomalies or inconsistencies.


Consistent Formatting

Tables apply consistent formatting. When you add or remove rows or columns, the table's formatting adjusts automatically, making it easier to visually scan for errors.

Also, every other row gets a shaded colour by default (zebra striping), which can make large sets of data easier to read.


Calculated Columns

If you create a formula in one cell of a table column, Excel will automatically fill down that formula to the remaining cells in that column. This ensures that the entire column uses the same formula and prevents inconsistencies. This feature also automatically applies when new rows are added.


Total Row

Excel offers an option to add a total row at the bottom of the table. This row can contain sum, average, count, etc., for each column, allowing for quick assessments of data integrity.

Data Validation Consistency

If you apply data validation to a table column (like a drop-down list), the validation rule is consistently applied to all cells in that column. This helps prevent invalid entries.


Better Integration with Power Query

If you're using Power Query for data transformation and cleaning, tables work better than ranges. Any changes made in Power Query will be applied directly to the table upon refreshing, ensuring data consistency.


Easier to Define Named Ranges

Tables and their columns can easily be named, making them more identifiable in formulas, data validation rules, VBA code, and more.


Reduced Errors in Dynamic Data

If your data comes from an external source whose number of rows can grow or shrink, keeping it in a table ensures that the data range updates dynamically, reducing the possibility of missing or extra data.

Improved Data Connectivity

If you're connecting Excel to external databases or importing data, tables are often better recognized and work more seamlessly with connections than standard ranges.

By converting ranges to tables, you not only facilitate the process of cleaning data but also improve the overall management, analysis, and presentation of your data in Excel.


Remember, while the examples given here are basic, real-world applications might require combining multiple functions and features to efficiently clean data in Excel.
