Get Started With the SQL SELECT DISTINCT Statement

When working with databases, you often encounter large datasets filled with duplicate entries that need filtering. If you need to extract only the unique records, SQL provides a simple and effective solution: the DISTINCT keyword in a SELECT statement. Let's explore how to use this method efficiently in various scenarios.

What is the SQL SELECT DISTINCT statement?

The DISTINCT keyword in the SQL SELECT statement removes duplicate records, returning unique values. It is beneficial when you need to analyze distinct data, such as product categories, job titles, salary rates, or statuses, where unique entries are required.

Typically, this filtering occurs early in the data processing workflow, during tasks such as data cleaning, integrity checks, and preparing data for more detailed analysis.

General syntax and parameters

The basic syntax of the query is as follows:

SELECT DISTINCT column1, column2…, columnN 
FROM table;

Parameter descriptions

  • column1, column2… are the names of the columns to retrieve. The DISTINCT keyword removes rows that duplicate the same combination of values in these columns.
  • table is the name of the table from which we retrieve records.
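
When several columns are listed, DISTINCT compares the whole combination of values, not each column on its own. The following is a minimal sketch of that behavior. It uses hypothetical inline sample data (the alias e, the column names, and the values are illustrative and do not come from AdventureWorks2022), so it can be run without any tables:

```sql
-- Hypothetical inline sample data; names and values are for illustration only.
SELECT DISTINCT e.Department, e.JobTitle
FROM (VALUES
    ('Sales',       'Sales Representative'),
    ('Sales',       'Sales Representative'),  -- exact duplicate of the previous row
    ('Sales',       'Sales Manager'),         -- same department, different title: kept
    ('Engineering', 'Design Engineer')
) AS e (Department, JobTitle);
-- Returns three rows: the duplicated (Sales, Sales Representative) pair appears once.
```

Note that 'Sales' still appears twice in the result, because the two rows differ in JobTitle.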

Let us have a closer look at how this statement works. To illustrate our test cases, we are going to use the popular AdventureWorks2022 test database for SQL Server and dbForge Studio for SQL Server.

The following query retrieves the information about all employees in the organization, along with their job titles. It does not apply any filters, returning 290 entries.

SELECT * 
FROM HumanResources.Employee;

To see the list of job positions currently held by employees (for example, to consider removing specific titles or adding new ones), query the same table using the SQL SELECT DISTINCT statement and specify the column to retrieve.

SELECT DISTINCT JobTitle  
FROM HumanResources.Employee;

The output includes only the unique job titles held by the employees: 67 records.


Let us now consider the most common use cases for SQL SELECT DISTINCT.

Using SELECT DISTINCT in different contexts

In its simplest form, SELECT DISTINCT returns a list of unique values, which can then be further processed. Among the most common use cases, it is worth mentioning sorting the results delivered by the query and using SELECT DISTINCT with the COUNT() aggregate function.

SQL SELECT DISTINCT with ORDER BY

Using SELECT DISTINCT with the ORDER BY clause helps organize and sort your results, making them more structured and easier to read. The syntax of such a query is:

SELECT DISTINCT column1, column2, ... 
FROM table_name 
ORDER BY column_name [ASC|DESC];

SELECT DISTINCT ensures that the query returns unique rows based on the specified columns, while ORDER BY sorts the result set by one or more columns, either in ascending (ASC) or descending (DESC) order.

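Note: When SELECT DISTINCT is used, the ORDER BY clause must refer only to columns or expressions that appear in the SELECT list. SQL Server rejects a query that sorts by a column missing from that list.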

For example, if we need to retrieve a list of all cities where the organization's customers are located, we can use SELECT DISTINCT to avoid duplicates.

SELECT DISTINCT a.City 
FROM Sales.SalesOrderHeader soh 
JOIN Person.Address a 
    ON soh.BillToAddressID = a.AddressID 
ORDER BY a.City;
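
The restriction on ORDER BY mentioned in the note can be tried out with a small self-contained sketch. The inline data below is hypothetical (the alias a and the columns City and StateCode are illustrative stand-ins for an address table):

```sql
-- Hypothetical inline sample data; names and values are for illustration only.
SELECT DISTINCT a.City
FROM (VALUES
    ('Seattle',  'WA'),
    ('Portland', 'OR'),
    ('Seattle',  'WA')
) AS a (City, StateCode)
ORDER BY a.City DESC;   -- valid: City is part of the SELECT DISTINCT list

-- Replacing the last line with "ORDER BY a.StateCode" makes SQL Server
-- reject the query, because StateCode is not in the SELECT DISTINCT list.
```

The reason for the rule is that one distinct City value could correspond to several StateCode values, so there would be no single value to sort by.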

SQL SELECT DISTINCT with COUNT()

Another frequent scenario of using SELECT DISTINCT is combining it with the COUNT() aggregate function to identify and count unique values across various record groups. For example, it can count students who achieved certain grades, determine the number of purchases from a specific area, tally employees working particular shifts, etc.

In the case below, we want to count the number of employees in each job position.

SELECT DISTINCT JobTitle, COUNT(*) AS NumberOfEmployees 
FROM HumanResources.Employee 
GROUP BY JobTitle 
ORDER BY NumberOfEmployees DESC;

In this query, GROUP BY groups the rows by job title, and the COUNT function returns the number of employees in each group. Because GROUP BY already produces exactly one row per job title, the DISTINCT keyword is redundant here: the query returns the same result without it.
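
A related pattern places DISTINCT inside the aggregate, as COUNT(DISTINCT column), to count how many unique values a column holds. The sketch below assumes hypothetical inline sample data (the alias e and its values are illustrative, not taken from AdventureWorks2022):

```sql
-- Hypothetical inline sample data; names and values are for illustration only.
SELECT COUNT(*)                   AS TotalEmployees,   -- counts every row
       COUNT(DISTINCT e.JobTitle) AS UniqueJobTitles   -- counts each title once
FROM (VALUES
    (1, 'Design Engineer'),
    (2, 'Design Engineer'),
    (3, 'Tool Designer'),
    (4, 'Marketing Manager')
) AS e (EmployeeID, JobTitle);
-- TotalEmployees = 4, UniqueJobTitles = 3
```

This form is useful when only the number of unique values is needed rather than the values themselves.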

Handling NULL values

When using SELECT DISTINCT, NULL values are treated as equal. Therefore, if a column contains multiple NULL values, SELECT DISTINCT returns only one NULL, along with the distinct non-NULL values from the column.

To illustrate this use case, we check the values in the MiddleName column, which often contains NULL. First, we use a simple SELECT query to see whether the column contains multiple NULL values:

SELECT MiddleName 
FROM Person.Person;

Now, we apply SELECT DISTINCT to retrieve the list of unique middle names:

SELECT DISTINCT MiddleName 
FROM Person.Person;

As the MiddleName column contains NULL and non-NULL values, the SELECT DISTINCT query will return only one instance of NULL along with all distinct non-NULL middle names.
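
The same behavior can be reproduced on a small scale without the sample database. This is a minimal sketch with hypothetical inline values (the alias p and the initials are illustrative):

```sql
-- Hypothetical inline sample data; values are for illustration only.
SELECT DISTINCT p.MiddleName
FROM (VALUES ('A'), (NULL), ('J'), (NULL), ('A')) AS p (MiddleName);
-- Returns three rows: NULL, 'A', and 'J'.
-- The row order is not guaranteed unless ORDER BY is added.
```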


Comparing SELECT DISTINCT with GROUP BY

SELECT DISTINCT and GROUP BY allow us to retrieve unique results, but they function differently and serve distinct purposes:

  • SELECT DISTINCT only provides unique values without any aggregation and is applicable only when you are interested in filtering out duplicates.
  • GROUP BY groups rows that have the same values in specified columns. This clause often works with aggregate functions, allowing users to summarize or aggregate data.

To list the pay rates used in the company, run a SELECT DISTINCT query. It returns each pay rate once, without any aggregation.

SELECT DISTINCT Rate 
FROM HumanResources.EmployeePayHistory 
ORDER BY Rate;

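The query with GROUP BY allows us to determine how many pay records are associated with each pay rate. GROUP BY groups the rows by Rate, and the COUNT() function calculates the number of rows sharing the same pay rate. Since the pay history table can hold more than one row per employee (one for each rate change), this number approximates the number of employees per rate rather than giving a strict head count.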

SELECT Rate, COUNT(BusinessEntityID) AS EmployeeCount 
FROM HumanResources.EmployeePayHistory 
GROUP BY Rate 
ORDER BY Rate;

To remove duplicates, use SELECT DISTINCT. To organize and aggregate data, use GROUP BY. Choose GROUP BY when you need both the unique values and summary information about them, like the number of employees per rate.
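
The difference can be seen side by side in the following T-SQL sketch. It assumes a hypothetical table variable named @PayHistory filled with made-up rows, so it does not depend on the sample database:

```sql
-- Hypothetical sample data in a T-SQL table variable; values are illustrative.
DECLARE @PayHistory TABLE (BusinessEntityID int, Rate money);

INSERT INTO @PayHistory (BusinessEntityID, Rate)
VALUES (1, 12.45), (2, 12.45), (3, 25.00), (4, 40.87);

-- De-duplication only: both statements return 12.45, 25.00, and 40.87.
SELECT DISTINCT Rate FROM @PayHistory ORDER BY Rate;
SELECT Rate FROM @PayHistory GROUP BY Rate ORDER BY Rate;

-- Aggregation: GROUP BY attaches a count to each unique rate.
SELECT Rate, COUNT(*) AS EmployeeCount
FROM @PayHistory
GROUP BY Rate
ORDER BY Rate;
-- 12.45 -> 2, 25.00 -> 1, 40.87 -> 1
```

Run the whole script as one batch, because a table variable exists only within the batch that declares it.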

The performance of GROUP BY and DISTINCT depends on the data structure and indexing in each specific case. Generally, GROUP BY is the natural choice when aggregates are involved, while for plain de-duplication the two forms often perform similarly, so it is worth comparing the execution plans on your own data before choosing one for performance reasons.

Try it yourself with dbForge Studio

To illustrate this tutorial, we employed dbForge Studio for SQL Server, a multi-functional IDE for SQL Server databases with a smart graphical user interface covering all tasks in database development, management, and administration. dbForge Studio for SQL Server is a more powerful alternative to SSMS that includes additional tools not present in SSMS and enhances the functionality of the features shared by both solutions.

Watch the SSMS vs. dbForge Studio for SQL Server - Features Comparison video on the Devart YouTube channel for a more visual comparison of the two solutions.

In particular, we used the T-SQL Editor in dbForge Studio to create and execute scripts against the database. The Editor offers a broad range of features for fast and easy T-SQL coding, such as context-aware code suggestions, code analysis, debugging, formatting, a library of code snippets, and many other options that help users work faster and improve the quality of their code.

Besides the T-SQL Editor, dbForge Studio for SQL Server equips the users with many other tools and options, such as:

  • Database schema/data comparison and synchronization
  • Source Control integration
  • Test data generation with flexible settings
  • Data migration with all popular data formats supported
  • User management
  • Server session monitoring
  • Automated database documenting
  • Task automation through the command line

You can try dbForge Studio for SQL Server in your workflow. Devart provides a 30-day fully functional free trial, so you can test the Studio's capabilities under a full workload. Download and install the software to see how it copes with your daily tasks.

Further learning

The following guides can be helpful when working with SQL queries: