Save Big on Cyber Monday! Up to 40% Off
ends in   {{days}}
Days
{{timeFormat.hours}}
:
{{timeFormat.minutes}}
:
{{timeFormat.seconds}}

MySQL Subqueries Guide:
The Complete Beginner's Handbook

Subqueries in MySQL are very useful when data manipulation gets complicated. MySQL is among the most powerful and widely used relational databases. Still, retrieving the exact data you need can quickly become challenging, especially if it involves multiple tables or calculations.

Instead of combining several queries and complex joins, you can place a MySQL subquery inside your SQL statements. This approach isolates your operations into smaller result sets, so you can build more flexible and targeted queries.

Let's explore the different subquery types, and learn how to use them in your SQL query with many examples and tips. For demonstration, we will use dbForge Studio for MySQL, which provides features for accelerated code editing, building complex queries, and efficient syntax checks.

What is a MySQL subquery?

Simply put, a MySQL subquery is a SELECT query inside another query, placed in parentheses. For example:

SELECT column1, column2
FROM t1
WHERE column1 IN (SELECT column2 FROM t2 WHERE column3 = some_value);

The inner part, (SELECT column2 FROM table2 WHERE column3 = some_value), runs first, and the outer query then uses its output. This can happen in a few ways.

You can use a subquery within a SELECT statement to compute a column value, like an average or sum. If you insert it into a WHERE clause, the subquery returns a list of values, and as such, you can filter data against dynamic criteria or comparisons.

You can also use subqueries in a FROM clause to define a temporary table (called "derived table"). Then, your main query can work with that temporary table just like any other, letting you select from it, join it with other tables, or apply further filtering.

However, subqueries can be as complex as regular SELECT statements, but they're not always the most efficient choice. In general, you can accomplish the same thing with a JOIN, which is usually faster with large datasets. A JOIN merges data directly, whereas a subquery executes a separate query. So, if the logic is clearer as a separate step, or you need to pre-process data, a subquery makes sense. Otherwise, a JOIN is usually the better bet for performance.

Types of subqueries in MySQL

Based on the kind of result they return, there are several main types of subqueries in MySQL: scalar, row, table, and correlated.

  • Scalar subqueries: These are the simplest and ideal for comparisons and filtering. They return a single value (e.g., date, number, string). Here's an example of a scalar subquery in a SELECT clause that shows each order ID along with the date of the most recent order:

SELECT order_id, (SELECT MAX(order_date) 
FROM orders) AS latest_order_date FROM orders; 

  • Row subqueries: These subqueries return a single row, though that row can have multiple columns. They come in handy when you want to compare values. For example, you can get the lowest-priced products in each category with a row subquery such as the following:

SELECT *
FROM products
WHERE (category, price) IN (SELECT category, MIN(price) FROM products GROUP BY category); 

  • Table subqueries: These can return a set of rows and columns (i.e., a temporary table). They're best for aggregating data or performing complex calculations before retrieving the final results. So, if you want to find customers who've placed orders over $1,000, you'd use:

SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_total > 1000); 

  • Correlated subqueries: These use values from the outer query's current row, running once for each row, so you can do row-by-row comparisons. For example, to get sales reps who've closed deals worth more than the average deal size for their region, you'd use this:

SELECT sr.rep_name
FROM sales_reps sr
WHERE sr.deal_value > (SELECT AVG(deal_value) FROM sales WHERE region_id = sr.region_id); 

Practical use of MySQL subqueries across different clauses

Let's overview how you can use the subqueries in the most popular clauses to retrieve the necessary information from your database. For testing purposes, we're going to use the tables with versatile information on orders along with the values we are going to retrieve with our subquery examples.

Use the script below to create and populate two tables: orders and order_items.

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_cost DECIMAL(10, 2)
);

CREATE TABLE order_items (
    item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    price_per_unit DECIMAL(10, 2)
);

INSERT INTO orders (customer_id, order_date, total_cost) VALUES
(1, '2023-01-01', 200.00),
(2, '2023-01-02', 150.00),
(3, '2023-01-03', 300.00);

INSERT INTO order_items (order_id, product_id, quantity, price_per_unit) VALUES
(1, 101, 2, 50.00),
(1, 102, 1, 100.00),
(2, 101, 1, 150.00),
(3, 103, 3, 100.00); 

As a result, you'll get orders table that tracks customer orders, including the customer who placed the order, the date of the order, and the total cost. The other table, named order_items, will track individual products that are part of an order.

Now, let's test the MySQL subqueries based on this data and structure.

MySQL subquery in SELECT clause

Let's test the query that will return a list of all orders with their order_id and the total number of items in each order.

SELECT order_id, (SELECT SUM(quantity) 
FROM order_items WHERE order_items.order_id = orders.order_id) AS total_items
FROM orders; 

Here's an output you can expect:

MySQL subquery in SELECT clause

Explanation:

For order_id = 1: Items: 2 (Product 101) + 1 (Product 102) = 3

For order_id = 2: Items: 1 (Product 101)

For order_id = 3: Items: 3 (Product 103)

This allows us to determine how many units of products were purchased in each order.

MySQL subquery in WHERE clause

Now let's retrieve only the orders which were made after the earliest order date made by the customer with customer_id being 1. As we place a subquery in the WHERE clause, we'll be able to calculate the earliest order date of the specified customer (with customer_id = 1), and then use this data to further filter our results.

Here's the query to test on our previous data:

SELECT order_id, total_cost
FROM orders
WHERE order_date > (SELECT MIN(order_date) FROM orders WHERE customer_id = 1); 

As a result, our query will find the earliest (MIN) order date for customer_id = 1. From the orders table, we will see that customer_id = 1 has an order on 2023-01-01.

The subquery will return this date, and then the main query will retrieve all orders where order_date is later than 2023-01-01. As a result, orders made on 2023-01-02 and 2023-01-03 will be included, and Order 1 (from 2023-01-01) is going to be excluded.

Output:

MySQL subquery in WHERE clause

MySQL subquery in FROM clause

Here's an example of how to use a subquery in the FROM clause to create a temporary ai table with the number of items per order (total_items).

After that, we will join our temporary table with the orders table based on order_id, which will ultimately let us select the data on order price and the number of items in it.

Here's the query to test:

SELECT o.order_id, o.total_cost, ai.total_items
FROM orders AS o
JOIN (SELECT order_id, COUNT(*) AS total_items FROM order_items 
GROUP BY order_id) AS ai ON o.order_id = ai.order_id; 

As a result, our subquery will group records in order_items by order_id and count how many items (COUNT(*)) exist in each order.

Within a temporary table (ai) we will get: order_id and total_items.

Finally, our main query uses JOIN to join the orders table (o) with the subquery (ai) on order_id.

The output:

MySQL subquery in FROM clause

MySQL subquery in JOIN clause

Let's now use the same table to analyze the orders that have a cost higher than the average total cost of the orders to pick on the details and check which items were purchased.

Here's a query to test:

SELECT o.order_id, o.total_cost, oi.product_id, oi.quantity
FROM orders AS o
JOIN order_items AS oi
ON o.order_id = oi.order_id
AND o.total_cost > (SELECT AVG(total_cost) FROM orders); 

Here, we have a subquery that calculates the average of the total_cost column from the orders table.

For our table data, it's going to be: AVG(total_cost) = (200.00 + 150.00 + 300.00) / 3 = 216.67

After that, our main query is going to join orders (o) with order_items (oi) where o.order_id = oi.order_id. Thus, we'll be able to filter orders where total cost is greater than 216.67.

The only order_id that qualifies is 3, and it has one item in order_items, so our final output will look like this:

MySQL subquery in JOIN clause

MySQL subquery in GROUP BY clause

Even though MySQL doesn't provide support for subqueries in GROUP BY, we still can use a query that solves our business goal by dynamically determining the grouping column (customer_id or order_date) based on certain conditions and executing a GROUP BY operation accordingly.

Here's the query to test:

SET @groupby_var := (
SELECT CASE
    WHEN (SELECT COUNT(DISTINCT customer_id) FROM orders) > (SELECT COUNT(DISTINCT order_date) FROM orders)
    THEN 'customer_id'
    ELSE 'order_date'
END
);


SET @query := CONCAT('SELECT ', @groupby_var, ', SUM(total_cost) AS total_revenue FROM orders GROUP BY ', @groupby_var);


PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt; 

The first part of this query here assigns a column name (customer_id or order_date) to the variable @groupby_var.

If the number of unique customers (customer_id) is going to be greater than the number of unique order dates (order_date), we're going to group by customer_id. Otherwise, we'll group by order_date.

Column Distinct Count
customer_id 3
order_date 3

Since both counts are equal (3 = 3), the CASE statement chooses order_date (as the default ELSE case). So, @groupby_var = 'order_date' (our second option).

The next part of the query builds dynamically based on our previous data, so we get @groupby_var='order_date'. Thus, it will execute this query:

SELECT order_date, SUM(total_cost) AS total_revenue FROM orders GROUP BY order_date; 

The final part of our query will prepare the dynamically generated SQL statement, run the query, and clean up the prepared statement.

Here's the output:

MySQL subquery in GROUP BY clause
order_date total_revenue
2023-01-01 200.00
2023-01-02 150.00
2023-01-03 300.00

Such an approach lets us analyze the data to determine the key indicators based on clients or order dates while adapting grouping and aggregation to the prevailing volume of unique values. This approach can be highly beneficial for building effective marketing and sales strategies.

MySQL subquery in HAVING clause

Now, let's check how you can use a subquery to group orders by customer_id, counting the number of orders and the average cost of an order per client. HAVING will let us filter only the groups with an average order cost exceeding the average cost of all orders in a subquery.

Here's the query to test:

SELECT customer_id, COUNT(order_id) AS num_orders, AVG(total_cost) AS avg_cost
FROM orders
GROUP BY customer_id
HAVING AVG(total_cost) > (SELECT AVG(total_cost) FROM orders); 

The first part of this query groups the orders based on customer_id, so each customer gets a single row in the result. It also counts the number of orders each customer placed and calculates the average order cost per customer.

If we weren't using the HAVING clause for filtering, we'd get the following results:

customer_id num_orders avg_cost
1 1 200.00
2 1 150.00
3 1 300.00

However, in our query, we have a subquery that lets us filter this data by comparing the average total cost of orders per customer to the average cost from orders.

As our orders table contains three orders: (200.00, 150.00, 300.00), the overall average order cost is: 216.67

The HAVING clause filters out customers whose average order cost is less than or equal to 216.67.

The output of this query:

MySQL subquery in HAVING clause
customer_id num_orders avg_cost
3 1 300.00

MySQL subquery in ORDER BY

This subquery might have been useful when, for example, we must select all orders and arrange them using a descending order based on an average cost for all orders.

SELECT order_id, total_cost
FROM orders
ORDER BY (SELECT AVG(total_cost) FROM orders) DESC; 

Note that since the average cost is the same for every row, the sorting won't actually change the order of the rows.

MySQL subquery in LIMIT

MySQL does not support subqueries directly in the LIMIT clause. However, you can still get similar results using the following query with a workaround:

SET @order_count := (SELECT COUNT(*) FROM orders WHERE total_cost > 160);
SET @query := CONCAT('SELECT order_id, total_cost FROM orders ORDER BY total_cost DESC LIMIT ', @order_count);
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt; 

Like in the case with GROP BY, this query dynamically determines the LIMIT value based on the number of orders with total_cost > 160.

First, this query calculates the number of orders where total_cost is greater than 160 and stores the result in a user-defined variable @order_count.

Based on the inserted data: orders with total_cost > 160 are:

  • Order 1 → 200.00
  • Order 3 → 300.00

So, @order_count = 2.

Now, we're going to build a dynamic query that actually, with the previously retrieved result, reads as:

SELECT order_id, total_cost FROM orders ORDER BY total_cost DESC LIMIT 2; 

Next, we will execute this query, first preparing the dynamically created SQL statement, then running an actual query, and, lastly, freeing up the prepared statement.

The results are going to be arranged in descending order (DESC):

MySQL subquery in LIMIT
order_id total_cost
3 300.00
1 200.00

MySQL subquery in WITH

WITH (or Common Table Expression, CTE) can be considered a special type of subquery. It provides a way to define a temporary result set that can be used within a single SQL query. It functions similarly to a subquery by helping to structure and simplify complex queries, but with the key advantage that it can be reused multiple times within the same query. This increases the reliability of SQL code and makes it easier to maintain.

Here's a query example to test:

WITH FirstOrder AS (
SELECT customer_id, MIN(order_date) AS FirstOrderDate
    FROM orders
    GROUP BY customer_id
),
LastOrder AS (
    SELECT customer_id, MAX(order_date) AS LastOrderDate
    FROM orders
    GROUP BY customer_id
)
SELECT
    f.customer_id,
    f.FirstOrderDate,
    l.LastOrderDate,
    DATEDIFF(l.LastOrderDate, f.FirstOrderDate) AS DaysOfLoyalty
FROM FirstOrder f
JOIN LastOrder l ON f.customer_id = l.customer_id; 

This query calculates how many days a customer has been ordering by finding the earliest and latest order dates for each customer and then computing the difference to get insights for measuring LTV and customer loyalty.

First, this Common Table Expression (CTE) (FirstOrder) finds the earliest order date (MIN(order_date)) for each customer_id. Next, it finds the latest order date (MAX(order_date)) for each customer_id (LastOrder CTE).

And, finally, we use JOIN FirstOrder (f) and LastOrder (l) on customer_id.

DATEDIFF(l.LastOrderDate, f.FirstOrderDate) calculates the difference (in days) between the first and last order.

The output:

MySQL subquery in WITH
customer_id FirstOrderDate LastOrderDate DaysOfLoyalty
1 2023-01-01 2023-01-01 0
2 2023-01-02 2023-01-02 0
3 2023-01-03 2023-01-03 0

Note that in our case, since each customer has only one order, DaysOfLoyalty = 0. If they had multiple orders, the difference would be showing other values.

Subquery within a subquery

Let's check this approach using the following example:

SELECT
    customer_id,
    order_date,
    total_cost
FROM
    orders o
WHERE
    o.order_id IN (
        SELECT MIN(o2.order_id)
        FROM orders o2
        GROUP BY o2.customer_id
    )
    AND o.total_cost > (
        SELECT AVG(first_orders.total_cost)
        FROM (
            SELECT o3.total_cost
            FROM orders o3
            WHERE o3.order_id IN (
                SELECT MIN(o4.order_id)
                FROM orders o4
                GROUP BY o4.customer_id
            )
        ) AS first_orders
    )
ORDER BY
    o.order_date;  

This query identifies customers whose first order amount exceeds the average first order amount of all customers. This helps pinpoint customers who started their relationship with the company with high purchasing activity, potentially indicating a high future revenue from these customers.

  • First subquery (o2): Finds the order_id of each customer's first order by grouping data by customer_id and selecting the smallest order_id, which represents the first purchase.
  • Second subquery (o3, o4): Calculates the average value of all customers' first orders. The nested subquery (o4) first identifies each customer's first order and then (o3) computes the average cost of these orders.

The output:

Subquery within a subquery
customer_id order_date total_cost
3 2023-01-03 300.00

As you can see, the query calculates the average total cost of these first orders: 216.67

Common mistakes and best practices when using MySQL subqueries

While very useful, subqueries in MySQL can also be tricky. Here are some things to remember:

  • Avoid using subqueries when a join would be more efficient: Joins are faster, particularly if you're dealing with related tables and big sets of data. Subqueries are ideal when you're returning small numbers of rows.
  • Minimize nested subqueries: Nested subqueries make queries difficult to understand and can hurt performance, so try to rewrite them as joins whenever possible.
  • Index strategically: Make sure you have indexes on the columns used in your subquery's WHERE clause. This helps MySQL quickly locate the data.

How dbForge Studio for MySQL can help with subqueries

Simple queries are one thing, but when you have multiple subqueries, joins, and derived tables, you need a solid IDE with the functionality to keep your SQL organized and efficient.

With dbForge Studio for MySQL, you get all the advantages of intelligent SQL coding that suggests how you can complete your queries, provides an option for code snippets, and has your back when you need to validate syntax. This functionality will become your right hand for:

  • Context-aware phrase completion
  • Accurate syntax check
  • Advanced code refactoring

Another tool within dbForge Studio for MySQL that comes in handy is Query Profiler, which will assist you in query optimization. Whenever the queries take too long to execute, and it takes too much time to process multiple rows, you can use Query Profiler to analyze the query execution plan and thus assist you with the optimization.

Finally, MySQL Query Builder lets you skip SQL writing and build queries visually with drag and drop. It automatically generates the script template, and with just a few clicks, you can wrap a query in a subquery and see the impact of your changes in a diagram.

Let's make your queries faster, smarter, and more efficient with dbForge for MySQL!

The best part? You can download dbForge Studio for MySQL and try everything in your environment for 30 days for free.

Conclusion

Subqueries in MySQL are powerful, but nesting them deep or hitting performance walls can be a real headache. dbForge Studio for MySQL takes the pain out of working with subqueries, enabling you to visually create complex queries, debug tricky subquery logic in a snap, and identify performance bottlenecks with ease. Give the free trial a shot and download dbForge Studio for MySQL to streamline your MySQL workflows!

FAQ

dbForge Studio for MySQL

Your ultimate IDE for MySQL development and administration