SQL - Order of Execution of a Query


12 Oct 2024  Amey Kolhe  4 mins read.

Each query begins with finding the data we need in a database and then filtering that data down into something that can be processed and understood as quickly as possible. Because each part of the query is executed sequentially, it’s essential to understand the execution order to know how the result set is computed.


The Query Structure


SELECT DISTINCT column, AGG_FUNC(column_or_expression), 
FROM mytable
    JOIN another_table
ON mytable.column = another_table.column
WHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC / DESC
    LIMIT count
OFFSET COUNT;

Order of Execution


1. FROM and JOINs


The FROM clause and subsequent JOINs are first executed to determine the total working set of data being queried. This includes subqueries in the FROM clause and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.

2. WHERE


Once we have the total working data set, the first-pass WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded. Each constraint can only access columns directly from the tables requested in the FROM clause. Aliases in the SELECT part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet been executed.

3. GROUP BY


After the WHERE constraints are applied, the remaining rows are then grouped based on common values in the column specified in the GROUP BY clause. As a result of the grouping, there will only be as many rows as there are unique values in that column. Implicitly, you should only use this when your query has aggregate functions.

4. HAVING


If the query has a GROUP BY clause, then the constraints in the HAVING clause are applied to the grouped rows, discarding the grouped rows that don’t satisfy the constraint. Like the WHERE clause, aliases are also inaccessible from this step in most databases.

5. SELECT


Any expressions in the SELECT part of the query are finally computed.

6. DISTINCT


Of the remaining rows, rows with duplicate values in the column marked as DISTINCT will be discarded.

7. ORDER BY


If an order is specified by the ORDER BY clause, the rows are sorted by the specified data in ascending or descending order. Since all the expressions in the SELECT part of the query have been computed, you can reference aliases in this clause.

8. LIMIT / OFFSET


Finally, the rows that fall outside the range specified by the LIMIT and OFFSET are discarded, leaving the final set of rows to be returned from the query.


Writing Efficient SQL Queries


  • The first thing you must know while writing the SQL queries is the correct order of SQL query execution.
    • Since many people think SQL processes queries from top to bottom as they have written.
    • But SQL processes queries in the order: FROM, JOIN, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, and finally, LIMIT/OFFSET.
  • One of the common mistakes is using aliases defined in the SELECT clause within the WHERE clause.
    • Because SQL processes the WHERE clause before the SELECT clause.
  • Use the HAVING clause if you need to filter your query based on the result of an aggregate function.
  • While joining multiple tables, start with the smallest table or the table that allows you to filter out the most data early on.

Conclusion


Not every query needs to have all the parts listed above, but a part of why SQL is so flexible is that it allows developers and data analysts to quickly manipulate data without having to write additional code, all just by using the above clauses.


Sql