SQL is one of the most widely used programming language but yet it remains too much undervalued. Here is some quick SQL query snippets for refreshing the memory.
Optional: Learn about
FOREIGN KEY and
PRIMARY KEY and the different Database Normalization types from this Wikipedia page. Probably you will need only up to 1st to 4th Normal Form of a database.
These basic concepts are useful to understand why we need table joins. But not entirely essential.
Also this note does not talk about creating database schema and modifying it.
Basic SQL Syntax
In the example below, you are querying two columns,
column_name_1, column_name_2 from the
table but we only want the unique values from the second column, so we use
DISTINCT. The conditions to select particular data are added in the
WHERE clause. The two selected columns are then sorted first by ascending order and then by descending order using the
ORDER BY clause. And finally we are only interested in maximum 10 rows.
SELECT column_name_1, DISTINCT(column_name_2) FROM table_name WHERE column_name_1 = "abc" and column_name_2 = "xyz" ORDER BY column_name_1 ASC, column_name_2 DESC LIMIT 10
Another simple clause is CASE WHEN which is useful as a if-else block. Between a
CASE ... END There can be as many
WHEN block and then one
CASE WHEN condition_1 THEN result_1 WHEN condition_2 THEN result_2 ELSE final_value END
Union vs Union All
Union enables stacking two table with same columns one on top of the other.
UNION only keeps distinct rows from two tables, on the other hand
UNION ALL keeps all rows.
Aggregate functions are basically statistical summary functions like SUM(), MIN(), MAX(), AVG() etc. See the section titled “Rank, Window, Lag, Lead” below.
From SQLShack.com “In the SQL GROUP BY clause, we can use a column in the select statement if it is used in Group by clause as well. It does not allow any column in the select clause that is not part of GROUP BY clause”. Definitely check out the link to get a clear idea about these useful clauses and their differences.
Joins are very intuitive but may become quite complex especially when used with nested sub-queries.
There are various types of SQL joins. Cross join simply makes all the combinations of the two tables as shown in the image from SQLShack.com. You should check out their post on Cross join, it is very detailed!
Cross Join can be run by this example where
table_name_2 are cross joined. This is often time very inefficient. So be careful with cross joins and look out for ways to avoid it.
SELECT column_name_1, column_name_2 FROM table_name_1,table_name_2
I find this image below to explain the SQL joins very effective
Carsson Forter writes, “By filtering and aggregating your data before joining, you write the most efficient SQL. Joins are expensive to process so you want the fewest possible rows before joining two tables together… if you have a
JOIN and a
WHERE clause in the same CTE, SQL processes the
So trying to filter and then joining is typically more efficient. That brings us to more advanced topics, including subqueries and CTEs.
Common Table Expressions (CTE)
CTEs gives a way to store some queried data with a name so that it can be used in more complex queries. That way, CTEs help write complex SQL queries in a cleaner way.
CTEs commonly have this syntax, you can have multiple CTE’s. Source here
WITH cte1 as (SELECT * from table WHERE ...), cte2 as (SELECT * from table WHERE ...) select * from cte1 union select * from cte2
This article by Carsson Forter provides a great example of using multiple CTEs to calculate the avg time between two transactions of a customer. This is his example which also used
group by clause to aggregate data by user name.
-- First, find all of user_a's transactions today with user_a_trans as ( SELECT username, time FROM transactions WHERE day = '2017-09-08' AND username = 'user_a'), -- Join each transaction to all transactions occurring after it joined_trans as ( SELECT username, time, future_times FROM user_a_trans a INNER JOIN user_a_trans b ON b.time > a.time), -- Find the immediate next transaction using MIN() next_trans as ( SELECT username, time, MIN(future_times) as next_time FROM joined_trans GROUP BY username, time) -- Average difference of the time and the next transaction's time SELECT AVG(next_time - time) as avg_time_to_next_transaction from next_trans;
Rank, Window, Lag, Lead
The Window clause is useful to set a condition over which data is split.
WINDOW can be at a single location and referred to while calling aggregation functions like
AVG(). This is the link for the code below.
SELECT sum(salary) OVER w, avg(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
LAG, see a thorough explanations here. The same logic applies on
For rank functions,
DENSE_RANK(), see this resource.
FIRST_VALUE and LAST_VALUE
A good example of these functions is found on this website. Copying the example query here
USE AdventureWorks2012; GO SELECT JobTitle, LastName, VacationHours, FIRST_VALUE(LastName) OVER ( PARTITION BY JobTitle ORDER BY VacationHours ASC ROWS UNBOUNDED PRECEDING ) AS FewestVacationHours FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ORDER BY JobTitle;
Geospatial SQL Queries
This is another important group of SQL queries practically useful but not talked about in most cases.