Key Takeaways

1. SQL fundamentals: Tables, queries, and data manipulation

A database is nothing more than a set of related information.

Tables form the foundation. In relational databases, data is organized into tables consisting of rows and columns. Each table represents an entity (e.g., customers, orders) with columns defining attributes and rows containing individual records.

CRUD operations. SQL provides four basic operations for interacting with data:

  • CREATE: Insert new records into tables
  • READ: Retrieve data using SELECT statements
  • UPDATE: Modify existing records
  • DELETE: Remove records from tables

Query structure. A typical SQL query includes:

  • SELECT: Specifies which columns to retrieve
  • FROM: Identifies the source table(s)
  • WHERE: Filters rows based on conditions
  • ORDER BY: Sorts the result set

2. Joining tables: Unlocking relationships in relational databases

The role of indexes is to facilitate the retrieval of a subset of a table's rows and columns without the need to inspect every row in the table.

Types of joins. SQL offers several ways to combine data from multiple tables:

  • INNER JOIN: Returns only matching rows from both tables
  • LEFT/RIGHT OUTER JOIN: Includes all rows from one table, even without matches
  • FULL OUTER JOIN: Includes all rows from both tables
  • CROSS JOIN: Generates all possible combinations (Cartesian product)

Join conditions. The ON clause specifies how tables are related, typically using primary and foreign key relationships. For example:
sql
SELECT customers.name, orders.order_date
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id

Multiple table joins. Complex queries often involve joining three or more tables to gather related information from across the database schema.

3. Filtering and grouping data: Refining query results

A where clause may contain one or more conditions, separated by the operators and and or.

WHERE clause power. Filtering allows you to retrieve only the data you need:

  • Comparison operators: =, <>, <, >, BETWEEN, IN, LIKE
  • Logical operators: AND, OR, NOT
  • Pattern matching: Using wildcards (% and _) with LIKE

Aggregate functions. SQL provides functions to summarize data:

  • COUNT(): Number of rows
  • SUM(): Total of numeric values
  • AVG(): Average of numeric values
  • MIN()/MAX(): Smallest/largest values

GROUP BY and HAVING. These clauses enable analysis of data subsets:

  • GROUP BY: Organizes rows into groups based on column values
  • HAVING: Filters groups (similar to WHERE, but for grouped data)

4. Subqueries: Nested power for complex data retrieval

A subquery is a query contained within another SQL statement (which I refer to as the containing statement for the rest of this discussion).

Types of subqueries:

  • Scalar: Returns a single value
  • Column: Returns a single column of multiple rows
  • Row: Returns a single row of multiple columns
  • Table: Returns multiple columns and rows

Subquery locations:

  • SELECT: For calculated columns
  • FROM: As derived tables
  • WHERE: For dynamic filtering
  • HAVING: For filtering grouped results

Correlated subqueries. These reference columns from the outer query, allowing row-by-row processing. While powerful, they can impact performance on large datasets.

5. Advanced SQL: Views, transactions, and metadata

Views are created for various reasons, including to hide columns from users and to simplify complex database designs.

Views. Virtual tables based on SELECT statements:

  • Simplify complex queries
  • Provide data security by restricting access
  • Present consistent interfaces as schemas evolve

Transactions. Ensure data integrity for multi-step operations:

  • BEGIN: Starts a transaction
  • COMMIT: Saves changes permanently
  • ROLLBACK: Undoes changes if an error occurs

Metadata. Information about the database structure:

  • Data dictionary: Stores definitions of database objects
  • Information schema: Standardized way to access metadata
  • System catalogs: Database-specific metadata tables

6. Indexing and constraints: Optimizing database performance

Indexes are the mechanism the database server uses to control simultaneous use of data resources.

Indexes. Improve query performance:

  • B-tree: Balanced tree structure, good for most data types
  • Bitmap: Efficient for columns with low cardinality
  • Full-text: Optimized for searching text documents

Constraints. Enforce data integrity rules:

  • Primary Key: Ensures unique identification of rows
  • Foreign Key: Maintains referential integrity between tables
  • Unique: Prevents duplicate values in a column
  • Check: Enforces specific conditions on column values

Query optimization. Techniques to improve performance:

  • Analyze execution plans
  • Use appropriate indexes
  • Avoid full table scans when possible
  • Optimize JOIN operations

7. Analytics with SQL: Window functions for data insights

Using analytic functions, you can do all of these things and more.

Window functions. Perform calculations across sets of rows:

  • OVER clause: Defines the window of rows to operate on
  • PARTITION BY: Groups rows for analysis
  • ORDER BY: Determines the sequence of rows within partitions

Ranking functions:

  • ROW_NUMBER(): Assigns unique numbers to rows
  • RANK(): Assigns rankings with gaps for ties
  • DENSE_RANK(): Assigns rankings without gaps

Aggregate window functions:

  • SUM(), AVG(), COUNT() over windows
  • Running totals and moving averages

Offset functions:

  • LAG(): Access data from previous rows
  • LEAD(): Access data from following rows

Last updated:

Report Issue