Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
Optimize SQL queries with index strategies, query plan analysis, and rewrite patterns for faster database performance.
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
SKILL.md
1---2name: sql-optimization-patterns3description: Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.4---56# SQL Optimization Patterns78Transform slow database queries into lightning-fast operations through systematic optimization, proper indexing, and query plan analysis.910## When to Use This Skill1112- Debugging slow-running queries13- Designing performant database schemas14- Optimizing application response times15- Reducing database load and costs16- Improving scalability for growing datasets17- Analyzing EXPLAIN query plans18- Implementing efficient indexes19- Resolving N+1 query problems2021## Core Concepts2223### 1. Query Execution Plans (EXPLAIN)2425Understanding EXPLAIN output is fundamental to optimization.2627**PostgreSQL EXPLAIN:**2829```sql30-- Basic explain31EXPLAIN SELECT * FROM users WHERE email = '[email protected]';3233-- With actual execution stats34EXPLAIN ANALYZE35SELECT * FROM users WHERE email = '[email protected]';3637-- Verbose output with more details38EXPLAIN (ANALYZE, BUFFERS, VERBOSE)39SELECT u.*, o.order_total40FROM users u41JOIN orders o ON u.id = o.user_id42WHERE u.created_at > NOW() - INTERVAL '30 days';43```4445**Key Metrics to Watch:**4647- **Seq Scan**: Full table scan (usually slow for large tables)48- **Index Scan**: Using index (good)49- **Index Only Scan**: Using index without touching table (best)50- **Nested Loop**: Join method (okay for small datasets)51- **Hash Join**: Join method (good for larger datasets)52- **Merge Join**: Join method (good for sorted data)53- **Cost**: Estimated query cost (lower is better)54- **Rows**: Estimated rows returned55- **Actual Time**: Real execution time5657### 2. Index Strategies5859Indexes are the most powerful optimization tool.6061**Index Types:**6263- **B-Tree**: Default, good for equality and range queries64- **Hash**: Only for equality (=) comparisons65- **GIN**: Full-text search, array queries, JSONB66- **GiST**: Geometric data, full-text search67- **BRIN**: Block Range INdex for very large tables with correlation6869```sql70-- Standard B-Tree index71CREATE INDEX idx_users_email ON users(email);7273-- Composite index (order matters!)74CREATE INDEX idx_orders_user_status ON orders(user_id, status);7576-- Partial index (index subset of rows)77CREATE INDEX idx_active_users ON users(email)78WHERE status = 'active';7980-- Expression index81CREATE INDEX idx_users_lower_email ON users(LOWER(email));8283-- Covering index (include additional columns)84CREATE INDEX idx_users_email_covering ON users(email)85INCLUDE (name, created_at);8687-- Full-text search index88CREATE INDEX idx_posts_search ON posts89USING GIN(to_tsvector('english', title || ' ' || body));9091-- JSONB index92CREATE INDEX idx_metadata ON events USING GIN(metadata);93```9495### 3. Query Optimization Patterns9697**Avoid SELECT \*:**9899```sql100-- Bad: Fetches unnecessary columns101SELECT * FROM users WHERE id = 123;102103-- Good: Fetch only what you need104SELECT id, email, name FROM users WHERE id = 123;105```106107**Use WHERE Clause Efficiently:**108109```sql110-- Bad: Function prevents index usage111SELECT * FROM users WHERE LOWER(email) = '[email protected]';112113-- Good: Create functional index or use exact match114CREATE INDEX idx_users_email_lower ON users(LOWER(email));115-- Then:116SELECT * FROM users WHERE LOWER(email) = '[email protected]';117118-- Or store normalized data119SELECT * FROM users WHERE email = '[email protected]';120```121122**Optimize JOINs:**123124```sql125-- Bad: Cartesian product then filter126SELECT u.name, o.total127FROM users u, orders o128WHERE u.id = o.user_id AND u.created_at > '2024-01-01';129130-- Good: Filter before join131SELECT u.name, o.total132FROM users u133JOIN orders o ON u.id = o.user_id134WHERE u.created_at > '2024-01-01';135136-- Better: Filter both tables137SELECT u.name, o.total138FROM (SELECT * FROM users WHERE created_at > '2024-01-01') u139JOIN orders o ON u.id = o.user_id;140```141142## Optimization Patterns143144### Pattern 1: Eliminate N+1 Queries145146**Problem: N+1 Query Anti-Pattern**147148```python149# Bad: Executes N+1 queries150users = db.query("SELECT * FROM users LIMIT 10")151for user in users:152orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)153# Process orders154```155156**Solution: Use JOINs or Batch Loading**157158```sql159-- Solution 1: JOIN160SELECT161u.id, u.name,162o.id as order_id, o.total163FROM users u164LEFT JOIN orders o ON u.id = o.user_id165WHERE u.id IN (1, 2, 3, 4, 5);166167-- Solution 2: Batch query168SELECT * FROM orders169WHERE user_id IN (1, 2, 3, 4, 5);170```171172```python173# Good: Single query with JOIN or batch load174# Using JOIN175results = db.query("""176SELECT u.id, u.name, o.id as order_id, o.total177FROM users u178LEFT JOIN orders o ON u.id = o.user_id179WHERE u.id IN (1, 2, 3, 4, 5)180""")181182# Or batch load183users = db.query("SELECT * FROM users LIMIT 10")184user_ids = [u.id for u in users]185orders = db.query(186"SELECT * FROM orders WHERE user_id IN (?)",187user_ids188)189# Group orders by user_id190orders_by_user = {}191for order in orders:192orders_by_user.setdefault(order.user_id, []).append(order)193```194195### Pattern 2: Optimize Pagination196197**Bad: OFFSET on Large Tables**198199```sql200-- Slow for large offsets201SELECT * FROM users202ORDER BY created_at DESC203LIMIT 20 OFFSET 100000; -- Very slow!204```205206**Good: Cursor-Based Pagination**207208```sql209-- Much faster: Use cursor (last seen ID)210SELECT * FROM users211WHERE created_at < '2024-01-15 10:30:00' -- Last cursor212ORDER BY created_at DESC213LIMIT 20;214215-- With composite sorting216SELECT * FROM users217WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)218ORDER BY created_at DESC, id DESC219LIMIT 20;220221-- Requires index222CREATE INDEX idx_users_cursor ON users(created_at DESC, id DESC);223```224225### Pattern 3: Aggregate Efficiently226227**Optimize COUNT Queries:**228229```sql230-- Bad: Counts all rows231SELECT COUNT(*) FROM orders; -- Slow on large tables232233-- Good: Use estimates for approximate counts234SELECT reltuples::bigint AS estimate235FROM pg_class236WHERE relname = 'orders';237238-- Good: Filter before counting239SELECT COUNT(*) FROM orders240WHERE created_at > NOW() - INTERVAL '7 days';241242-- Better: Use index-only scan243CREATE INDEX idx_orders_created ON orders(created_at);244SELECT COUNT(*) FROM orders245WHERE created_at > NOW() - INTERVAL '7 days';246```247248**Optimize GROUP BY:**249250```sql251-- Bad: Group by then filter252SELECT user_id, COUNT(*) as order_count253FROM orders254GROUP BY user_id255HAVING COUNT(*) > 10;256257-- Better: Filter first, then group (if possible)258SELECT user_id, COUNT(*) as order_count259FROM orders260WHERE status = 'completed'261GROUP BY user_id262HAVING COUNT(*) > 10;263264-- Best: Use covering index265CREATE INDEX idx_orders_user_status ON orders(user_id, status);266```267268### Pattern 4: Subquery Optimization269270**Transform Correlated Subqueries:**271272```sql273-- Bad: Correlated subquery (runs for each row)274SELECT u.name, u.email,275(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count276FROM users u;277278-- Good: JOIN with aggregation279SELECT u.name, u.email, COUNT(o.id) as order_count280FROM users u281LEFT JOIN orders o ON o.user_id = u.id282GROUP BY u.id, u.name, u.email;283284-- Better: Use window functions285SELECT DISTINCT ON (u.id)286u.name, u.email,287COUNT(o.id) OVER (PARTITION BY u.id) as order_count288FROM users u289LEFT JOIN orders o ON o.user_id = u.id;290```291292**Use CTEs for Clarity:**293294```sql295-- Using Common Table Expressions296WITH recent_users AS (297SELECT id, name, email298FROM users299WHERE created_at > NOW() - INTERVAL '30 days'300),301user_order_counts AS (302SELECT user_id, COUNT(*) as order_count303FROM orders304WHERE created_at > NOW() - INTERVAL '30 days'305GROUP BY user_id306)307SELECT ru.name, ru.email, COALESCE(uoc.order_count, 0) as orders308FROM recent_users ru309LEFT JOIN user_order_counts uoc ON ru.id = uoc.user_id;310```311312### Pattern 5: Batch Operations313314**Batch INSERT:**315316```sql317-- Bad: Multiple individual inserts318INSERT INTO users (name, email) VALUES ('Alice', '[email protected]');319INSERT INTO users (name, email) VALUES ('Bob', '[email protected]');320INSERT INTO users (name, email) VALUES ('Carol', '[email protected]');321322-- Good: Batch insert323INSERT INTO users (name, email) VALUES324('Alice', '[email protected]'),325('Bob', '[email protected]'),326('Carol', '[email protected]');327328-- Better: Use COPY for bulk inserts (PostgreSQL)329COPY users (name, email) FROM '/tmp/users.csv' CSV HEADER;330```331332**Batch UPDATE:**333334```sql335-- Bad: Update in loop336UPDATE users SET status = 'active' WHERE id = 1;337UPDATE users SET status = 'active' WHERE id = 2;338-- ... repeat for many IDs339340-- Good: Single UPDATE with IN clause341UPDATE users342SET status = 'active'343WHERE id IN (1, 2, 3, 4, 5, ...);344345-- Better: Use temporary table for large batches346CREATE TEMP TABLE temp_user_updates (id INT, new_status VARCHAR);347INSERT INTO temp_user_updates VALUES (1, 'active'), (2, 'active'), ...;348349UPDATE users u350SET status = t.new_status351FROM temp_user_updates t352WHERE u.id = t.id;353```354355## Advanced Techniques356357### Materialized Views358359Pre-compute expensive queries.360361```sql362-- Create materialized view363CREATE MATERIALIZED VIEW user_order_summary AS364SELECT365u.id,366u.name,367COUNT(o.id) as total_orders,368SUM(o.total) as total_spent,369MAX(o.created_at) as last_order_date370FROM users u371LEFT JOIN orders o ON u.id = o.user_id372GROUP BY u.id, u.name;373374-- Add index to materialized view375CREATE INDEX idx_user_summary_spent ON user_order_summary(total_spent DESC);376377-- Refresh materialized view378REFRESH MATERIALIZED VIEW user_order_summary;379380-- Concurrent refresh (PostgreSQL)381REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_summary;382383-- Query materialized view (very fast)384SELECT * FROM user_order_summary385WHERE total_spent > 1000386ORDER BY total_spent DESC;387```388389### Partitioning390391Split large tables for better performance.392393```sql394-- Range partitioning by date (PostgreSQL)395CREATE TABLE orders (396id SERIAL,397user_id INT,398total DECIMAL,399created_at TIMESTAMP400) PARTITION BY RANGE (created_at);401402-- Create partitions403CREATE TABLE orders_2024_q1 PARTITION OF orders404FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');405406CREATE TABLE orders_2024_q2 PARTITION OF orders407FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');408409-- Queries automatically use appropriate partition410SELECT * FROM orders411WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';412-- Only scans orders_2024_q1 partition413```414415### Query Hints and Optimization416417```sql418-- Force index usage (MySQL)419SELECT * FROM users420USE INDEX (idx_users_email)421WHERE email = '[email protected]';422423-- Parallel query (PostgreSQL)424SET max_parallel_workers_per_gather = 4;425SELECT * FROM large_table WHERE condition;426427-- Join hints (PostgreSQL)428SET enable_nestloop = OFF; -- Force hash or merge join429```430431## Best Practices4324331. **Index Selectively**: Too many indexes slow down writes4342. **Monitor Query Performance**: Use slow query logs4353. **Keep Statistics Updated**: Run ANALYZE regularly4364. **Use Appropriate Data Types**: Smaller types = better performance4375. **Normalize Thoughtfully**: Balance normalization vs performance4386. **Cache Frequently Accessed Data**: Use application-level caching4397. **Connection Pooling**: Reuse database connections4408. **Regular Maintenance**: VACUUM, ANALYZE, rebuild indexes441442```sql443-- Update statistics444ANALYZE users;445ANALYZE VERBOSE orders;446447-- Vacuum (PostgreSQL)448VACUUM ANALYZE users;449VACUUM FULL users; -- Reclaim space (locks table)450451-- Reindex452REINDEX INDEX idx_users_email;453REINDEX TABLE users;454```455456## Common Pitfalls457458- **Over-Indexing**: Each index slows down INSERT/UPDATE/DELETE459- **Unused Indexes**: Waste space and slow writes460- **Missing Indexes**: Slow queries, full table scans461- **Implicit Type Conversion**: Prevents index usage462- **OR Conditions**: Can't use indexes efficiently463- **LIKE with Leading Wildcard**: `LIKE '%abc'` can't use index464- **Function in WHERE**: Prevents index usage unless functional index exists465466## Monitoring Queries467468```sql469-- Find slow queries (PostgreSQL)470SELECT query, calls, total_time, mean_time471FROM pg_stat_statements472ORDER BY mean_time DESC473LIMIT 10;474475-- Find missing indexes (PostgreSQL)476SELECT477schemaname,478tablename,479seq_scan,480seq_tup_read,481idx_scan,482seq_tup_read / seq_scan AS avg_seq_tup_read483FROM pg_stat_user_tables484WHERE seq_scan > 0485ORDER BY seq_tup_read DESC486LIMIT 10;487488-- Find unused indexes (PostgreSQL)489SELECT490schemaname,491tablename,492indexname,493idx_scan,494idx_tup_read,495idx_tup_fetch496FROM pg_stat_user_indexes497WHERE idx_scan = 0498ORDER BY pg_relation_size(indexrelid) DESC;499```500