PostgreSQL

Recruitment and knowledge question base. Filter, search and test your knowledge.

Topics

Answer

An index is a separate data structure (usually B‑tree) that stores ordered keys and pointers to rows. It lets PostgreSQL find matching rows without scanning the whole table, speeding up lookups and joins at the cost of extra storage and slower writes.

CREATE INDEX idx_users_email ON users(email);
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
mediumcomposite-indexquery-plannerperformance

Answer

Use a composite (multi‑column) index when queries commonly filter or sort by the same set of columns in the same order. PostgreSQL uses the leftmost‑prefix rule, so column order matters. Avoid composites if leading columns are low‑selectivity or rarely used together.

Answer

A transaction groups SQL statements into an all‑or‑nothing unit with ACID guarantees. PostgreSQL isolation levels are Read Committed (default), Repeatable Read, and Serializable, which control phenomena like non‑repeatable reads and phantoms.

mediumvacuumautovacuummaintenance+1

Answer

VACUUM removes dead tuples left by MVCC, updates visibility maps and statistics, and helps prevent table bloat and transaction ID wraparound. Autovacuum runs this automatically in the background so the database stays healthy.

hardmvccconcurrencypostgresql

Answer

MVCC (multi‑version concurrency control) keeps multiple versions of rows. Updates create new row versions with transaction IDs; readers see a consistent snapshot without blocking writers. Old versions are later cleaned up by VACUUM.

Answer

MVCC (Multi-Version Concurrency Control) lets readers see a consistent snapshot while writers create new row versions, reducing read/write blocking under concurrency.

Answer

Because MVCC creates dead row versions, VACUUM cleans them up so space can be reused and query performance stays good. Autovacuum runs it automatically to prevent table bloat and transaction ID wraparound issues.

easyexplainanalyzequery-plan

Answer

`EXPLAIN` shows the query plan; `EXPLAIN ANALYZE` also runs the query and reports real timings/rows. It’s the main tool to understand why a query is slow.

EXPLAIN ANALYZE
SELECT *
FROM users
WHERE email = '[email protected]';

Answer

It inserts a row, but if it violates a unique constraint/index, it updates the existing row instead. It’s a safe way to “insert or update” in one statement.

INSERT INTO users(email, name)
VALUES ('[email protected]', 'Ada')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name;

Answer

GIN (Generalized Inverted Index) is great for “contains” queries on composite types like arrays and `jsonb` (e.g., `@>`, `?`). It indexes many-to-many relationships between keys and rows.

CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- example query (jsonb contains)
SELECT * FROM events WHERE payload @> '{"type":"click"}';
mediumlockingfor-updatetransactions

Answer

It locks the selected rows for the duration of the transaction, preventing other transactions from updating (or locking) them until you commit/rollback. It’s used to safely read-modify-write without lost updates.

Answer

Creating many Postgres connections is expensive (memory, process/thread work). A pool reuses connections and limits concurrency to protect the DB; you trade some session-level features (depending on pooling mode) for stability.

Answer

A deadlock happens when two transactions hold locks the other needs (A waits for B, B waits for A). Reduce them by locking rows in a consistent order, keeping transactions short, and avoiding unnecessary `FOR UPDATE`.

Answer

Partitioning helps large tables where queries filter by a partition key (time, tenant) so the planner can prune partitions. A common pitfall is not filtering by the partition key — then you scan many partitions and get little benefit.

Answer

JSON stores text; JSONB stores a binary representation that is faster to query and index. JSONB supports GIN indexes and operators efficiently, which is why it’s the common choice.

Answer

`ANALYZE` updates table statistics (row counts, value distribution). The planner uses these stats to choose good plans; stale stats can lead to slow plans like wrong joins or missing index usage.

Answer

BRIN is great for huge tables where data is naturally ordered on disk (e.g., time-series by created_at). It’s tiny and fast to maintain, but less precise than B-tree and relies on good correlation with physical order.

Answer

`work_mem` limits memory per operation (sort, hash join, aggregation). If the operation needs more memory than `work_mem`, Postgres writes temporary data to disk, which is much slower and can hurt query performance.

Answer

WAL (Write-Ahead Logging) writes changes to a log before they are applied to data files. After a crash, Postgres replays WAL to recover. Replication can ship WAL to replicas so they apply the same changes in the same order.

Answer

Index-only scan means the query can be answered from the index without reading table pages, because the index covers needed columns. Postgres still must know if rows are visible to the transaction; the visibility map tracks pages where all rows are visible, allowing the heap lookup to be skipped.

easymonitoringpg_stat_activitypostgres

Answer

It shows current connections and running queries: state, duration, waits, and the SQL text. It’s useful to find long-running queries, locks, and connection issues.

Answer

`VACUUM FULL` rewrites the whole table to compact it and reclaim space. It takes stronger locks and can block reads/writes on that table, so it’s disruptive; prefer regular VACUUM/autovacuum and fix bloat causes first.

Answer

`REINDEX` rebuilds indexes. You use it when an index is bloated or suspected corrupted, or after certain operations where rebuilding helps performance. It can take locks and consume resources, so plan it.

Answer

Physical replication ships WAL and keeps an exact byte-level copy (great for HA). Logical replication replicates changes at the table level (INSERT/UPDATE/DELETE), which is flexible for migrations and selective replication, but more complex.

Answer

Sequences generate numbers independently of transactions. If a transaction rolls back after taking a value, or values are cached and not used, you get gaps. This is normal; don’t rely on IDs being consecutive.

Answer

A schema is a namespace inside a database (it groups tables, views, functions, etc.). You use schemas to organize objects, separate concerns (e.g., app vs audit), and manage permissions. `search_path` controls which schemas are searched when you don’t qualify names.

Answer

A partial index indexes only rows that match a WHERE condition. It’s useful when most rows don’t need to be indexed (e.g., only active or not-deleted rows), making the index smaller and faster to scan and maintain.

CREATE INDEX idx_users_active_email
ON users (email)
WHERE active = true AND deleted_at IS NULL;

Answer

Window functions compute values over a “window” of rows while keeping each row in the result (unlike GROUP BY which collapses rows). They’re great for rankings, running totals, and “top N per group”.

SELECT user_id,
       created_at,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM logins;

Answer

Advisory locks are application-defined locks (`pg_advisory_lock`) not tied to a specific table row. They help when you need “only one worker does this” coordination (e.g., one scheduled job). They’re not a substitute for DB constraints and you must handle timeouts and failures carefully.

Answer

`LISTEN/NOTIFY` is lightweight pub/sub inside Postgres: clients can listen on a channel and receive notifications when someone sends `NOTIFY`. It’s great for signals (cache invalidation, “something changed”), but it’s not durable (messages can be missed when disconnected) and it’s not a full message queue.

Answer

`COPY` is used for bulk importing/exporting data (from/to a file or STDIN/STDOUT). It’s fast because it’s optimized for bulk operations and avoids per-row overhead that you often get with many individual INSERTs.

COPY users(email, created_at)
FROM STDIN WITH (FORMAT csv, HEADER true);

Answer

A CTE is a named subquery that can make complex SQL easier to read. Performance gotcha: in some cases the planner may materialize the CTE (compute it fully) instead of inlining it, which can be slower. In newer Postgres versions CTEs are often inlined, but you can still force materialization.

Answer

In transaction pooling, PgBouncer can switch the underlying DB connection between transactions. That means session state (temp tables, session variables, prepared statements) may not persist. Apps that assume a stable session can break; you may need session pooling or avoid session state.

Answer

Because of MVCC, updates/deletes create dead tuples that must be cleaned up by VACUUM. Bloat happens when cleanup can’t keep up (heavy updates, long transactions, poor autovacuum settings), so tables/indexes grow. Mitigation: tune autovacuum, avoid long transactions, and periodically `REINDEX`/`VACUUM (FULL)` or use online tools like pg_repack when needed.

Answer

A HOT update (Heap-Only Tuple) happens when an UPDATE changes no indexed columns and there is space on the page, so Postgres can avoid updating indexes. `fillfactor` leaves free space in pages to make HOT updates more likely. This reduces index bloat and can improve write performance.

Answer

Postgres uses MVCC (multi‑version concurrency control): updates create new row versions while readers use a snapshot to see a consistent view. That means readers can read old versions without blocking writers; VACUUM later cleans up dead tuples.

mediumpostgresvacuummaintenance+1

Answer

VACUUM reclaims dead tuples for reuse without locking the table heavily. VACUUM FULL rewrites the table to physically shrink it, but requires an exclusive lock and can be slow. Use FULL sparingly.

Answer

Autovacuum runs when table change thresholds (based on row updates/deletes) are exceeded. It cleans dead tuples, updates statistics, and prevents bloat and transaction ID wraparound. Without it, performance and reliability degrade.

Answer

ANALYZE collects statistics about table data (row counts, value distribution). The planner uses these stats to estimate costs and choose join orders and index usage. Stale stats can lead to bad plans.

Answer

GIN is good for indexing composite values like arrays, JSONB, and full‑text search (fast lookups but heavier writes). GiST is a flexible index for range, geometric, and similarity searches; it supports custom operators and is often used for spatial data.

mediumpostgresbrinindexes+1

Answer

BRIN (Block Range INdex) is useful for very large tables where data is naturally ordered (e.g., time‑series). It stores summaries per block range, so it’s tiny and fast to build, but less precise than B‑tree.

Answer

Advisory locks are application‑defined locks in Postgres (not tied to specific rows). They’re useful for coordinating work like distributed jobs or ensuring a single instance performs a task. They’re voluntary—apps must cooperate to respect them.

easypostgreslisten-notifypubsub+1

Answer

LISTEN/NOTIFY provides lightweight pub/sub inside Postgres. A session can LISTEN on a channel and another session can NOTIFY it, which is useful for cache invalidation or waking background workers without polling.

mediumpostgresviewsmaterialized-view+1

Answer

A view is a stored query that runs on every access. A materialized view stores the result physically and must be refreshed to update. Materialized views can be much faster for heavy reads but can be stale.

Answer

`pg_stat_activity` shows current sessions and running queries. `pg_stat_statements` aggregates query statistics (calls, total time) to identify slow or frequent queries. Together they help spot blockers, long transactions, and heavy SQL.