PostgreSQL

Recruitment and knowledge question base. Filter, search and test your knowledge.

Topics

What is an index in PostgreSQL and how does it help performance?

easyindexbtreeperformance

Answer

An index is a separate data structure (usually B‑tree) that stores ordered keys and pointers to rows. It lets PostgreSQL find matching rows without scanning the whole table, speeding up lookups and joins at the cost of extra storage and slower writes.

When should you use a composite index?

mediumcomposite-indexquery-plannerperformance

Answer

Use a composite (multi‑column) index when queries commonly filter or sort by the same set of columns in the same order. PostgreSQL uses the leftmost‑prefix rule, so column order matters. Avoid composites if leading columns are low‑selectivity or rarely used together.

Explain transactions and isolation levels in PostgreSQL.

mediumtransactionsisolationacid+1

Answer

A transaction groups SQL statements into an all‑or‑nothing unit with ACID guarantees. PostgreSQL isolation levels are Read Committed (default), Repeatable Read, and Serializable, which control phenomena like non‑repeatable reads and phantoms.

What is VACUUM/autovacuum and why is it important?

mediumvacuumautovacuummaintenance+1

Answer

VACUUM removes dead tuples left by MVCC, updates visibility maps and statistics, and helps prevent table bloat and transaction ID wraparound. Autovacuum runs this automatically in the background so the database stays healthy.

How does MVCC work in PostgreSQL?

hardmvccconcurrencypostgresql

Answer

MVCC (multi‑version concurrency control) keeps multiple versions of rows. Updates create new row versions with transaction IDs; readers see a consistent snapshot without blocking writers. Old versions are later cleaned up by VACUUM.

PostgreSQL MVCC — what is it in one sentence?

easypostgresmvccconcurrency

Answer

MVCC (Multi-Version Concurrency Control) lets readers see a consistent snapshot while writers create new row versions, reducing read/write blocking under concurrency.

Why do we need VACUUM (and autovacuum) in PostgreSQL?

easyvacuumautovacuumbloat

Answer

Because MVCC creates dead row versions, VACUUM cleans them up so space can be reused and query performance stays good. Autovacuum runs it automatically to prevent table bloat and transaction ID wraparound issues.

What is `EXPLAIN ANALYZE` used for?

easyexplainanalyzequery-plan

Answer

`EXPLAIN` shows the query plan; `EXPLAIN ANALYZE` also runs the query and reports real timings/rows. It’s the main tool to understand why a query is slow.

Postgres UPSERT — what does `ON CONFLICT DO UPDATE` do?

mediumupserton-conflictsql

Answer

It inserts a row, but if it violates a unique constraint/index, it updates the existing row instead. It’s a safe way to “insert or update” in one statement.

What is a GIN index and when would you use it?

mediumginindexjsonb+1

Answer

GIN (Generalized Inverted Index) is great for “contains” queries on composite types like arrays and `jsonb` (e.g., `@>`, `?`). It indexes many-to-many relationships between keys and rows.

What does `SELECT ... FOR UPDATE` do in PostgreSQL?

mediumlockingfor-updatetransactions

Answer

It locks the selected rows for the duration of the transaction, preventing other transactions from updating (or locking) them until you commit/rollback. It’s used to safely read-modify-write without lost updates.

Why do we use connection pooling with Postgres (e.g., PgBouncer)?

hardconnection-poolpgbouncerperformance

Answer

Creating many Postgres connections is expensive (memory, process/thread work). A pool reuses connections and limits concurrency to protect the DB; you trade some session-level features (depending on pooling mode) for stability.

How do deadlocks happen in Postgres and how do you reduce them?

harddeadlocklockingtransactions

Answer

A deadlock happens when two transactions hold locks the other needs (A waits for B, B waits for A). Reduce them by locking rows in a consistent order, keeping transactions short, and avoiding unnecessary `FOR UPDATE`.

Table partitioning in Postgres — when does it help (and a common pitfall)?

hardpartitioningperformanceplanner

Answer

Partitioning helps large tables where queries filter by a partition key (time, tenant) so the planner can prune partitions. A common pitfall is not filtering by the partition key — then you scan many partitions and get little benefit.

JSON vs JSONB in Postgres — what’s the difference (and why JSONB is common)?

hardjsonjsonbgin+1

Answer

JSON stores text; JSONB stores a binary representation that is faster to query and index. JSONB supports GIN indexes and operators efficiently, which is why it’s the common choice.

What does `ANALYZE` do in PostgreSQL and why does it matter?

easyanalyzestatisticsplanner+1

Answer

`ANALYZE` updates table statistics (row counts, value distribution). The planner uses these stats to choose good plans; stale stats can lead to slow plans like wrong joins or missing index usage.

BRIN vs B-tree index — when is BRIN a good fit?

mediumbrinbtreeindex+1

Answer

BRIN is great for huge tables where data is naturally ordered on disk (e.g., time-series by created_at). It’s tiny and fast to maintain, but less precise than B-tree and relies on good correlation with physical order.

What is `work_mem` and why can sorts or hash joins spill to disk?

mediumwork_memsorthash-join+1

Answer

`work_mem` limits memory per operation (sort, hash join, aggregation). If the operation needs more memory than `work_mem`, Postgres writes temporary data to disk, which is much slower and can hurt query performance.

What is WAL and how does it help with crash recovery and replication?

hardwalrecoveryreplication+1

Answer

WAL (Write-Ahead Logging) writes changes to a log before they are applied to data files. After a crash, Postgres replays WAL to recover. Replication can ship WAL to replicas so they apply the same changes in the same order.

What is an index-only scan and what is the “visibility map” role?

hardindex-only-scanvisibility-mapindex+1

Answer

Index-only scan means the query can be answered from the index without reading table pages, because the index covers needed columns. Postgres still must know if rows are visible to the transaction; the visibility map tracks pages where all rows are visible, allowing the heap lookup to be skipped.

What is `pg_stat_activity` used for?

easymonitoringpg_stat_activitypostgres

Answer

It shows current connections and running queries: state, duration, waits, and the SQL text. It’s useful to find long-running queries, locks, and connection issues.

What is `VACUUM FULL` and why can it be disruptive?

mediumvacuum-fullbloatlocks

Answer

`VACUUM FULL` rewrites the whole table to compact it and reclaim space. It takes stronger locks and can block reads/writes on that table, so it’s disruptive; prefer regular VACUUM/autovacuum and fix bloat causes first.

What does `REINDEX` do and when would you use it?

mediumreindexindexesmaintenance

Answer

`REINDEX` rebuilds indexes. You use it when an index is bloated or suspected corrupted, or after certain operations where rebuilding helps performance. It can take locks and consume resources, so plan it.

Physical vs logical replication in PostgreSQL — what’s the difference?

hardreplicationwallogical+2

Answer

Physical replication ships WAL and keeps an exact byte-level copy (great for HA). Logical replication replicates changes at the table level (INSERT/UPDATE/DELETE), which is flexible for migrations and selective replication, but more complex.

Sequences/IDENTITY: why can generated IDs have gaps?

hardsequenceidentitytransactions+1

Answer

Sequences generate numbers independently of transactions. If a transaction rolls back after taking a value, or values are cached and not used, you get gaps. This is normal; don’t rely on IDs being consecutive.

What is a schema in PostgreSQL and why use it?

easypostgresschemanamespaces+1

Answer

A schema is a namespace inside a database (it groups tables, views, functions, etc.). You use schemas to organize objects, separate concerns (e.g., app vs audit), and manage permissions. `search_path` controls which schemas are searched when you don’t qualify names.

What is a partial index in PostgreSQL and when is it useful?

mediumpostgresindexpartial-index+1

Answer

A partial index indexes only rows that match a WHERE condition. It’s useful when most rows don’t need to be indexed (e.g., only active or not-deleted rows), making the index smaller and faster to scan and maintain.

Window functions: what problem do they solve? Give a quick example.

mediumpostgressqlwindow-functions+1

Answer

Window functions compute values over a “window” of rows while keeping each row in the result (unlike GROUP BY which collapses rows). They’re great for rankings, running totals, and “top N per group”.

What are advisory locks in PostgreSQL and when do they help?

hardpostgreslockingadvisory-locks+1

Answer

Advisory locks are application-defined locks (`pg_advisory_lock`) not tied to a specific table row. They help when you need “only one worker does this” coordination (e.g., one scheduled job). They’re not a substitute for DB constraints and you must handle timeouts and failures carefully.

Postgres `LISTEN/NOTIFY`: what does it do and what are its limitations?

hardpostgreslisten-notifypubsub+1

Answer

`LISTEN/NOTIFY` is lightweight pub/sub inside Postgres: clients can listen on a channel and receive notifications when someone sends `NOTIFY`. It’s great for signals (cache invalidation, “something changed”), but it’s not durable (messages can be missed when disconnected) and it’s not a full message queue.

PostgreSQL `COPY`: what is it used for and why is it fast?

easypostgrescopybulk-load+1

Answer

`COPY` is used for bulk importing/exporting data (from/to a file or STDIN/STDOUT). It’s fast because it’s optimized for bulk operations and avoids per-row overhead that you often get with many individual INSERTs.

CTE (`WITH`): what is it and what is a performance gotcha?

mediumpostgressqlcte+1

Answer

A CTE is a named subquery that can make complex SQL easier to read. Performance gotcha: in some cases the planner may materialize the CTE (compute it fully) instead of inlining it, which can be slower. In newer Postgres versions CTEs are often inlined, but you can still force materialization.

PgBouncer pooling modes: why can transaction pooling break some apps?

mediumpostgrespgbouncerpooling+1

Answer

In transaction pooling, PgBouncer can switch the underlying DB connection between transactions. That means session state (temp tables, session variables, prepared statements) may not persist. Apps that assume a stable session can break; you may need session pooling or avoid session state.

PostgreSQL bloat: what causes it and how do you reduce it?

hardpostgresmvccvacuum+2

Answer

Because of MVCC, updates/deletes create dead tuples that must be cleaned up by VACUUM. Bloat happens when cleanup can’t keep up (heavy updates, long transactions, poor autovacuum settings), so tables/indexes grow. Mitigation: tune autovacuum, avoid long transactions, and periodically `REINDEX`/`VACUUM (FULL)` or use online tools like pg_repack when needed.

HOT updates and `fillfactor`: what are they and why can they help?

hardpostgreshot-updatefillfactor+1

Answer

A HOT update (Heap-Only Tuple) happens when an UPDATE changes no indexed columns and there is space on the page, so Postgres can avoid updating indexes. `fillfactor` leaves free space in pages to make HOT updates more likely. This reduces index bloat and can improve write performance.

MVCC in Postgres: why don’t readers block writers?

mediumpostgresmvccconcurrency+1

Answer

Postgres uses MVCC (multi‑version concurrency control): updates create new row versions while readers use a snapshot to see a consistent view. That means readers can read old versions without blocking writers; VACUUM later cleans up dead tuples.

VACUUM vs VACUUM FULL: what’s the difference?

mediumpostgresvacuummaintenance+1

Answer

VACUUM reclaims dead tuples for reuse without locking the table heavily. VACUUM FULL rewrites the table to physically shrink it, but requires an exclusive lock and can be slow. Use FULL sparingly.

Autovacuum: when does it run and why does it matter?

mediumpostgresautovacuumstats+1

Answer

Autovacuum runs when table change thresholds (based on row updates/deletes) are exceeded. It cleans dead tuples, updates statistics, and prevents bloat and transaction ID wraparound. Without it, performance and reliability degrade.

ANALYZE: what does it do and how does it affect query plans?

mediumpostgresanalyzeoptimizer+1

Answer

ANALYZE collects statistics about table data (row counts, value distribution). The planner uses these stats to estimate costs and choose join orders and index usage. Stale stats can lead to bad plans.

GIN vs GiST indexes: when would you use each?

hardpostgresindexesgin+1

Answer

GIN is good for indexing composite values like arrays, JSONB, and full‑text search (fast lookups but heavier writes). GiST is a flexible index for range, geometric, and similarity searches; it supports custom operators and is often used for spatial data.

BRIN index: when is it useful?

mediumpostgresbrinindexes+1

Answer

BRIN (Block Range INdex) is useful for very large tables where data is naturally ordered (e.g., time‑series). It stores summaries per block range, so it’s tiny and fast to build, but less precise than B‑tree.

Advisory locks: what are they and when would you use them?

mediumpostgreslocksadvisory+1

Answer

Advisory locks are application‑defined locks in Postgres (not tied to specific rows). They’re useful for coordinating work like distributed jobs or ensuring a single instance performs a task. They’re voluntary—apps must cooperate to respect them.

LISTEN/NOTIFY: what problem does it solve?

easypostgreslisten-notifypubsub+1

Answer

LISTEN/NOTIFY provides lightweight pub/sub inside Postgres. A session can LISTEN on a channel and another session can NOTIFY it, which is useful for cache invalidation or waking background workers without polling.

Materialized view vs view: what’s the difference?

mediumpostgresviewsmaterialized-view+1

Answer

A view is a stored query that runs on every access. A materialized view stores the result physically and must be refreshed to update. Materialized views can be much faster for heavy reads but can be stale.

How do `pg_stat_activity` and `pg_stat_statements` help in troubleshooting?

mediumpostgresmonitoringpg_stat_activity+1

Answer

`pg_stat_activity` shows current sessions and running queries. `pg_stat_statements` aggregates query statistics (calls, total time) to identify slow or frequent queries. Together they help spot blockers, long transactions, and heavy SQL.