MongoDB

Recruitment and knowledge question base. Filter, search and test your knowledge.


What is a document in MongoDB and how is data modeled?

Difficulty: easy · Tags: document, bson, modeling

Answer

A document is a JSON‑like (BSON) record of key‑value pairs that can contain nested objects and arrays. Collections hold documents and the schema is flexible. Data is modeled mainly by embedding related data or referencing it across collections based on access patterns.

Explain indexing in MongoDB.

Difficulty: medium · Tags: index, compound-index, performance

Answer

Indexes in MongoDB store ordered keys to let the database locate matching documents quickly instead of scanning the whole collection. You can have single‑field, compound, multikey, text or geospatial indexes. Indexes speed reads but cost memory/storage and slow writes.

Embedding vs referencing in MongoDB schema design?

Difficulty: medium · Tags: embedding, referencing, schema-design

Answer

Embedding stores related data inside one document, giving fast reads and atomic updates but risking large documents and duplication. Referencing stores IDs to other documents/collections, avoiding duplication and supporting large relationships, but requiring extra queries or $lookup.

How does the MongoDB aggregation pipeline work?

Difficulty: hard · Tags: aggregation, pipeline, mongodb

Answer

The aggregation pipeline processes documents through an ordered sequence of stages such as `$match`, `$group`, `$project` and `$sort`. Each stage transforms the stream of documents and passes its output to the next stage. This lets you express analytics that SQL would write with `WHERE`, `GROUP BY` and `ORDER BY`.
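
To make the stage-by-stage flow concrete, the sketch below writes a pipeline as the list of stage documents you would pass to `aggregate()`. Next to it is a small pure-Python emulation of what those three stages compute. The `orders` data and its fields (`status`, `customer`, `amount`) are hypothetical. The emulation only mimics these specific stages and is not how the server executes them.

```python
# A pipeline is an ordered list of stage documents.
# Hypothetical fields: status, customer, amount.
pipeline = [
    {"$match": {"status": "paid"}},                                   # filter first
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},  # one doc per customer
    {"$sort": {"total": -1}},                                         # biggest totals first
]


def emulate_pipeline(docs):
    """Pure-Python stand-in for the three stages above (illustration only)."""
    # $match: keep only documents whose status equals "paid"
    matched = (d for d in docs if d.get("status") == "paid")

    # $group: sum amount per customer, like {"$sum": "$amount"}
    totals = {}
    for d in matched:
        totals[d["customer"]] = totals.get(d["customer"], 0) + d["amount"]
    grouped = [{"_id": k, "total": v} for k, v in totals.items()]

    # $sort: descending by total
    return sorted(grouped, key=lambda g: g["total"], reverse=True)


orders = [
    {"customer": "ann", "amount": 30, "status": "paid"},
    {"customer": "bob", "amount": 50, "status": "paid"},
    {"customer": "ann", "amount": 40, "status": "paid"},
    {"customer": "bob", "amount": 99, "status": "refunded"},
]
print(emulate_pipeline(orders))
# [{'_id': 'ann', 'total': 70}, {'_id': 'bob', 'total': 50}]
```

With a driver, you would pass the same `pipeline` list to the collection's aggregate method. In PyMongo that is `db.orders.aggregate(pipeline)`.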

Replica sets vs sharding in MongoDB?

Difficulty: hard · Tags: replication, sharding, scalability

Answer

Replica sets provide high availability: one primary replicates to secondaries with automatic failover. Sharding provides horizontal scaling: data is partitioned across shards by a shard key and routed via mongos. You often use both together.

MongoDB basics: what is a document and a collection?

Difficulty: easy · Tags: mongodb, document, collection

Answer

A document is a JSON-like record (stored as BSON) made of fields. A collection is a group of documents, similar to a table but with a flexible schema. Every document has an `_id` field that acts as its primary key. If you insert a document without one, an ObjectId is generated for it.

Embedding vs referencing in MongoDB — when would you embed?

Difficulty: easy · Tags: modeling, embedding, referencing

Answer

Embed when data is owned by the parent and is usually read together (one query, atomic update in one document). Reference when the related data is large, shared, or grows without bounds.

What is an ObjectId in MongoDB?

Difficulty: easy · Tags: objectid, id, mongodb

Answer

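ObjectId is a 12-byte identifier commonly used as `_id`. It has three parts:

- a 4-byte timestamp (seconds since the Unix epoch);
- a 5-byte random value generated once per process;
- a 3-byte incrementing counter.

Because the timestamp comes first, ObjectIds are roughly time-ordered. They are not strictly sequential, though: across processes, ordering within the same second is not guaranteed.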
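
A minimal pure-Python sketch of that layout. It splits a 24-hex-character ObjectId string into timestamp, random value and counter, assuming the 4/5/3-byte big-endian layout described above. `decode_object_id` is a hypothetical helper, not a driver API.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id: str) -> dict:
    """Split a 24-hex-character ObjectId into its three parts (hypothetical helper)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")          # 4-byte timestamp, big-endian
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                      # 5-byte per-process random value
        "counter": int.from_bytes(raw[9:12], "big"),   # 3-byte incrementing counter
    }


parts = decode_object_id("507f1f77bcf86cd799439011")
print(parts["created"].isoformat())   # 2012-10-17T21:13:27+00:00
```

The timestamp has one-second resolution. That is why sorting by `_id` only approximates insertion order.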

Compound index order — what does the “prefix rule” mean?

Difficulty: medium · Tags: index, compound-index, performance

Answer

For an index like `{a: 1, b: 1, c: 1}`, queries can efficiently use the leftmost prefix (a), (a,b), (a,b,c). If you skip `a`, the index is much less useful for that query.
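
The prefix rule can be modeled in a few lines. This sketch only considers equality predicates and ignores sorts, range predicates and the query planner's other options. `usable_prefix` is a hypothetical helper that returns the leading index fields that can bound the index scan.

```python
def usable_prefix(index_fields, query_fields):
    """Simplified model: leading index fields usable for tight scan bounds,
    given equality filters on `query_fields`."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break              # the first missing field ends the usable prefix
        prefix.append(field)
    return prefix


index = ["a", "b", "c"]                      # like {a: 1, b: 1, c: 1}
print(usable_prefix(index, {"a", "b"}))      # ['a', 'b']
print(usable_prefix(index, {"a", "c"}))      # ['a']  (b is missing, so c can't narrow the range)
print(usable_prefix(index, {"b", "c"}))      # []     (no leading a)
```

In the `{a, c}` case, MongoDB can still check `c` against the index keys it scans, but `c` does not shrink the range of keys it has to scan.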

Aggregation pipeline — what is it and what is it used for?

Difficulty: medium · Tags: aggregation, pipeline, group

Answer

It’s a sequence of stages (`$match`, `$group`, `$project`, ...) that transforms documents and can compute aggregates. Use it for reporting, grouping, filtering, and shaping data on the server side.

What is a TTL index in MongoDB?

Difficulty: medium · Tags: ttl, index, retention

Answer

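A TTL index is a single-field index on a date field that lets MongoDB delete documents automatically. There are two ways to set the expiry:

- a fixed number of seconds after the stored date (`expireAfterSeconds`);
- at the exact time stored in the field, when `expireAfterSeconds` is 0.

A background task removes expired documents periodically (by default every 60 seconds), so deletion is not instantaneous. TTL indexes are useful for sessions, tokens, temporary data, and logs with a retention period.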
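
The sketch below shows the `createIndexes` command document for a TTL index and a pure-Python emulation of the expiry rule. The collection and field names (`sessions`, `createdAt`, `expireAt`) and the helper `is_expired` are hypothetical. The emulation ignores details such as arrays of dates and the background task's timing.

```python
from datetime import datetime, timedelta, timezone

# Command document for a TTL index (hypothetical names).
# mongosh equivalent: db.sessions.createIndex({createdAt: 1}, {expireAfterSeconds: 3600})
create_ttl_index = {
    "createIndexes": "sessions",
    "indexes": [
        {"key": {"createdAt": 1}, "name": "createdAt_ttl", "expireAfterSeconds": 3600}
    ],
}


def is_expired(doc, field, expire_after_seconds, now):
    """Simplified TTL rule: eligible for deletion once `now` passes the field's
    date plus expireAfterSeconds. A missing or non-date field never expires."""
    value = doc.get(field)
    if not isinstance(value, datetime):
        return False
    return now > value + timedelta(seconds=expire_after_seconds)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
old = {"createdAt": now - timedelta(hours=2)}
fresh = {"createdAt": now - timedelta(minutes=5)}
print(is_expired(old, "createdAt", 3600, now))     # True
print(is_expired(fresh, "createdAt", 3600, now))   # False

# "Expire at a specific time" pattern: expireAfterSeconds = 0 on an expireAt field
print(is_expired({"expireAt": now - timedelta(seconds=1)}, "expireAt", 0, now))  # True
```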

Replica set — what happens during an election and why do read/write concerns matter?

Difficulty: hard · Tags: replica-set, consistency, election

Answer

In a replica set, a new primary is elected if the current one fails. Write concern controls when a write is considered “acknowledged” (e.g., majority), and read concern controls what consistency a read requires — it’s a trade-off between consistency, latency, and availability.
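
Both elections and `w:"majority"` depend on reaching a majority. The arithmetic below shows why replica sets usually have an odd number of members. It assumes every member is a data-bearing voting member (no arbiters). Both helpers are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that is more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a majority is still reachable
    (assumes all members are data-bearing voters, no arbiters)."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth member raises the majority from 2 to 3 but still tolerates only one failure. That is why a fifth member is the usual next step after three.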

Sharding — what makes a good shard key (and a common bad choice)?

Difficulty: hard · Tags: sharding, shard-key, scaling

Answer

A good shard key has high cardinality and good distribution, and supports your common query patterns. A common bad choice is a monotonically increasing key (like timestamp) that creates hotspots on one shard.
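
A small simulation of the hotspot problem. It compares ranged partitioning on a monotonically increasing key with hash partitioning of the same key. The four-shard layout, the split points and the SHA-256-based hash are illustrative assumptions. This is not MongoDB's chunk logic or its hashing function.

```python
import hashlib

# Hypothetical ranged layout, one chunk per shard:
# shard 0 owns keys < 250, shard 1 < 500, shard 2 < 750, shard 3 everything >= 750.
SPLIT_POINTS = [250, 500, 750]


def ranged_shard(key: int) -> int:
    """Range partitioning: pick the shard whose range contains the key."""
    for shard, upper in enumerate(SPLIT_POINTS):
        if key < upper:
            return shard
    return len(SPLIT_POINTS)


def hashed_shard(key: int, shards: int = 4) -> int:
    """Hash partitioning with an illustrative hash (not MongoDB's)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shards


new_keys = range(1000, 2000)   # monotonically increasing, e.g. timestamps
ranged = [0] * 4
hashed = [0] * 4
for k in new_keys:
    ranged[ranged_shard(k)] += 1
    hashed[hashed_shard(k)] += 1

print("ranged:", ranged)   # ranged: [0, 0, 0, 1000], every new insert hits one shard
print("hashed:", hashed)   # spread roughly evenly across all shards
```

Hashing spreads inserts, but it gives up efficient range queries on that key. A compound key that leads with a well-distributed field is another common compromise.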

MongoDB transactions — when are they useful and what’s the cost?

Difficulty: hard · Tags: transactions, mongodb, performance

Answer

They’re useful when you need atomic changes across multiple documents/collections. The cost is more overhead and reduced performance compared to single-document operations; you should model data to avoid needing multi-doc transactions when possible.

Pagination at scale — why can `skip/limit` become slow and what’s a better pattern?

Difficulty: hard · Tags: pagination, skip-limit, performance

Answer

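`skip` must walk past every skipped document, so deep pages get slower as the offset grows. A better pattern is range (seek/keyset) pagination:

- sort on an indexed key, such as `_id`, or `createdAt` with `_id` as a tie-breaker;
- instead of an offset, ask for documents “greater than the last one seen”.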
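
A minimal keyset-pagination sketch. `page_filter` builds the filter document for the next page. `fetch_page` is an in-memory stand-in for `find(filter).sort("_id", 1).limit(n)`, in which a binary search plays the role of the index seek. Both helpers and the sample data are hypothetical.

```python
from bisect import bisect_right


def page_filter(last_seen_id=None):
    """Filter for the next page: empty for the first page, otherwise only
    _id values greater than the last one seen. Pair with sort {_id: 1}."""
    return {} if last_seen_id is None else {"_id": {"$gt": last_seen_id}}


def fetch_page(docs, last_seen_id, page_size):
    """In-memory stand-in; `docs` must already be sorted by _id.
    bisect_right jumps to the first _id > last_seen_id, much as an index
    seek does, instead of walking past skipped documents like skip()."""
    ids = [d["_id"] for d in docs]
    start = 0 if last_seen_id is None else bisect_right(ids, last_seen_id)
    return docs[start:start + page_size]


docs = [{"_id": i} for i in range(1, 8)]
pages, last = [], None
while True:
    page = fetch_page(docs, last, 3)
    if not page:
        break
    pages.append([d["_id"] for d in page])
    last = page[-1]["_id"]           # remember where this page ended
print(pages)   # [[1, 2, 3], [4, 5, 6], [7]]
```

With PyMongo, the real query would look like `collection.find(page_filter(last)).sort("_id", 1).limit(3)`.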

In MongoDB updates, what does `$set` do?

Difficulty: easy · Tags: mongodb, update, set

Answer

`$set` assigns a value to a field (and creates it if it doesn’t exist). It updates only selected fields instead of replacing the whole document.

What is a projection in MongoDB and why use it?

Difficulty: medium · Tags: projection, find, performance

Answer

Projection means selecting only specific fields to return (include/exclude). It reduces payload size, improves performance, and avoids leaking unnecessary data.

What does `upsert: true` mean in MongoDB updates?

Difficulty: medium · Tags: upsert, update, mongodb

Answer

Upsert means “update or insert”. If a document matches the filter, it is updated as usual. If none matches, MongoDB inserts a new document built from the filter's equality conditions plus the update operators (e.g., the `$set` fields).
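
A simplified pure-Python model of `update_one(filter, {"$set": ...}, upsert=True)` for equality-only filters. It ignores `_id` generation and every other update operator. `emulate_upsert` is a hypothetical helper.

```python
def emulate_upsert(collection, filter_doc, set_fields):
    """Update the first matching doc, or insert filter + $set fields.
    Returns "updated" or "inserted"."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in filter_doc.items()):
            doc.update(set_fields)        # match: behave like a normal $set
            return "updated"
    # No match: new document = filter's equality fields + the $set fields
    collection.append({**filter_doc, **set_fields})
    return "inserted"


users = []
print(emulate_upsert(users, {"email": "a@example.com"}, {"name": "Ann"}))    # inserted
print(emulate_upsert(users, {"email": "a@example.com"}, {"name": "Annie"}))  # updated
print(users)   # [{'email': 'a@example.com', 'name': 'Annie'}]
```

Two concurrent upserts with the same filter can both find no match and both insert. A unique index on the filter field is what prevents that duplicate.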

Why are unbounded arrays inside MongoDB documents dangerous?

Difficulty: hard · Tags: arrays, schema-design, limits

Answer

Documents have a 16 MB size limit, and large arrays make documents grow, causing more I/O and slower updates. They can also lead to hot documents and contention. For unbounded lists, prefer a separate collection or bucketing.
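
A sketch of the bucket pattern in pure Python. Comments for a post go into bucket documents holding at most `BUCKET_SIZE` entries, instead of one ever-growing array. The schema (`post_id`, `bucket`, `count`, `comments`) and `add_comment` are hypothetical. Against a real server, one common way to express the same idea is a single upsert along the lines of `update_one({"post_id": p, "count": {"$lt": 100}}, {"$push": {"comments": c}, "$inc": {"count": 1}}, upsert=True)`.

```python
BUCKET_SIZE = 100   # hypothetical cap per bucket document


def add_comment(buckets, post_id, comment):
    """Append to the post's newest bucket, opening a new one when it is full.
    Returns the bucket number the comment landed in."""
    post_buckets = [b for b in buckets if b["post_id"] == post_id]
    current = post_buckets[-1] if post_buckets else None
    if current is None or current["count"] >= BUCKET_SIZE:
        current = {"post_id": post_id, "bucket": len(post_buckets),
                   "count": 0, "comments": []}
        buckets.append(current)
    current["comments"].append(comment)
    current["count"] += 1
    return current["bucket"]


buckets = []
for i in range(250):
    add_comment(buckets, "p1", f"comment {i}")
print([b["count"] for b in buckets])   # [100, 100, 50]
```

Each document stays bounded in size. Reading the latest comments means fetching only the newest bucket.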

What are MongoDB Change Streams and what do they require?

Difficulty: hard · Tags: change-streams, oplog, replica-set

Answer

Change Streams let you subscribe to real-time changes (insert/update/delete) in a collection/database. They require a replica set (or sharded cluster) because they rely on the oplog.

`$push` vs `$addToSet` — what’s the difference?

Difficulty: easy · Tags: mongodb, arrays, update

Answer

`$push` appends a value to an array (allows duplicates). `$addToSet` adds only if the value is not already present (set-like, no duplicates).
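
A pure-Python emulation of the two operators on a plain dict. It covers the single-value form only, not `$each` or other modifiers. The helper names are hypothetical.

```python
def push(doc, field, value):
    """Like {"$push": {field: value}}: always appends (creates the array if missing)."""
    doc.setdefault(field, []).append(value)


def add_to_set(doc, field, value):
    """Like {"$addToSet": {field: value}}: appends only if an equal value is absent."""
    arr = doc.setdefault(field, [])
    if value not in arr:
        arr.append(value)


doc = {"tags": ["db"]}
push(doc, "tags", "db")           # duplicates allowed
add_to_set(doc, "tags", "db")     # already present: no change
add_to_set(doc, "tags", "nosql")  # new value: appended
print(doc["tags"])   # ['db', 'db', 'nosql']
```

Note that `$addToSet` does not remove duplicates that are already in the array. Also, for embedded documents MongoDB requires an exact match, including field order. Python's dict equality ignores key order, so the emulation is looser there.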

How does a unique index work in MongoDB (and why do you need it)?

Difficulty: medium · Tags: unique-index, concurrency, mongodb

Answer

A unique index enforces that no two documents can have the same value for the indexed key(s). Without it, application-level checks can race and create duplicates under concurrency.

What does `$lookup` do in MongoDB aggregation (and what’s a caveat)?

Difficulty: medium · Tags: lookup, aggregation, joins

Answer

`$lookup` joins documents from another collection (like a left join). Caveat: joins can be expensive at scale; make sure you have indexes on join keys and consider denormalization/embedding when appropriate.

Write concern `w:1` vs `w:"majority"` — what’s the trade-off?

Difficulty: hard · Tags: write-concern, durability, replica-set

Answer

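`w:1` acknowledges as soon as the primary has applied the write, which gives lower latency. If the primary fails before the write replicates, however, the write can be rolled back. `w:"majority"` waits until a majority of the data-bearing voting members have the write, so it survives failover, at the cost of higher latency.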

Why can some MongoDB updates get slower over time (document growth/moves)?

Difficulty: hard · Tags: performance, document-growth, updates

Answer

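The document-move problem comes from the legacy MMAPv1 storage engine (removed in MongoDB 4.2). There, a document that grew beyond its allocated space had to be moved to a new location on disk, causing extra I/O, index updates and fragmentation. WiredTiger, the default engine since 3.2, does not move documents this way, but large and growing documents still cost more to update, cache and read. With either engine, avoid unbounded growth and model data so documents stay reasonably stable in size.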

MongoDB read preference: what does it control (primary vs secondary)?

Difficulty: easy · Tags: mongo, replica-set, read-preference

Answer

Read preference controls where reads go in a replica set (e.g., `primary`, `secondary`, `secondaryPreferred`). Reading from secondaries can reduce load on the primary, but you might read slightly stale data depending on replication lag and read concern.

What is a text index in MongoDB and what are its limitations?

Difficulty: medium · Tags: mongo, index, text-search

Answer

A text index enables full-text search with the `$text` query and basic stemming/scoring. Limitations: you can have only one text index per collection, and it’s not as feature-rich as dedicated search engines (advanced ranking, fuzzy search).

MongoDB schema validation: why use it in a “schemaless” database?

Difficulty: medium · Tags: mongo, schema-validation, json-schema

Answer

Schema validation (e.g., JSON Schema) rejects malformed documents on write, which protects data quality. It helps during refactors and when many services write to the same collection, but you should keep rules backward compatible (allow old and new shapes during migrations).

Read concern in MongoDB: `local` vs `majority` - what changes?

Difficulty: hard · Tags: mongo, read-concern, consistency

Answer

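`local` returns the most recent data on the node you read from, even if that data has not yet been replicated to a majority. It could therefore be rolled back after a failover. `majority` returns only data acknowledged by a majority of replica set members, which cannot be rolled back. The trade-off is that the node may return a slightly older majority-committed snapshot rather than its very latest writes.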

Sharded MongoDB: why are “scatter-gather” queries bad and how do you avoid them?

Difficulty: hard · Tags: mongo, sharding, shard-key

Answer

If a query doesn’t include the shard key (or its useful prefix), `mongos` may have to ask many shards and merge results (“scatter-gather”), which increases latency and load. You avoid it by choosing a shard key that matches your common query patterns and by ensuring queries include it.

MongoDB document-level atomicity: what does it guarantee?

Difficulty: easy · Tags: mongo, atomicity, document

Answer

It guarantees that a single write on one document is all-or-nothing: readers won't see a “half-updated” document. It does not make multi-document changes atomic—use a transaction when you need atomicity across multiple documents/collections.

What is a covered query in MongoDB and why can it be faster?

Difficulty: medium · Tags: mongo, indexes, covered-query

Answer

A covered query can be answered using only an index, without fetching the full documents. It can be faster because it avoids loading documents from disk or cache. You get it when every field in the filter and every returned field is in the same index. Since `_id` is returned by default, you must exclude it in the projection unless it is part of the index.
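
A simplified coverage check in pure Python. It ignores further restrictions, such as multikey indexes on array fields, and only compares field names. `is_covered` and the index fields `user_id`/`status` are hypothetical.

```python
def is_covered(index_fields, filter_fields, projected_fields, include_id=True):
    """Simplified check: can the query be answered from the index alone?
    `projected_fields` are the fields explicitly included in the projection;
    include_id=False models a projection with "_id": 0."""
    needed = set(filter_fields) | set(projected_fields)
    if include_id:
        needed.add("_id")          # _id is returned unless explicitly excluded
    return needed <= set(index_fields)


idx = ["user_id", "status"]        # like {user_id: 1, status: 1}
# find({"user_id": ...}, {"status": 1})            -> _id still returned
print(is_covered(idx, ["user_id"], ["status"]))                      # False
# find({"user_id": ...}, {"status": 1, "_id": 0})  -> covered
print(is_covered(idx, ["user_id"], ["status"], include_id=False))    # True
```

In practice, confirm with `explain()`: a covered plan has an index scan and no document fetch stage.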

Aggregation pipeline performance: why put `$match` (and `$project`) early?

Difficulty: medium · Tags: mongo, aggregation, pipeline

Answer

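An early `$match` reduces the number of documents that flow through later stages, so the pipeline does less work. When `$match` is at the start of the pipeline and filters on indexed fields, MongoDB can use the index. An early `$project` can trim fields before memory-heavy stages such as `$group` or `$sort`. The optimizer already fetches only the fields that later stages need, though, so the gain is often smaller than expected.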

MongoDB transaction write conflicts: why do they happen and how should you handle them?

Difficulty: hard · Tags: mongo, transactions, concurrency

Answer

They happen when concurrent transactions try to update the same documents/keys, so one transaction must abort to keep isolation. Apps should treat these as retryable: retry the whole transaction (with backoff), keep transactions short, and make side effects idempotent so retries are safe.
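
A generic sketch of the retry policy: re-run the whole transaction body, with exponential backoff and jitter. `TransientTransactionError` here is a hypothetical exception class standing in for a driver error that carries MongoDB's `TransientTransactionError` label. Real drivers expose that label differently, and PyMongo's `ClientSession.with_transaction` already retries on it for you.

```python
import random
import time


class TransientTransactionError(Exception):
    """Hypothetical stand-in for a driver error labeled TransientTransactionError."""


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01):
    """Re-run the WHOLE transaction body on transient conflicts with
    exponential backoff plus jitter; other exceptions propagate immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except TransientTransactionError:
            if attempt == max_attempts:
                raise                                  # give up after the last attempt
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay * random.uniform(0.5, 1.5))


calls = {"n": 0}


def flaky():
    """Simulated transaction that hits a write conflict twice, then commits."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientTransactionError("write conflict")
    return "committed"


result = run_with_retry(flaky)
print(result, calls["n"])   # committed 3
```

Keep any side effects inside `txn_body` idempotent, because the body may run several times.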

Sharded MongoDB balancing (chunk migrations): what can go wrong and how do you reduce impact?

Difficulty: hard · Tags: mongo, sharding, balancer

Answer

The balancer moves chunks between shards to keep data evenly distributed. Migrations consume CPU/IO/network and can increase latency, especially if you move large or “hot” chunks. Reduce impact with a good shard key (avoid hotspots), monitor migrations, schedule them off-peak, and use zones/pre-splitting when appropriate.

Embedding vs referencing: when would you embed documents in MongoDB?

Difficulty: medium · Tags: mongo, schema-design, embedding

Answer

Embed when the child data is accessed together with the parent and doesn’t grow unbounded (e.g., order with line items). Embedding gives fewer queries and atomic updates within one document. Reference when you need many‑to‑many relationships or unbounded growth.

TTL index: what does it do and when would you use it?

Difficulty: easy · Tags: mongo, ttl, index

Answer

A TTL index automatically deletes documents after a specified time based on a date field. Use it for expiring data like sessions, OTPs, or temporary logs to avoid manual cleanup.

Capped collections: what are they and when are they useful?

Difficulty: medium · Tags: mongo, capped-collection, storage

Answer

A capped collection has a fixed size and behaves like a circular buffer: when it’s full, the oldest documents are overwritten. They’re useful for logs, caches, or streams where you want bounded storage and insertion order.
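
The circular-buffer behavior can be mimicked with `collections.deque(maxlen=...)`. The `create` command document below uses the real `capped`, `size` (bytes) and `max` (document count) options. The collection name `log` and the limits are hypothetical. The deque only models a count-based cap, whereas MongoDB always enforces the byte `size` limit.

```python
from collections import deque

# Command document for a capped collection (hypothetical name and limits).
create_capped = {"create": "log", "capped": True, "size": 1_048_576, "max": 1000}

# Count-based emulation: once full, appending evicts the oldest entry.
log = deque(maxlen=3)
for i in range(5):
    log.append({"seq": i})
print([d["seq"] for d in log])   # [2, 3, 4]: the oldest entries were overwritten
```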

Change streams: what are they used for?

Difficulty: medium · Tags: mongo, change-streams, cdc

Answer

Change streams let you watch real‑time changes (inserts/updates/deletes) in a collection or database, similar to a CDC stream. They’re useful for reactive workflows, cache invalidation, and event‑driven architectures.

Read concern vs write concern: what do they control?

Difficulty: medium · Tags: mongo, consistency, read-concern

Answer

Read concern controls the consistency/visibility of reads (e.g., local, majority). Write concern controls durability/acknowledgement of writes (e.g., w:1, majority). They trade off latency for stronger guarantees.

Read preference in replica sets: what does `primary` vs `secondary` mean?

Difficulty: medium · Tags: mongo, replica-set, read-preference

Answer

`primary` reads only from the primary, which has the most up-to-date data. `secondary` reads only from secondaries, which offloads the primary but can return stale data because of replication lag. The other modes are:

- `primaryPreferred`;
- `secondaryPreferred`;
- `nearest`, which reads from the lowest-latency member, primary or secondary.

Replica set elections: what happens during an election?

Difficulty: medium · Tags: mongo, replica-set, election

Answer

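If the primary becomes unavailable, an eligible secondary calls an election. It becomes primary once it receives votes from a majority of voting members. Member priority and how up to date each member's oplog is influence which member wins. Until a new primary is elected, writes fail, and so do reads that require the primary. Reads from secondaries (with a suitable read preference) keep working but may be stale. After the election, the new primary accepts writes.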

`$lookup`: what does it do and what is a common pitfall?

Difficulty: medium · Tags: mongo, lookup, aggregation

Answer

`$lookup` performs a left outer join between collections in the aggregation pipeline. A common pitfall is large join fan‑out, which can be slow and memory‑heavy. Ensure indexes on join keys and filter early.
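
The basic `localField`/`foreignField` form of `$lookup` is shown below, together with a pure-Python emulation of its left-outer-join semantics. The collection and field names are hypothetical. The emulation ignores array-valued join fields and the pipeline form of `$lookup`.

```python
lookup_stage = {
    "$lookup": {
        "from": "customers",          # hypothetical foreign collection
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def emulate_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Left outer join: every local doc is kept, and `as_field` holds an ARRAY
    of matches (empty if none). Pre-grouping foreign docs by key plays the role
    that an index on foreignField plays on the server."""
    by_key = {}
    for f in foreign_docs:
        by_key.setdefault(f.get(foreign_field), []).append(f)
    return [{**d, as_field: by_key.get(d.get(local_field), [])} for d in local_docs]


orders = [{"_id": 1, "customer_id": "c1"}, {"_id": 2, "customer_id": "c9"}]
customers = [{"_id": "c1", "name": "Ann"}]
joined = emulate_lookup(orders, customers, "customer_id", "_id", "customer")
print(joined[0]["customer"])   # [{'_id': 'c1', 'name': 'Ann'}]
print(joined[1]["customer"])   # []  (no match, but the order is still kept)
```

Fan-out shows up as a large `as` array. Filtering with `$match` before the join keeps it smaller.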

Text indexes: when would you use them and what’s a limitation?

Difficulty: medium · Tags: mongo, text-index, search

Answer

Text indexes enable full‑text search on string fields with stemming and tokenization. They’re useful for search features but have limitations (e.g., only one text index per collection and less flexibility than dedicated search engines).

Unique vs sparse vs partial indexes: what’s the difference?

Difficulty: hard · Tags: mongo, indexes, unique

Answer

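A unique index rejects two indexed documents with the same key value. Documents missing the field are indexed as `null`, so by default only one such document is allowed. A sparse index only includes documents that have the indexed field. A partial index only includes documents matching a filter expression (`partialFilterExpression`), which makes it the more flexible of the two. Combining unique with partial (or sparse) enforces uniqueness only within that subset of documents.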
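
A sketch of a partial unique index spec in `createIndexes` form, plus a pure-Python emulation of how it treats documents. The field `email`, the index name and `violates_unique` are hypothetical. The filter uses `$type: "string"`, so documents without a string email fall outside the index and never conflict.

```python
# Index spec for the createIndexes command (hypothetical names).
unique_email_index = {
    "key": {"email": 1},
    "name": "email_unique_when_present",
    "unique": True,
    "partialFilterExpression": {"email": {"$type": "string"}},
}


def violates_unique(existing_docs, new_doc, field="email"):
    """Emulate the partial unique index: only docs with a string `field`
    are indexed, so only they can collide."""
    value = new_doc.get(field)
    if not isinstance(value, str):
        return False               # outside the partial index: never a conflict
    return any(d.get(field) == value for d in existing_docs)


users = [{"email": "a@example.com"}, {"name": "no email"}]
print(violates_unique(users, {"email": "a@example.com"}))   # True
print(violates_unique(users, {"name": "also no email"}))    # False
```

With a plain (non-partial) unique index, the second document without an email would instead conflict on `null`.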