MongoDB

Recruitment and knowledge question base. Filter, search and test your knowledge.

Answer

A document is a JSON‑like (BSON) record of key‑value pairs that can contain nested objects and arrays. Collections hold documents and the schema is flexible. Data is modeled mainly by embedding related data or referencing it across collections based on access patterns.

medium · index · compound-index · performance

Answer

Indexes in MongoDB store ordered keys to let the database locate matching documents quickly instead of scanning the whole collection. You can have single‑field, compound, multikey, text or geospatial indexes. Indexes speed reads but cost memory/storage and slow writes.

medium · embedding · referencing · schema-design

Answer

Embedding stores related data inside one document, giving fast reads and atomic updates but risking large documents and duplication. Referencing stores IDs to other documents/collections, avoiding duplication and supporting large relationships, but requiring extra queries or $lookup.

Answer

The aggregation pipeline processes documents through ordered stages like $match, $group, $project and $sort. Each stage transforms the stream and passes results to the next one, enabling complex analytics similar to SQL.

```json
[
  { "$match": { "status": "paid" } },
  { "$group": { "_id": "$customerId", "total": { "$sum": "$amount" } } },
  { "$sort": { "total": -1 } }
]
```
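
To make the stage-by-stage flow concrete, here is a minimal plain-JavaScript sketch that mimics what the three stages above compute. It runs without a database; the sample `orders` array and the `totalsByCustomer` helper are made up for illustration and are not MongoDB APIs.

```javascript
// In-memory model of $match -> $group -> $sort (sample data is hypothetical).
const orders = [
  { customerId: "c1", status: "paid", amount: 40 },
  { customerId: "c2", status: "paid", amount: 100 },
  { customerId: "c1", status: "paid", amount: 25 },
  { customerId: "c3", status: "pending", amount: 500 },
];

function totalsByCustomer(docs) {
  // $match: keep only paid orders
  const paid = docs.filter((d) => d.status === "paid");
  // $group: sum `amount` per customerId
  const totals = new Map();
  for (const d of paid) {
    totals.set(d.customerId, (totals.get(d.customerId) || 0) + d.amount);
  }
  // $sort: highest total first
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .sort((a, b) => b.total - a.total);
}

console.log(totalsByCustomer(orders));
```

Each step only sees the output of the previous one, which is why filtering first keeps the later steps cheap.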

hard · replication · sharding · scalability

Answer

Replica sets provide high availability: one primary replicates to secondaries with automatic failover. Sharding provides horizontal scaling: data is partitioned across shards by a shard key and routed via mongos. You often use both together.

Answer

A document is a JSON-like record (BSON) with fields; a collection is a group of documents (like a table, but schema is flexible). Documents typically have an `_id` as a primary identifier.

Answer

Embed when data is owned by the parent and is usually read together (one query, atomic update in one document). Reference when the related data is large, shared, or grows without bounds.

easy · objectid · id · mongodb

Answer

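ObjectId is a 12-byte identifier commonly used as `_id`. It consists of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter. It’s not strictly sequential, but values are roughly time-ordered.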
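
Because the timestamp sits in the leading bytes, the creation time can be read straight from the hex form. The sketch below assumes the ObjectId is available as a 24-character hex string; `objectIdToDate` and the sample value are illustrative names, not driver APIs.

```javascript
// Read the creation time embedded in an ObjectId's hex string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const sampleId = "507f1f77bcf86cd799439011"; // illustrative value
console.log(objectIdToDate(sampleId).toISOString());
```

This is also why sorting by `_id` approximates sorting by creation time.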

Answer

For an index like `{a: 1, b: 1, c: 1}`, queries can efficiently use the leftmost prefixes (a), (a,b), and (a,b,c). A query that omits `a` generally cannot use this index efficiently.

```javascript
db.users.createIndex({ orgId: 1, email: 1 })

// filters on both indexed fields, so the whole index (orgId, email) is usable
db.users.find({ orgId: 1, email: "user@example.com" })
```
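
The leftmost-prefix rule can be modelled in a few lines of plain JavaScript. This is a deliberately simplified sketch: `usablePrefixLength` is a hypothetical helper that only looks at which fields a query constrains, whereas the real query planner also weighs sort order, range predicates, and other factors.

```javascript
// Hypothetical helper: how many leading index fields does a query constrain?
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break; // a gap ends the prefix
    n += 1;
  }
  return n;
}

const index = ["a", "b", "c"];
console.log(usablePrefixLength(index, ["a"]));      // 1
console.log(usablePrefixLength(index, ["a", "c"])); // 1 (c is behind the gap at b)
console.log(usablePrefixLength(index, ["b", "c"])); // 0 (a is missing)
```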

Answer

The aggregation pipeline is a sequence of stages (`$match`, `$group`, `$project`, ...) that transforms documents and can compute aggregates. Use it for reporting, grouping, filtering, and shaping data on the server side.

```javascript
db.orders.aggregate([
  { $match: { status: "PAID" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
])
```

medium · ttl · index · retention

Answer

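A TTL index automatically deletes documents a fixed number of seconds after the time stored in a date field, or at the time stored in that field when `expireAfterSeconds` is 0. A background task performs the deletes periodically, so removal is not instantaneous. It’s useful for sessions, tokens, temporary data, and logs with retention.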

Answer

In a replica set, a new primary is elected if the current one fails. Write concern controls when a write is considered “acknowledged” (e.g., `majority`), and read concern controls what consistency a read requires. Together they trade off consistency, latency, and availability.

Answer

A good shard key has high cardinality and good distribution, and supports your common query patterns. A common bad choice is a monotonically increasing key (like timestamp) that creates hotspots on one shard.
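
The hotspot effect is easy to see in a toy model. The sketch below is not how MongoDB routes writes internally: the shard count, the range boundaries, and the modulo function standing in for a hashed shard key are all assumptions chosen to keep the example small.

```javascript
// Toy model with 4 shards (illustrative only).
const SHARD_COUNT = 4;

// Range sharding: boundaries were chosen from older data, so shard i owns
// keys below boundaries[i] and the last shard owns everything above.
function rangeShard(key, boundaries) {
  let shard = 0;
  while (shard < boundaries.length && key >= boundaries[shard]) shard += 1;
  return shard;
}

// Stand-in for a hashed shard key: modulo scatters consecutive values.
function scatterShard(key) {
  return key % SHARD_COUNT;
}

function countPerShard(keys, pickShard) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[pickShard(key)] += 1;
  return counts;
}

// 1000 new, ever-increasing timestamps (made-up values).
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);

console.log(countPerShard(newKeys, (k) => rangeShard(k, [100, 200, 300])));
// [0, 0, 0, 1000]: every insert lands on the last shard
console.log(countPerShard(newKeys, scatterShard));
// [250, 250, 250, 250]
```

The trade-off is that scattering writes also scatters range queries on that key, so the choice still has to match the common query patterns.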

Answer

Multi-document transactions are useful when you need atomic changes across multiple documents/collections. The cost is more overhead and reduced performance compared to single-document operations; you should model data to avoid needing multi-doc transactions when possible.

Answer

`skip` must walk past many documents, so deep pages get slower. A better pattern is range/seek pagination (e.g., by `_id` or a createdAt index) using “greater than last seen” with sorting.

```javascript
db.posts.find({ _id: { $gt: lastId } })
  .sort({ _id: 1 })
  .limit(20)
```
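
The paging loop around that query can be sketched in plain JavaScript. This in-memory model assumes numeric `_id` values and a made-up `posts` array; `nextPage` is a hypothetical helper that mirrors the filter, sort, and limit of the query above.

```javascript
// In-memory model of seek pagination over documents sorted by _id.
const posts = Array.from({ length: 7 }, (_, i) => ({ _id: i + 1 }));

function nextPage(docs, lastId, pageSize) {
  return docs
    .filter((d) => lastId === null || d._id > lastId) // { _id: { $gt: lastId } }
    .sort((a, b) => a._id - b._id)                    // .sort({ _id: 1 })
    .slice(0, pageSize);                              // .limit(pageSize)
}

let lastId = null; // no cursor yet: start from the beginning
const pages = [];
for (;;) {
  const page = nextPage(posts, lastId, 3);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  lastId = page[page.length - 1]._id; // cursor for the next request
}
console.log(pages); // [[1, 2, 3], [4, 5, 6], [7]]
```

The client only has to remember the last `_id` it saw, and each request costs about the same regardless of how deep the page is.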

Answer

`$set` assigns a value to a field (and creates it if it doesn’t exist). It updates only selected fields instead of replacing the whole document.

Answer

Projection means selecting only specific fields to return (include/exclude). It reduces payload size, improves performance, and avoids leaking unnecessary data.

```javascript
db.users.find(
  { active: true },
  { email: 1, name: 1, _id: 0 }
)
```

Answer

Upsert means “update or insert”: if no document matches the filter, MongoDB inserts a new document (based on the update) instead of updating.
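
A small in-memory model shows the two paths. It assumes equality-only filters and a `$set`-style update; `upsert` here is a hypothetical function over a plain array, and unlike MongoDB it does not generate an `_id` for the inserted document.

```javascript
// Model of update-or-insert on a plain array of documents.
function upsert(collection, filter, setFields) {
  const match = collection.find((d) =>
    Object.entries(filter).every(([k, v]) => d[k] === v)
  );
  if (match) {
    Object.assign(match, setFields); // update path
    return "updated";
  }
  // insert path: new document built from the filter plus the update
  collection.push({ ...filter, ...setFields });
  return "inserted";
}

const counters = [];
console.log(upsert(counters, { name: "visits" }, { value: 1 })); // inserted
console.log(upsert(counters, { name: "visits" }, { value: 2 })); // updated
console.log(counters); // [{ name: "visits", value: 2 }]
```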

Answer

Documents have a size limit (16 MB), and large arrays make documents grow, causing more IO and slower updates. They can also lead to hot documents and contention; prefer separate collections or bucketing for unbounded lists.

Answer

Change Streams let you subscribe to real-time changes (insert/update/delete) in a collection/database. They require a replica set (or sharded cluster) because they rely on the oplog.

Answer

`$push` appends a value to an array (allows duplicates). `$addToSet` adds only if the value is not already present (set-like, no duplicates).
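
The difference can be modelled with two tiny functions. This sketch handles primitive values only and returns new arrays; the function names are illustrative stand-ins for the operators, not driver calls.

```javascript
// Model of $push: always appends.
function push(arr, value) {
  return [...arr, value];
}

// Model of $addToSet: appends only when the value is not already present.
function addToSet(arr, value) {
  return arr.includes(value) ? arr : [...arr, value];
}

const tags = ["db", "nosql"];
console.log(push(tags, "db"));      // ["db", "nosql", "db"]
console.log(addToSet(tags, "db"));  // ["db", "nosql"]
console.log(addToSet(tags, "new")); // ["db", "nosql", "new"]
```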

Answer

A unique index enforces that no two documents can have the same value for the indexed key(s). Without it, application-level checks can race and create duplicates under concurrency.

```javascript
db.users.createIndex({ email: 1 }, { unique: true })
```

Answer

`$lookup` joins documents from another collection (like a left join). Caveat: joins can be expensive at scale; make sure you have indexes on join keys and consider denormalization/embedding when appropriate.

Answer

`w:1` acknowledges once the primary accepts the write (lower latency, but higher risk on failover). `w:"majority"` waits for replication to a majority of nodes (higher durability, higher latency).
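
As a rough guide to what “majority” means numerically, the sketch below computes it for a given number of voting members. It is a simplification that ignores arbiters and non-voting members; `majority` and `toleratedFailures` are illustrative helper names.

```javascript
// Smallest number of members that forms a majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can be lost while a majority is still reachable.
function toleratedFailures(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), toleratedFailures(n));
}
// 3 2 1
// 4 3 1
// 5 3 2
```

Going from three to four members raises the majority without tolerating more failures, which is one reason odd member counts are common.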

Answer

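With the legacy MMAPv1 storage engine, an update that grew a document beyond its allocated space forced MongoDB to move it to a new location, causing extra IO and fragmentation. MMAPv1 has since been removed; with WiredTiger an update effectively writes a new version of the document, so large, ever-growing documents still make updates more expensive. Avoid unbounded growth and model data to keep documents stable in size.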

Answer

Read preference controls where reads go in a replica set (e.g., `primary`, `secondary`, `secondaryPreferred`). Reading from secondaries can reduce load on the primary, but you might read slightly stale data depending on replication lag and read concern.

Answer

A text index enables full-text search with the `$text` query and basic stemming/scoring. Limitations: you can have only one text index per collection, and it’s not as feature-rich as dedicated search engines (advanced ranking, fuzzy search).

Answer

Schema validation (e.g., JSON Schema) rejects malformed documents on write, which protects data quality. It helps during refactors and when many services write to the same collection, but you should keep rules backward compatible (allow old and new shapes during migrations).

Answer

`local` returns the most recent data on the node you read from, with no guarantee that it has been replicated to a majority (fast, but the data could still be rolled back). `majority` returns only data acknowledged by a majority of replica set members, which is stronger consistency but usually higher latency.

Answer

If a query doesn’t include the shard key (or its useful prefix), `mongos` may have to ask many shards and merge results (“scatter-gather”), which increases latency and load. You avoid it by choosing a shard key that matches your common query patterns and by ensuring queries include it.

Answer

Single-document atomicity guarantees that a single write on one document is all-or-nothing: readers won't see a “half-updated” document. It does not make multi-document changes atomic; use a transaction when you need atomicity across multiple documents/collections.

Answer

A covered query can be answered using only an index, without fetching the full document. It can be faster because it avoids reading documents from disk/memory. You get it when the filter and the returned fields are all in the same index (and you don’t need any other fields).

```javascript
db.users.createIndex({ email: 1, createdAt: 1 })

// Only indexed fields are returned => can be covered
db.users.find({ email: "user@example.com" }, { _id: 0, email: 1, createdAt: 1 })
```
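
The coverage condition can be written down as a small check. This is a simplified sketch: `canBeCovered` is a hypothetical helper that only compares field names and ignores cases such as multikey indexes, which a real deployment would confirm with `explain()`.

```javascript
// Can a query be answered from the index alone (simplified rule)?
function canBeCovered(indexFields, filterFields, projection) {
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  if (returned.length === 0) return false; // no explicit inclusion list
  // _id is returned by default, so it must be excluded or be in the index.
  const idHandled = projection._id === 0 || indexFields.includes("_id");
  const inIndex = (f) => indexFields.includes(f);
  return idHandled && filterFields.every(inIndex) && returned.every(inIndex);
}

const emailIndex = ["email", "createdAt"];
console.log(canBeCovered(emailIndex, ["email"], { _id: 0, email: 1, createdAt: 1 })); // true
console.log(canBeCovered(emailIndex, ["email"], { email: 1, createdAt: 1 }));         // false (_id implied)
console.log(canBeCovered(emailIndex, ["email"], { _id: 0, email: 1, name: 1 }));      // false (name not indexed)
```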

Answer

Early `$match` reduces the number of documents that flow through later stages, so the pipeline does less work. If `$match` is early and matches an indexed field, MongoDB can use the index. Early `$project` reduces the payload size, which can reduce memory and network use.

Answer

Write conflicts happen when concurrent transactions try to update the same documents/keys, so one transaction must abort to keep isolation. Apps should treat these as retryable: retry the whole transaction (with backoff), keep transactions short, and make side effects idempotent so retries are safe.

Answer

The balancer moves chunks between shards to keep data evenly distributed. Migrations consume CPU/IO/network and can increase latency, especially if you move large or “hot” chunks. Reduce impact with a good shard key (avoid hotspots), monitor migrations, schedule them off-peak, and use zones/pre-splitting when appropriate.

Answer

Embed when the child data is accessed together with the parent and doesn’t grow unbounded (e.g., order with line items). Embedding gives fewer queries and atomic updates within one document. Reference when you need many‑to‑many relationships or unbounded growth.

Answer

A TTL index automatically deletes documents after a specified time based on a date field. Use it for expiring data like sessions, OTPs, or temporary logs to avoid manual cleanup.

```javascript
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
```
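
The expiry rule behind that index can be sketched as a predicate. `isExpired` is a hypothetical helper that models eligibility only; the exact boundary comparison is an assumption, and the actual delete happens later, whenever the server's background TTL task next runs.

```javascript
// A document becomes eligible once (date field + expireAfterSeconds) has passed.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // non-date values never expire
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2024-01-01T00:00:10Z"); // fixed clock for the example
const oldSession = { expiresAt: new Date("2024-01-01T00:00:00Z") };
const liveSession = { expiresAt: new Date("2024-01-01T00:01:00Z") };

console.log(isExpired(oldSession, "expiresAt", 0, now));  // true
console.log(isExpired(liveSession, "expiresAt", 0, now)); // false
```

With `expireAfterSeconds: 0`, each document carries its own expiry time, so different sessions can live for different durations.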

Answer

A capped collection has a fixed size and behaves like a circular buffer: when it’s full, the oldest documents are overwritten. They’re useful for logs, caches, or streams where you want bounded storage and insertion order.

medium · mongo · change-streams · cdc

Answer

Change streams let you watch real‑time changes (inserts/updates/deletes) in a collection or database, similar to a CDC stream. They’re useful for reactive workflows, cache invalidation, and event‑driven architectures.

medium · mongo · consistency · read-concern

Answer

Read concern controls the consistency/visibility of reads (e.g., local, majority). Write concern controls durability/acknowledgement of writes (e.g., w:1, majority). They trade off latency for stronger guarantees.

Answer

`primary` reads from the primary only (stronger consistency). `secondary` reads only from secondaries, which offloads the primary but may return stale data. There are also modes like `primaryPreferred`, `secondaryPreferred`, and `nearest`.

Answer

If the primary is unavailable, secondaries hold an election to choose a new primary based on election criteria. During this time, writes are unavailable and some reads may be stale. After election, a new primary accepts writes.

Answer

`$lookup` performs a left outer join between collections in the aggregation pipeline. A common pitfall is large join fan‑out, which can be slow and memory‑heavy. Ensure indexes on join keys and filter early.
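
The left-outer-join behaviour can be modelled in plain JavaScript. The `lookup` function and sample collections below are illustrative assumptions, and the naive inner scan stands in for what an index on the foreign field would avoid.

```javascript
// Model of a left outer join: every local document is kept, and matches
// from the other collection are attached as an array under `as`.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    // Full scan per local document: this is the cost an index removes.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const customers = [{ _id: "c1", name: "Ada" }];
const orderDocs = [
  { _id: 1, customerId: "c1" },
  { _id: 2, customerId: "c9" }, // no matching customer
];

const joined = lookup(orderDocs, customers, "customerId", "_id", "customer");
console.log(JSON.stringify(joined));
```

The order with no matching customer is still returned, with an empty `customer` array, which is the "left outer" part of the join.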

Answer

Text indexes enable full‑text search on string fields with stemming and tokenization. They’re useful for search features but have limitations (e.g., only one text index per collection and less flexibility than dedicated search engines).

Answer

A unique index enforces uniqueness for indexed documents. A sparse index only includes documents that have the indexed field. A partial index includes documents matching a filter expression. You can combine them to enforce uniqueness only for a subset of documents.
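
As a sketch of the combination, the objects below are the arguments one might pass to `createIndex` to enforce unique emails only for documents that have an `email` field. The collection and field names are assumptions, and `wouldViolate` is a hypothetical in-memory model of the resulting rule, not server behaviour.

```javascript
// Arguments for a call such as db.users.createIndex(indexKeys, indexOptions).
const indexKeys = { email: 1 };
const indexOptions = {
  unique: true,
  partialFilterExpression: { email: { $exists: true } },
};

// Model: only documents that have `email` take part in the uniqueness check.
function wouldViolate(existing, candidate) {
  if (!("email" in candidate)) return false; // not indexed, never conflicts
  return existing.some((d) => "email" in d && d.email === candidate.email);
}

const users = [{ name: "a", email: "a@example.com" }, { name: "b" }];
console.log(wouldViolate(users, { name: "c" }));                         // false
console.log(wouldViolate(users, { name: "d", email: "a@example.com" })); // true
```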