MongoDB Schema Design Pattern - Extended Reference Pattern

Published 2025-04-06 | Updated 2025-04-08 | Category: MongoDB | Reading time ≈ 4 min

In One Sentence

Copy the data from other collections that is frequently queried together into the main document, so that each read consumes fewer resources.

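Scenario

Putting conceptually different data in separate collections is common practice, but sometimes data has to be fetched from several collections at once. A relational database does this with a JOIN; MongoDB handles it with the $lookup stage of the aggregation pipeline. The two are conceptually similar, and both are resource-intensive.

For example, a social app has three collections: posts, users, and comments.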

// post document
{
  "_id": ObjectId("post_456"),
  "title": "My First Post",
  "content": "This is my first blog post!",
  "author": ObjectId("user_123"), 
}

// comment document
{
  "_id": ObjectId("comment_789"),
  "post_id": ObjectId("post_456"),
  "user": {
    "_id": ObjectId("user_456"),
    "name": "Bob"
  },
  "content": "Great post!",
  "created_at": ISODate("2025-03-01T12:00:00Z")
}

// user document
{
  "_id": ObjectId("user_123"),
  "name": "Alice",
  "email": "[email protected]",
  "profile_pic": "https://example.com/alice.jpg"
}

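Every time a post is displayed, its comments and author have to be displayed with it. In this example, $lookup in the aggregation pipeline can find the comments and author that belong to a post, but looking data up across multiple collections is comparatively expensive. This is a good time to apply the Extended Reference Pattern.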
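
For comparison, the join-style read described above might look like the following sketch. The function name `buildPostWithRelationsPipeline` is hypothetical, and plain strings stand in for ObjectId values so the snippet runs outside mongosh; the collection and field names are taken from the example documents.

```javascript
// Sketch: the read that the Extended Reference Pattern tries to avoid.
// Two $lookup stages are needed to render a single post.
function buildPostWithRelationsPipeline(postId) {
  return [
    { $match: { _id: postId } },
    // Join 1: resolve the author reference against the users collection.
    { $lookup: { from: "users", localField: "author", foreignField: "_id", as: "author" } },
    // $lookup always produces an array, so unwrap the single author.
    { $unwind: "$author" },
    // Join 2: collect every comment whose post_id points at this post.
    { $lookup: { from: "comments", localField: "_id", foreignField: "post_id", as: "comments" } }
  ];
}

// In mongosh this would be run as (not executed here):
//   db.posts.aggregate(buildPostWithRelationsPipeline(postId))
const pipeline = buildPostWithRelationsPipeline("post_456");
```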

To implement the Extended Reference Pattern, make two changes to the post document:

Change the author field into a subdocument that holds part of the corresponding user's data
Add a comments field that holds all comments for that post id

{
  "_id": ObjectId("post_456"),
  "title": "My First Post",
  "content": "This is my first blog post!",
  "author": {
    "_id": ObjectId("user_123"),
    "name": "Alice",
    "profile_pic": "https://example.com/alice.jpg"
  },
  "comments": [
    {
      "_id": ObjectId("comment_789"),
      "content": "Great post!",
      "user": {
        "_id": ObjectId("user_456"),
        "name": "Bob"
      }
    }
  ]
}

With this layout, a query against the posts collection alone returns all the data, which saves the cost of cross-collection lookups and reduces query latency.
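
One way to produce such a document at write time is sketched below. `toExtendedPost` is a hypothetical helper, and string ids replace ObjectId so the code runs in plain JavaScript. It copies only the fields the post page displays and deliberately leaves out the rest (for example the user's email).

```javascript
// Hypothetical helper: build the extended post from the three source documents.
function toExtendedPost(post, author, comments) {
  return {
    ...post,
    // Copy only the author fields that are shown next to a post.
    author: { _id: author._id, name: author.name, profile_pic: author.profile_pic },
    // Embed the comments that belong to this post, without post_id.
    comments: comments
      .filter((c) => c.post_id === post._id)
      .map((c) => ({ _id: c._id, content: c.content, user: c.user }))
  };
}

const post = { _id: "post_456", title: "My First Post", content: "This is my first blog post!", author: "user_123" };
const author = { _id: "user_123", name: "Alice", email: "alice@example.invalid", profile_pic: "https://example.com/alice.jpg" };
const comments = [
  { _id: "comment_789", post_id: "post_456", user: { _id: "user_456", name: "Bob" }, content: "Great post!" },
  { _id: "comment_790", post_id: "post_999", user: { _id: "user_456", name: "Bob" }, content: "Unrelated" }
];

const extendedPost = toExtendedPost(post, author, comments);
```

Reading the post then needs only a single-collection query such as `db.posts.findOne({ _id: postId })`.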

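Data Update Strategy

However, this duplicates data: when a user document or a comment document is updated, the post document has to be updated along with it. The questions to think through are:

Which extended references need to be updated when a resource is updated?
Do those references need to be updated immediately?

A data synchronization strategy has to be chosen for whenever the data changes.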

If the scope of impact is small, updating all referenced documents directly in application code is feasible, but make sure the change covers every affected collection.
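
As an illustration, renaming a user in the example schema touches four places. The sketch below only builds the update descriptions; `buildUserRenameOps` and the descriptor shape are assumptions made for this example, not a driver API.

```javascript
// Sketch: every write needed when a user's name changes.
function buildUserRenameOps(userId, newName) {
  return [
    // 1. The source of truth.
    { collection: "users", method: "updateOne",
      filter: { _id: userId }, update: { $set: { name: newName } }, options: {} },
    // 2. Posts written by this user (embedded author copy).
    { collection: "posts", method: "updateMany",
      filter: { "author._id": userId }, update: { $set: { "author.name": newName } }, options: {} },
    // 3. Comments by this user embedded inside any post.
    //    $[c] with arrayFilters updates only the matching array elements.
    { collection: "posts", method: "updateMany",
      filter: { "comments.user._id": userId },
      update: { $set: { "comments.$[c].user.name": newName } },
      options: { arrayFilters: [{ "c.user._id": userId }] } },
    // 4. The comments collection, which also embeds the user's name.
    { collection: "comments", method: "updateMany",
      filter: { "user._id": userId }, update: { $set: { "user.name": newName } }, options: {} }
  ];
}

// With the Node.js driver, each op could be applied roughly as (not executed here):
//   await db.collection(op.collection)[op.method](op.filter, op.update, op.options)
const renameOps = buildUserRenameOps("user_456", "Robert");
```

Unless these writes run inside a transaction, they are not atomic as a group, so readers may briefly see a mix of old and new names.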

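MongoDB change streams are another option. They are not covered in detail here, but they can watch a specific collection for changes, so that when data in the primary collection is updated, a sync to the related collections can be triggered.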
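
A minimal sketch of that idea follows, assuming only `name` and `profile_pic` are copied into posts. `postUpdateForUserChange` is a hypothetical function that turns an update event from the users collection into a write against posts; the event fields it reads (`operationType`, `documentKey`, `updateDescription.updatedFields`) follow the change event format for update operations.

```javascript
// Fields of a user that are duplicated into posts.author (assumption).
const COPIED_FIELDS = ["name", "profile_pic"];

// Translate a users change event into an update for the posts collection.
// Returns null when the event does not affect any copied field.
function postUpdateForUserChange(event) {
  if (event.operationType !== "update") return null;
  const changed = event.updateDescription.updatedFields;
  const set = {};
  for (const field of COPIED_FIELDS) {
    if (field in changed) set[`author.${field}`] = changed[field];
  }
  if (Object.keys(set).length === 0) return null;
  return { filter: { "author._id": event.documentKey._id }, update: { $set: set } };
}

// Wiring it up with the Node.js driver might look like (not executed here):
//   const stream = db.collection("users").watch();
//   stream.on("change", async (event) => {
//     const op = postUpdateForUserChange(event);
//     if (op) await db.collection("posts").updateMany(op.filter, op.update);
//   });

const nameChange = postUpdateForUserChange({
  operationType: "update",
  documentKey: { _id: "user_123" },
  updateDescription: { updatedFields: { name: "Alice B." }, removedFields: [] }
});
const emailChange = postUpdateForUserChange({
  operationType: "update",
  documentKey: { _id: "user_123" },
  updateDescription: { updatedFields: { email: "new@example.invalid" }, removedFields: [] }
});
```

Note that change streams require a replica set or sharded cluster, and that this sketch ignores replace and delete events.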

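Conversely, if updates do not need to be very timely, a scheduled job can perform the synchronization periodically.

Finally, for maintainability, try to ensure that the referenced data does not change often, and keep the duplicated data to a minimum.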
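
The core of such a job could be a comparison pass like the sketch below. `findStaleAuthorCopies` is a hypothetical function that works on in-memory arrays for illustration; a real job would read posts and users in batches and be started by whatever scheduler the application already uses.

```javascript
// Sketch: find posts whose embedded author copy has drifted from the users collection.
// usersById is a Map from user _id to the current user document.
function findStaleAuthorCopies(posts, usersById) {
  const fixes = [];
  for (const post of posts) {
    const user = usersById.get(post.author._id);
    if (!user) continue; // author no longer exists; out of scope for this sketch
    const set = {};
    for (const field of ["name", "profile_pic"]) {
      if (post.author[field] !== user[field]) set[`author.${field}`] = user[field];
    }
    if (Object.keys(set).length > 0) {
      fixes.push({ filter: { _id: post._id }, update: { $set: set } });
    }
  }
  return fixes;
}

const currentUsers = new Map([
  ["user_123", { _id: "user_123", name: "Alice B.", profile_pic: "https://example.com/alice.jpg" }]
]);
const storedPosts = [
  { _id: "post_456", author: { _id: "user_123", name: "Alice", profile_pic: "https://example.com/alice.jpg" } },
  { _id: "post_457", author: { _id: "user_123", name: "Alice B.", profile_pic: "https://example.com/alice.jpg" } }
];

// Each fix could then be applied with db.posts.updateOne(fix.filter, fix.update).
const fixes = findStaleAuthorCopies(storedPosts, currentUsers);
```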
