MongoDB Schema Design Pattern - Computed Pattern

Published 2025-04-06 | Updated 2025-04-08 | Category: MongoDB | Estimated reading time ≈ 3 minutes

In a nutshell

When a document is updated, precompute the data the business logic needs and store it, so that later reads are fast.

Scenario

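Sometimes the raw data has to go through some computation before it is useful. Repeating that computation on every read can hurt performance, especially with a large data set. This is where the computed pattern helps.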

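The computed pattern is a good fit for applications where reads far outnumber writes, such as comments in a social media app or products in an e-commerce app. Requests to fetch comments far outnumber requests to post them, and requests to browse product reviews far outnumber requests to write them. We are therefore willing to spend a little more time on each write in exchange for faster reads.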

A look at the actual data flow

Take, for example, the products collection of an e-commerce app:

// products collection
[
  {
    product_id: 1000,
    name: "a good product",
    rating: {
      rating_count: 892,
      avg_rating: 4.9,
    },
    price: 300,
    store: "online",
  },
  {
    product_id: 1001,
    name: "a normal product",
    rating: {
      rating_count: 120,
      avg_rating: 3.4,
    },
    price: 120,
    store: "brick-and-mortar",
  },
  {
    product_id: 1003,
    name: "another normal product",
    rating: {
      rating_count: 77,
      avg_rating: 4,
    },
    price: 1100,
    store: "online",
  },
]

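Suppose we want to know each product's avg_rating. Rather than computing this value on every read, we prefer to update it whenever a review is written, in exchange for faster reads.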

// the stored document after a new 5-star rating is written
{
  product_id: 1000,
  name: 'a good product',
  rating: {
    rating_count: 892 + 1,
    avg_rating: (4.9 * 892 + 5) / (892 + 1)
  },
  price: 300,
  store: 'online'
}
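
The write-time update above can be sketched in code. This is a minimal sketch, not a definitive implementation: `addRating` and `buildAddRatingUpdate` are hypothetical helper names, and the second one assumes MongoDB 4.2 or later, where `updateOne` accepts an aggregation pipeline so the new average can be derived from the values already stored in the document.

```javascript
// Pure helper mirroring the arithmetic above: fold one new score into the stored average.
function addRating(rating, score) {
  const rating_count = rating.rating_count + 1;
  const avg_rating = (rating.avg_rating * rating.rating_count + score) / rating_count;
  return { rating_count, avg_rating };
}

// Hypothetical builder for an update-with-aggregation-pipeline (assumes MongoDB 4.2+).
// Inside a single $set stage, "$rating.*" refers to the values stored before the update.
function buildAddRatingUpdate(score) {
  const oldCount = "$rating.rating_count";
  const oldAvg = "$rating.avg_rating";
  return [
    {
      $set: {
        "rating.avg_rating": {
          $divide: [
            { $add: [{ $multiply: [oldAvg, oldCount] }, score] },
            { $add: [oldCount, 1] },
          ],
        },
        "rating.rating_count": { $add: [oldCount, 1] },
      },
    },
  ];
}

// Usage in mongosh (not executed here):
// db.products.updateOne({ product_id: 1000 }, buildAddRatingUpdate(5));
const updated = addRating({ rating_count: 892, avg_rating: 4.9 }, 5);
console.log(updated.rating_count, updated.avg_rating.toFixed(4));
```

Building the update as a pipeline keeps the read-modify-write in a single server-side operation, so two reviews arriving at the same time do not overwrite each other's count.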

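Sometimes the value does not need to be recomputed that often. For example, stakeholders may want the total rating count of all products for each store type, and are fine with it being refreshed once an hour. The following aggregation pipeline produces that result and only needs to run hourly.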

const get_store_rating_count_pipeline = [
  {
    $group: {
      _id: '$store',
      store_total_rating_count: {
        $sum: '$rating.rating_count'
      }
    }
  },
  {
    $merge: {
      into: 'statics',
      on: '_id',
      whenMatched: 'replace',
      whenNotMatched: 'insert',
    }
  }
]

// output -> statics collection
{
  _id: 'online',
  store_total_rating_count: 969
},
{
  _id: 'brick-and-mortar',
  store_total_rating_count: 120
}
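
To check the numbers above without a running database, the `$group` stage can be imitated in plain JavaScript. This is only an in-memory stand-in for illustration: `sumRatingCountByStore` is a hypothetical helper, and the input is the three sample documents from the products collection, trimmed to the fields the pipeline reads.

```javascript
// Sample documents from the products collection, reduced to the relevant fields.
const products = [
  { product_id: 1000, store: "online", rating: { rating_count: 892, avg_rating: 4.9 } },
  { product_id: 1001, store: "brick-and-mortar", rating: { rating_count: 120, avg_rating: 3.4 } },
  { product_id: 1003, store: "online", rating: { rating_count: 77, avg_rating: 4 } },
];

// In-memory equivalent of: { $group: { _id: '$store', store_total_rating_count: { $sum: ... } } }
function sumRatingCountByStore(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.store, (totals.get(doc.store) ?? 0) + doc.rating.rating_count);
  }
  return [...totals].map(([store, total]) => ({
    _id: store,
    store_total_rating_count: total,
  }));
}

const storeTotals = sumRatingCountByStore(products);
console.log(storeTotals);
```

The two "online" products contribute 892 + 77 = 969, which matches the document written to the statics collection.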

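This computation can run at any time, and because $merge writes its results to a separate collection, it does not modify the source data.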
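
One possible way to run the pipeline once an hour from a Node.js process is sketched below. The helper names `refreshStoreStats` and `scheduleRefresh` are hypothetical, and `db` is assumed to be an already connected `Db` handle from the official Node.js driver. In practice a cron job or another scheduler would work just as well.

```javascript
// Hypothetical helper: `db` is assumed to be a connected Db handle from the
// official MongoDB Node.js driver, `pipeline` the $group + $merge pipeline.
async function refreshStoreStats(db, pipeline) {
  // With the Node.js driver the cursor has to be consumed for the pipeline,
  // including its final $merge stage, to actually run; toArray() does that.
  await db.collection("products").aggregate(pipeline).toArray();
}

// Re-run the refresh at a fixed interval (one hour by default).
// Returns the timer so the caller can stop it with clearInterval().
function scheduleRefresh(db, pipeline, intervalMs = 60 * 60 * 1000) {
  return setInterval(() => {
    refreshStoreStats(db, pipeline).catch((err) => {
      console.error("store stats refresh failed:", err);
    });
  }, intervalMs);
}

// Usage (not executed here):
// const timer = scheduleRefresh(db, get_store_rating_count_pipeline);
```

Since `whenMatched: 'replace'` overwrites the previous result for each store, re-running the job is safe even if two runs overlap or one is repeated.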

Video explanation (en)