
MongoDB Schema Design and Common Practices

Friday, August 30, 2013 2:21 AM
Written by Wayne Ye


Exhaustive documentation: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

Mongo executables are installed into /usr/bin/, and database files are stored under /data/db/

Log file location: /var/log/mongodb/mongodb.log



Run sudo service mongodb start/stop/restart to control the service, or simply run mongod to start the MongoDB process.
Run mongo to enter the MongoDB console.
> show dbs  #Show all databases
> db             #Show current db
> db.help()   #View commands related to db

# "Create" a new database named foo_db
> use foo_db  # Mongo creates this DB virtually; it is only really created once we save something into one of its collections.
# Insert a new document into a collection (the collection is created implicitly)
> db.user_profiles.save({ first_name: "Wayne", last_name: "Ye" })
# Explicitly create a new collection ()
# Query the collection
> db.user_profiles.find()
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "first_name" : "Wayne", "last_name" : "Ye" }
> db.user_profiles.save({ first_name: "Wendy", last_name: "Shen", gender: "Female" })
> db.user_profiles.find()
# Update
> db.collection.update( { field: value1 }, { $set: { field1: value2 } } );
# View status
> db.stats()
> db.mycol.stats()
# Query subdocument using Dot Notation
> db.demo.insert({ "Items": [ { "Name": "Milk Powder", "Price": 9.9 }, { "Name": "Toy Car", "Price": 26 } ] })
> db.demo.find({ "Items.Price": { $gt: 20 } })
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "Items" : [  {  "Name" : "Milk Powder",  "Price" : 9.9 },  {  "Name" : "Toy Car",  "Price" : 26 } ] }
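Note that the query above returns the whole document as soon as any element of Items matches, not just the matching element. Below is a rough in-memory sketch of that matching rule in plain JavaScript. The helpers valuesAtPath and matchesGt and the second sample document are made up for illustration; this is not how the server is implemented.

```javascript
// Resolve a dotted path against a document; arrays are traversed element by element.
function valuesAtPath(value, path) {
  const keys = Array.isArray(path) ? path : path.split(".");
  if (keys.length === 0) {
    return Array.isArray(value) ? value : [value];
  }
  if (Array.isArray(value)) {
    // Descend into every array element with the same remaining path.
    return value.flatMap((item) => valuesAtPath(item, keys));
  }
  if (value === null || typeof value !== "object" || !(keys[0] in value)) {
    return [];
  }
  return valuesAtPath(value[keys[0]], keys.slice(1));
}

// Imitates { path: { $gt: threshold } }: true if ANY value found at the path is greater.
function matchesGt(doc, path, threshold) {
  return valuesAtPath(doc, path).some((v) => typeof v === "number" && v > threshold);
}

const demo = [
  { _id: 1, Items: [{ Name: "Milk Powder", Price: 9.9 }, { Name: "Toy Car", Price: 26 }] },
  { _id: 2, Items: [{ Name: "Pencil", Price: 1.5 }] }, // hypothetical extra document
];

const found = demo.filter((doc) => matchesGt(doc, "Items.Price", 20));
console.log(found.map((doc) => doc._id)); // [ 1 ]
console.log(found[0].Items.length);       // 2, the whole document comes back
```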
Batch administration from JavaScript
http://docs.mongodb.org/manual/reference/method/  (Mongo shell JS references)

mongo localhost:27017/mydb db_schema.js


load("scripts/myjstest.js") OR load("/data/db/scripts/myjstest.js")
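As a rough idea of what such a script could contain, here is a minimal sketch of a db_schema.js. The file contents and the setupSchema helper are assumptions for illustration; the collection and field names are reused from the examples in this post, and insert and ensureIndex are the shell helpers used elsewhere in it.

```javascript
// Hypothetical contents of db_schema.js.
function setupSchema(database) {
  // Seed one document; the collection is created implicitly on first write.
  database.user_profiles.insert({ first_name: "Wayne", last_name: "Ye" });
  // Index the field we expect to query on.
  database.user_profiles.ensureIndex({ last_name: 1 });
}

// Inside the mongo shell, `db` is a global bound to the database named on the
// command line (mydb above). Outside the shell it is undefined, so nothing runs.
if (typeof db !== "undefined") {
  setupSchema(db);
}
```

Wrapping the work in a function is only a convenience: the same file can be passed to mongo on the command line or pulled into an interactive session with load().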

Schema Design

Embedding (de-normalize data)

Store two related pieces of data in a single document.


Use embedding when:
  • There is a "contains" relationship between entities.
  • There is a "one-to-many" relationship, and the "many" objects always appear inline with the "one".
Example 1: Blog with comments

Denormalized blog with comments

{
 _id: 1,
 title: "Investigation on MongoDB",
 content: "some investigation contents",
 post_date: Date.now(),
 permalink: "http://foo.bar/investigation_on_mongodb",
 comments: [
   { content: "Gorgeous post!!!", nickname: "Scott", email: "foo@bar.org", timestamp: "1377742184305" },
   { content: "Splendid article!!!", nickname: "Guthrie", email: "foo@bar.org", timestamp: "1377742184305" }
 ]
}

Example 2: Dishes and Chefs

Normalized Dishes and Chefs

// Dish
{
 _id: 1,
 name: "Kong Bao Ji Ding",
 price: 5.5,
 rate: 4.5,
 cheves: [ "Flora Zhang", "Cristina Wang" ]
}

// Chef
{
 _id: 1,
 name: "Flora Zhang",
 age: 32,
 avatar: "http://www.gravatar.com/avatar.php?gravatar_id=dc654756c7c",
 dishes: [ "Kong Bao Ji Ding", "Knight Zhang Beef", "Ma Po Tou Fu" ]
}


Embedding gives better performance for read operations: related data can be requested and retrieved in a single database operation.
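To make the "single database operation" point concrete, here is a small in-memory sketch. FakeCollection is a made-up stand-in for a collection (not a MongoDB API) that counts lookups, with each lookup standing for one round trip to the server; the document is a trimmed copy of the blog post from Example 1.

```javascript
// A stand-in for a collection: a Map keyed by _id that counts lookups.
class FakeCollection {
  constructor(docs) {
    this.byId = new Map(docs.map((doc) => [doc._id, doc]));
    this.lookups = 0;
  }
  findOne(id) {
    this.lookups += 1; // one lookup represents one round trip
    return this.byId.get(id) || null;
  }
}

const posts = new FakeCollection([
  {
    _id: 1,
    title: "Investigation on MongoDB",
    comments: [
      { content: "Gorgeous post!!!", nickname: "Scott" },
      { content: "Splendid article!!!", nickname: "Guthrie" },
    ],
  },
]);

// The post and all of its comments arrive together.
const post = posts.findOne(1);
const nicknames = post.comments.map((comment) => comment.nickname);
console.log(nicknames, "lookups:", posts.lookups); // [ 'Scott', 'Guthrie' ] lookups: 1
```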

Referencing (normalize data)

Store references between two documents to indicate a relationship between the data represented in each document.


Use referencing:
  • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.


  • Separation of Concerns
  • Data model independent of logic

Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server. 

Example 3: Books and publisher

Books and publishers

// Book
{
 _id: 1,
 name: "MongoDB Applied Design Patterns",
 price: 35,
 rate: 5,
 author: "Rick Copeland",
 ISBN: "1449340040",
 publisher_id: 1,
 reviews: [
   { isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
   { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
 ]
}

// Publisher
{
 _id: 1,
 name: "Packtpub INC",
 address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
 telephone: "+44 0121 265 6484"
}
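The extra round trip that referencing costs can be sketched in memory like this. The findById helper and the roundTrips counter are hypothetical, and the arrays stand in for the books and publishers collections; the point is only that publisher_id must be resolved with a follow-up lookup.

```javascript
let roundTrips = 0;

// Hypothetical helper: look up one document by _id and count the round trip.
function findById(collection, id) {
  roundTrips += 1;
  return collection.find((doc) => doc._id === id) || null;
}

const bookCollection = [
  { _id: 1, name: "MongoDB Applied Design Patterns", publisher_id: 1 },
];
const publisherCollection = [{ _id: 1, name: "Packtpub INC" }];

// First query fetches the book, second query resolves the reference.
const referencedBook = findById(bookCollection, 1);
const resolvedPublisher = findById(publisherCollection, referencedBook.publisher_id);

console.log(referencedBook.name, "is published by", resolvedPublisher.name);
console.log("round trips:", roundTrips); // round trips: 2
```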

Advanced Features


Mongo supports indexing a key inside a subdocument. Considering the "Books and publishers" example above, Mongo can index the reviewer key with:

db.books.ensureIndex({ "reviews.reviewer": 1 })
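Conceptually, an index on a key inside an array of subdocuments holds one entry per array element, each pointing back to the containing document. The sketch below imitates that idea with a Map; buildMultikeyIndex and the second book are made up for illustration, and a real MongoDB index is a sorted B-tree rather than a hash map.

```javascript
// Build a conceptual index: one entry per array element, mapping key value -> set of _ids.
function buildMultikeyIndex(docs, arrayField, key) {
  const index = new Map();
  for (const doc of docs) {
    for (const element of doc[arrayField] || []) {
      const value = element[key];
      if (!index.has(value)) {
        index.set(value, new Set());
      }
      index.get(value).add(doc._id);
    }
  }
  return index;
}

const indexedBooks = [
  { _id: 1, name: "MongoDB Applied Design Patterns", reviews: [{ reviewer: "Dick" }, { reviewer: "Xiaoshen" }] },
  { _id: 2, name: "A hypothetical second book", reviews: [{ reviewer: "Xiaoshen" }] },
];

const reviewerIndex = buildMultikeyIndex(indexedBooks, "reviews", "reviewer");
console.log([...reviewerIndex.get("Xiaoshen")]); // [ 1, 2 ]
console.log([...reviewerIndex.get("Dick")]);     // [ 1 ]
```

With such a structure, finding the books reviewed by a given person is a direct lookup instead of a scan over every review of every book.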

Aggregation framework

A MongoDB aggregation is a series of special operators applied to a collection. An operator is a JavaScript object with a single property, the operator name, whose value is an option object. The core of the aggregation framework is the aggregation pipeline, which is modeled on the concept of data processing pipelines.

Aggregation was introduced in Mongo version 2.2. Below is a table comparing Mongo with a traditional relational DB in terms of aggregation functionality:

SQL Terms, Functions, and Concepts    MongoDB Aggregation Operators
WHERE                                 $match
GROUP BY                              $group
HAVING                                $match
SELECT                                $project
ORDER BY                              $sort
LIMIT                                 $limit
SUM()                                 $sum
COUNT()                               $sum
JOIN                                  No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.
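It may look odd that COUNT() and SUM() both map to $sum. The difference is in what gets added per document: a constant 1 for counting, or a field value for summing. Here is a minimal in-memory imitation of a $group stage that supports only $sum; groupStage, evaluate and the sample data are hypothetical and only illustrate the idea.

```javascript
// Resolve "$field" references against a document; other values are literals.
function evaluate(expr, doc) {
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

// Minimal imitation of { $group: spec } supporting only the $sum accumulator.
function groupStage(docs, spec) {
  const accumulatorFields = Object.keys(spec).filter((field) => field !== "_id");
  const groups = new Map();
  for (const doc of docs) {
    const key = evaluate(spec._id, doc);
    if (!groups.has(key)) {
      const fresh = { _id: key };
      for (const field of accumulatorFields) {
        fresh[field] = 0;
      }
      groups.set(key, fresh);
    }
    const group = groups.get(key);
    for (const field of accumulatorFields) {
      group[field] += evaluate(spec[field].$sum, doc);
    }
  }
  return [...groups.values()];
}

// Roughly: SELECT author, COUNT(*), SUM(price) FROM books GROUP BY author
const pricedBooks = [
  { author: "Rick Copeland", price: 35 },
  { author: "Rick Copeland", price: 20 },
  { author: "Another Author", price: 10 },
];
const totals = groupStage(pricedBooks, {
  _id: "$author",
  count: { $sum: 1 },        // COUNT(*): add 1 per document
  total: { $sum: "$price" }, // SUM(price): add the field value
});
console.log(totals);
```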

For example, still using the "Books and publishers" example above, imagine I want to query "a specific reviewer with the book(s) he/she reviewed". I can do this:

> db.books.aggregate({ $unwind: "$reviews" }, { $match: { "reviews.reviewer": "Xiaoshen"} })
{
 "result" : [
  {
   "_id" : 1,
   "name" : "MongoDB Applied Design Patterns",
   "price" : 35,
   "rate" : 5,
   "author" : "Rick Copeland",
   "ISBN" : "1449340040",
   "publisher_id" : 1,
   "reviews" : {
    "isUseful" : true,
    "content" : "Cool book!",
    "reviewer" : "Xiaoshen",
    "timestamp" : "1377742184305"
   }
  }
 ],
 "ok" : 1
}

Aggregation introduction: http://docs.mongodb.org/manual/applications/aggregation/
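To see why the result of the query above contains a single review object instead of the reviews array, it helps to imitate the two stages in plain JavaScript. unwindStage and matchStage below are hypothetical in-memory helpers, not the server implementation, and the book document is trimmed.

```javascript
// Imitation of { $unwind: "$field" }: one output document per array element.
function unwindStage(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Imitation of { $match: { "a.b": expected } } for exact equality on a dotted path.
function matchStage(docs, path, expected) {
  const readPath = (doc) =>
    path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
  return docs.filter((doc) => readPath(doc) === expected);
}

const bookDocs = [
  {
    _id: 1,
    name: "MongoDB Applied Design Patterns",
    reviews: [
      { isUseful: true, content: "Cool book!", reviewer: "Dick" },
      { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen" },
    ],
  },
];

// Each stage consumes the output of the previous one, like a pipeline.
const unwound = unwindStage(bookDocs, "reviews");
const matched = matchStage(unwound, "reviews.reviewer", "Xiaoshen");

console.log(unwound.length);              // 2, one document per review
console.log(matched[0].reviews.reviewer); // Xiaoshen
```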

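One caveat: the aggregation pipeline itself runs as native C++ code inside the server, but JavaScript-based operations such as mapReduce and $where run on the JavaScript VM (V8 since MongoDB version 2.4). Although V8 is very fast, it cannot compete with a native compiled and optimized C/C++ implementation.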

Common Practices 

  • Denormalize data when it is frequently read together (one-to-one, one-to-many)
  • Normalize data when both entities are frequently queried separately, or when there is too much data duplication
  • Reduce collection size by always using short field names as a convention. This will help you save memory over time.
  • Avoid using DBRef!
  • Always test queries with .explain() to check that you’re hitting the right index.
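The short field name advice comes from the fact that field names are stored inside every document. As a rough illustration, the sketch below compares the serialized size of the same data under long and short names. JSON length is only a proxy for the BSON size MongoDB actually stores, and the short names fn and ln are made up.

```javascript
// Size in bytes of a document serialized as JSON (a rough proxy for BSON size).
function jsonSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const longNames = { first_name: "Wayne", last_name: "Ye" };
const shortNames = { fn: "Wayne", ln: "Ye" }; // hypothetical abbreviations

const savedPerDocument = jsonSize(longNames) - jsonSize(shortNames);
console.log("bytes saved per document:", savedPerDocument);
console.log("bytes saved over one million documents:", savedPerDocument * 1e6);
```

The saving per document is small, so the trade-off against readability only starts to matter for large collections.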

Useful resources

The ultimate manual

Great article explaining the differences between MongoDB and other well-known relational DBs:

Data Modeling Considerations for MongoDB Applications

Serialize Documents with the CSharp Driver

Schema Design --Indexes!!

Sharding and Mongo DB

MongoDB Operations Best Practices

Sharding and Replica Sets Illustrated


Permalink: http://wayneye.com/Blog/MongoDB_Schema_Design_And_Common_Practices




Wayne Ye, Friday, August 30, 2013 2:44 AM
No doubt .NET and MongoDB would be a sweet combination! Consider a legacy .NET app developed 5 years ago with 10 tables: if we denormalize some of them and migrate them to Mongo, the storage becomes hybrid, and we will absolutely benefit from this hybrid model through much faster query responses!
