Concept of Sharding in MongoDB

As data grows, a single data set may need to be stored across multiple machines. MongoDB supports this through a feature called sharding, which lets a deployment keep up with growing data. In this chapter, you will learn about sharding in MongoDB.

What Is Sharding?

Sharding is a method of distributing data across multiple machines. In other words, sharding splits a large data set into smaller data sets that are stored on several MongoDB instances. It is used to support deployments with very large data sets that must sustain high-throughput operations.
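The idea of splitting one large data set into smaller ones can be sketched in plain JavaScript. This is only a toy model: arrays stand in for MongoDB instances, and the names toyHash, shardFor, distribute, and SHARD_COUNT are hypothetical helpers invented for illustration, not part of MongoDB. The hash used here is not the one MongoDB uses.

```javascript
// Toy model only: plain arrays stand in for MongoDB instances.
const SHARD_COUNT = 3; // assumed number of shards for this sketch

// Simple deterministic string hash (illustrative, not MongoDB's hash function).
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep the value as an unsigned 32-bit integer
  }
  return h;
}

// Pick a shard index for a document from the value of its key.
function shardFor(key) {
  return toyHash(key) % SHARD_COUNT;
}

// Split one large data set into smaller per-shard data sets.
function distribute(docs, keyField) {
  const shards = Array.from({ length: SHARD_COUNT }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc[keyField])].push(doc);
  }
  return shards;
}

// Made-up sample documents keyed by writerID.
const docs = [];
for (let id = 1; id <= 12; id++) {
  docs.push({ writerID: id });
}

const shards = distribute(docs, "writerID");
console.log(shards.map((s) => s.length)); // number of documents placed on each shard
```

Because the shard is derived from the key alone, the same key always lands on the same shard, which is what lets a later lookup go straight to the right machine.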

What Are Shards?

There are situations where the data sets in MongoDB grow so large that statements, operations, and queries against them cause very high CPU utilization on the server that holds them. MongoDB tackles such situations with sharding, where the data is split across several MongoDB instances. A large collection is split into several smaller pieces called "shards", each held on a separate instance. Logically, however, all the shards together behave as a single collection.
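The point that separate shards still behave as one logical collection can be illustrated with a small sketch. It assumes a toy model in which three arrays stand in for three shards; findAcrossShards and the sample documents are hypothetical and only mimic the idea of running a query on every shard and merging the results.

```javascript
// Toy model: three arrays stand in for three shards of one logical collection.
// The documents are made-up sample data.
const shardedCollection = [
  [{ writerID: 1 }, { writerID: 4 }],
  [{ writerID: 2 }, { writerID: 5 }],
  [{ writerID: 3 }, { writerID: 6 }],
];

// Run the same filter on every shard, then merge the partial results
// so the caller sees a single result set.
function findAcrossShards(shards, predicate) {
  const partialResults = shards.map((shard) => shard.filter(predicate));
  return partialResults.flat().sort((a, b) => a.writerID - b.writerID);
}

const found = findAcrossShards(shardedCollection, (doc) => doc.writerID >= 3);
console.log(found.map((doc) => doc.writerID)); // IDs 3, 4, 5, 6
```

The caller never names a shard; it asks one question and gets one answer, even though the work was spread over three data sets.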

Ways of Addressing System Growth

Sharding is easier to understand alongside the general idea of scaling. System growth, or scaling (increasing the capacity and performance of a system), can be addressed in two different ways. These are -

  1. Vertical Scaling increases the capacity of a single server by using a more powerful CPU, adding more RAM, or increasing the amount of storage. Its restriction is that a single machine can only be upgraded so far.
  2. Horizontal Scaling divides the data set and workload of a system across several servers, and more servers can be added to increase capacity as required. Each server may not have high speed or capacity on its own, but together they can handle the workload, often more efficiently than a single high-speed server.

Hence, sharding uses the concept of horizontal scaling to support the processing of large data sets.

Why the Sharding Concept Needs to Be Adopted

  • With replication alone, the master (primary) node receives and stores all new data, so every write goes through a single node
  • Queries can run efficiently because each shard holds a smaller segment of the data
  • A single replica set is limited to 50 members (12 in MongoDB versions before 3.0)
  • Memory on a single machine cannot be made large enough when the data set is large
  • Vertical scaling becomes too expensive
  • A local disk may not be large enough, so spreading the data over multiple servers is an alternative

Implementing the Concept of Sharding

Sharding is implemented using a cluster (a collection of MongoDB instances). The components of a sharded cluster include:

  • The shard - a MongoDB instance that holds a part, or subset, of the data.
  • Config server - a MongoDB instance that holds the metadata about the cluster. This metadata records which shard holds which part of the data.
  • A router - the mongos process, responsible for directing the operations sent from the client to the appropriate shards.
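How the three components cooperate can be sketched as a lookup. In this toy model, chunkMap plays the role of the metadata a config server keeps, and routeByKey plays the role of the router. Both names, and the split point of 100, are assumptions made for illustration; the shard names ServK and ServR are the servers used in the example later in this chapter.

```javascript
// Hypothetical metadata of the kind a config server keeps: each entry says
// which shard owns a half-open range [min, max) of writerID values.
const chunkMap = [
  { min: -Infinity, max: 100, shard: "ServK" },
  { min: 100, max: Infinity, shard: "ServR" },
];

// The router's job in miniature: look up the metadata and
// return the shard that should receive the operation.
function routeByKey(writerID) {
  const chunk = chunkMap.find((c) => writerID >= c.min && writerID < c.max);
  if (!chunk) {
    throw new Error("no chunk covers writerID " + writerID);
  }
  return chunk.shard;
}

console.log(routeByKey(5));   // ServK
console.log(routeByKey(100)); // ServR
```

The client only talks to the router; it never needs to know where the ranges are split or how many shards exist.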

Example of Sharding Cluster

  1. Create a data directory for your config server using the command:
    mkdir -p /data/configdb
  2. Start a MongoDB instance in config server mode. Suppose the server name is ServG. (MongoDB 3.4 and later require the config server to run as a replica set, so those versions also need the --replSet option here and the replica set name in front of the host in step 3.)
    mongod --configsvr --dbpath /data/configdb --port 27019
  3. Start your mongos instance, pointing it at the config server:
    mongos --configdb ServG:27019
  4. Connect to the mongos instance from the mongo shell:
    mongo --host ServG --port 27017
  5. When your other servers (ServK and ServR) are ready to be added to the cluster, type the commands:
    sh.addShard("ServK:27017")
    sh.addShard("ServR:27017")
  6. Enable sharding for your database. To shard the database techwriterDb, the command is:
    sh.enableSharding("techwriterDb")
  7. You can then enable sharding for a collection by giving its full namespace and a shard key. Each shard key field takes 1 (ranged) or "hashed", not a sample value:
    sh.shardCollection("techwriterDb.techwriter", { "writerID" : 1, "writername" : 1 })
