Production considerations – Mongo and xDB part four

There’s a massive amount of information about running Mongo in production. The topic can get complex very quickly, so I’ve attempted to distil some of it down into a manageable chunk. Whenever we think of moving to production we think of user security, network security, capacity (or resource usage) and how the system will scale. There are other topics too, but this is a good place to start as the rest can be specific to each environment you’re working in.

Scaling MongoDB

Vertical scaling entails increasing the size of the server the Mongo instance is installed on. When the server is reaching its capacity, it is taken down, re-provisioned with more resources and then brought back online. This can be done without downtime provided the server is part of a replica set, because the other members keep serving traffic while each server is upgraded in turn.

Horizontal scaling entails taking a Mongo collection and spreading its data across multiple servers. Because the data is spread across multiple disks, each server need not be as powerful, at the expense of having more servers. This process is known as sharding. Shards can be added to and removed from the cluster as required, and each shard may also be a replica set in its own right. Additionally, a sharded system will also have three configuration servers to manage the metadata for the cluster.

It’s also possible to scale further and spread these replicated and sharded instances across multiple data centres. However, if you’re reaching this stage then you should hopefully have engaged a reliable third party or even MongoDB, Inc. themselves to manage this for you.

Replication

Mongo has its own flavour of replication, data partitioning and failover tooling. Replication is achieved by creating what is known as a replica set. Sitecore traffic is recorded on a primary server and replicated across to the secondaries. Mongo introduces some complexity here as it has a voting system for whenever there are problems with the replica set. So if the primary goes down, the remaining members have to vote and reach a majority before a new primary is elected.

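So for a replicated server installation you will need at least three servers: either one primary and two secondaries, or one primary, one secondary and an arbiter. The arbiter doesn’t record data; it only votes in elections, which makes it a cheap way to keep the number of voting members odd and avoid tied votes.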
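The voting rule is easier to see with a little arithmetic. The following is a minimal sketch in plain Python, not anything taken from Mongo itself; the function names are my own. It assumes only that a new primary needs a strict majority of the voting members, and it shows why an even number of voters buys you no extra resilience.

```python
def election_majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of the voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a majority can still be reached."""
    return voting_members - election_majority(voting_members)


# Compare set sizes: three and four voters both survive the loss of just one member.
for n in range(1, 8):
    print(f"{n} voters: majority={election_majority(n)}, can lose={fault_tolerance(n)}")
```

Three and four voting members both tolerate a single failure, which is why odd-sized sets are the usual recommendation.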

Sharding

In a product like SQL Server, it is possible to move data in a database between disks and partitions by creating file groups. Mongo has its own version of this process called sharding. The concept is that there are multiple Mongo instances that each hold a shard of the data set, and queries are routed to the correct shard or group of shards. This requires even more servers, including metadata servers and a routing server. It’s also possible in more complex scenarios that each of these shards is a replica set in its own right.
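To make the routing idea concrete, here is a toy sketch in plain Python. It is not Mongo’s implementation: the shard names and key ranges are made up, and the lookup table simply stands in for the metadata that the configuration servers hold, while the function plays the part of the routing server.

```python
import bisect

# Hypothetical metadata: keys below "h" live on shard-a, keys from "h" up to
# (but not including) "p" live on shard-b, and everything else is on shard-c.
RANGE_BOUNDARIES = ["h", "p"]
RANGE_OWNERS = ["shard-a", "shard-b", "shard-c"]


def route(shard_key: str) -> str:
    """Return the shard that owns the range containing shard_key."""
    return RANGE_OWNERS[bisect.bisect_right(RANGE_BOUNDARIES, shard_key)]


for key in ["alice", "harry", "zoe"]:
    print(key, "->", route(key))
```

The client never needs to know which shard holds a document; it talks to the router, and the router consults the metadata.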

This can create an explosion of server instances when taking replication into account as well.
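A quick back-of-the-envelope calculation shows how fast the numbers grow. This sketch assumes the three configuration servers and single routing server mentioned above; the function name and the example topology are mine, so plug in your own figures.

```python
def instance_count(shards: int, members_per_shard: int,
                   config_servers: int = 3, routers: int = 1) -> int:
    """Total Mongo processes for a sharded cluster where every shard is a replica set."""
    return shards * members_per_shard + config_servers + routers


# A single three-member replica set with no sharding at all.
print(instance_count(shards=1, members_per_shard=3, config_servers=0, routers=0))  # prints 3

# Three shards, each a three-member replica set, plus config servers and a router.
print(instance_count(shards=3, members_per_shard=3))  # prints 13
```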

Security

Security is always a tricky topic. Out of the box, Mongo installs with access control disabled, so anyone who can connect effectively has admin rights and you have to specifically turn security on. The first level of this is a simple username and password setup, which I recommend you use in local dev environments anyway. Moving to production there are a number of options, and the right one depends on your own network topology, internal requirements and whether you ultimately decide to outsource it.
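Once usernames and passwords are switched on, the credentials usually end up in the connection string. Here is a small sketch, using only the Python standard library, of building one in the standard `mongodb://` URI format. The user, password, host, database and replica set names are all placeholders of mine; the point is that special characters in the credentials must be percent-encoded or the URI will not parse correctly.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, hosts, database, replica_set=None):
    """Build a mongodb:// connection string with percent-encoded credentials."""
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    uri = f"mongodb://{credentials}@{','.join(hosts)}/{database}"
    if replica_set:
        # The replicaSet option tells the driver to discover the other members.
        uri += f"?replicaSet={replica_set}"
    return uri


# Placeholder values for a local dev instance.
print(mongo_uri("sitecore", "p@ss/word", ["localhost:27017"], "analytics"))
```

The same string, with your real values, is what you would paste into the relevant connection string entry of your application’s configuration.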

In addition to usernames and passwords (called challenge response), MongoDB also supports LDAP, Kerberos and X.509 certificates. For ordinary developers this might seem overwhelming, but for a production installation you’ll be working with network engineers and security staff who will, hopefully, all be familiar with this.

MongoDB also supports SSL, though if you are using Windows you will need Mongo Enterprise v2.6+. On other platforms you’ll need the Enterprise version or you’ll need to compile it yourself. Of course, you can compile it on Windows too.

Finally, for SSL you’ll need a certificate with a minimum key length of 128 bits. Setting up SSL is covered in a lot of detail in the documentation; start here.

For more on this refer to the security documentation.

You’ll notice I’ve not mentioned firewalls and the like. I’ve deliberately kept these out of these notes as these will be covered by specialists. If you do find yourself in the situation where you are also in charge of your IT infrastructure then it will be familiar to you anyway.

Using an external supplier

Taking all of this into account, it can make a lot of sense to outsource the MongoDB hosting requirements to a reliable third party such as MongoLabs. However, keep an eye on the pricing options as this can get expensive quickly. So it’s important to evaluate your application’s requirements and what sharding and replication you actually need. As a developer you may not be involved in writing the business case for the expenditure, but you should be involved in advising what features your client application will use, how it uses the server and how much load you expect to generate.

Finally…

As I stated at the beginning, this is a large topic, so I encourage you to dip into the docs and look at the offerings of third party hosts to get a feel for what is required. If you have the resources, I also recommend setting up a local VM and playing with your own ‘production’ system.
