There are a few ways you can make a backup with Mongo. The simplest is to take a copy of the database files and put them somewhere safe. This is easier if you’ve put each database into its own directory. However, it’s a pretty naive method: you have to stop the server before copying and restart it afterwards, and you can only ever take a full backup. That said, this method does have its uses in a scenario where you are messing around with a secondary, but that’s beyond the scope of this short post.
It’s possible to extend this a little by running mongodump directly against the data files. The tool locks the files, so the database cannot be in use at the time, but it does at least give you a binary (BSON) export of the data.
For a developer scenario, it’s more likely that we’ll be using mongodump, mongorestore, mongoimport and mongoexport against our running database. We may need some data from a co-worker or from the live system, or we may be the one providing data to another member of our team.
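As a rough sketch of that dump-and-restore round trip, the Python below only builds the command lines, so you can inspect them before running anything. The database name `app_dev`, the `./dump` directory and the two helper functions are made up for illustration. The flags used (`--host`, `--port`, `--db`, `--out`, `--drop`) are standard, although newer versions of mongorestore prefer `--nsInclude` over `--db`, so check against the version you have installed.

```python
import shlex


def mongodump_cmd(db, out_dir, host="localhost", port=27017):
    """Build a mongodump command that exports one database as BSON.

    mongodump writes the files to <out_dir>/<db>/.
    """
    return ["mongodump", "--host", host, "--port", str(port),
            "--db", db, "--out", out_dir]


def mongorestore_cmd(db, dump_dir, host="localhost", port=27017, drop=False):
    """Build a mongorestore command that loads a dumped database.

    dump_dir is the per-database directory that mongodump produced.
    With drop=True each collection is dropped before it is restored.
    """
    cmd = ["mongorestore", "--host", host, "--port", str(port), "--db", db]
    if drop:
        cmd.append("--drop")
    cmd.append(dump_dir)
    return cmd


if __name__ == "__main__":
    # Hypothetical names: swap in your own database and paths.
    dump = mongodump_cmd("app_dev", "./dump")
    restore = mongorestore_cmd("app_dev", "./dump/app_dev", drop=True)
    print(shlex.join(dump))
    print(shlex.join(restore))
    # To actually run them: subprocess.run(dump, check=True)
```

Keeping the commands as lists rather than strings means they can be handed straight to `subprocess.run` without worrying about shell quoting.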
In my previous post I talked about how Sitecore runs and the trade-offs that brings when using SQL Server. These aren’t bad points, but they do mean you have to be able to peek under the hood to keep everything running as well as you can.
So now I’m going to share some of the procedures I use to help me maintain the databases I’m responsible for. It’s worth noting that I’m no DBA. I’ve been using SQL Server for years, and a combination of curiosity and my employers needing someone to step in and figure things out has motivated me to pick this topic up. I’m definitely standing on the shoulders of giants in this area, but we all have to start somewhere.
In case you weren’t already aware, Sitecore is built around an object-based database, where the object is the item. The result is that everything is expressed in the database as a ‘thing’ with fields, whether shared, unversioned or versioned. Whether we publish, make a package or replicate our data, we need to uniquely identify each one of these items, and so we use a UniqueIdentifier as our ID type.
Whilst this has some advantages, a UniqueIdentifier causes a few performance issues in SQL Server when it is used as a primary key. We can’t rely on a sequential ID (whether numeric or a pseudo-GUID) due to potential collisions between systems, and so we end up with a clustered index on a random data set. New rows land at arbitrary points in the index rather than at the end, which leads to page splits and fragmentation.
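To get a feel for why random keys hurt, here is a small Python sketch. It is only a toy model, not how SQL Server stores anything: a sorted list stands in for the clustered index, and random 128-bit integers stand in for GUIDs. It counts how many inserts land somewhere before the current end of the list, which is the situation that forces a real index to make room in the middle of existing pages.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list and count how many
    land before the current end rather than being appended."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos < len(ordered):
            mid_inserts += 1
        ordered.insert(pos, key)
    return mid_inserts


# Fixed seed so repeated runs give the same numbers.
rng = random.Random(42)
n = 5000

sequential_keys = list(range(n))                        # identity-style IDs
random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for GUIDs

print("sequential keys, mid-list inserts:", count_mid_inserts(sequential_keys))
print("random keys, mid-list inserts:", count_mid_inserts(random_keys))
```

Sequential keys always append, so the first count is zero. With random keys almost every insert lands in the middle, which is the pattern behind the fragmentation we have to manage.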