Securing and navigating around xDB – Mongo and xDB part two

Peeking around

There are a number of options for navigating your way around Mongo. Depending on what you’re most comfortable with, you may prefer the command-line tools and batch files, or GUI tools like RoboMongo, MongoChef or MongoVue. That said, for the more complex operations you’ll need to be comfortable with the command-line tools unless you’re hosting with a third party.

The Mongo ecosystem is based on a JavaScript ‘shell’. This means that all functions, loops and the like follow normal JavaScript syntax rules. It also means that reading any custom functions and profile scripts should be easy for anyone who is familiar with JavaScript from web development. Before we continue and look at some simple operations, it’s useful to recap some Mongo definitions. Naturally these are all available on the Mongo documentation website, so I’ll be brief:

  • Database – the same concept as in a relational product like SQL Server.
  • Collection – similar to a table, but without an enforced schema.
  • Document – a record in a collection.
  • Index – the same concept as in a relational product.

When we want to insert data into Mongo, we create a new document and add it to a collection. Mongo also allows us to embed one document inside another, which can be useful but does require some care when constructing queries and designing indexes. Once we have some documents, we can pull them out using queries (as you’d expect). These queries can have a bit of a LINQ feel to them thanks to the JavaScript shell. Here’s an example:

use myusertable
db.users.find({"Country": "New Zealand"}).sort( { "Surname": 1, "Firstname": 1 } )

Continue reading

Getting started – Mongo and xDB part one

With the release of Sitecore 7.5 we saw the introduction of NoSQL to the Sitecore ecosystem. Sitecore’s platform of choice is MongoDB, an open-source document database that runs as a separate server. MongoDB stores all of its data as memory-mapped files in a binary format very similar to JSON (called BSON).

By default, Sitecore Analytics now requires both SQL Server and MongoDB to operate: MongoDB provides much of the data capture and storage, while SQL Server handles the aggregation of that data. Additionally, the ECM and WFFM modules for Sitecore 7.5 now also use the xDB platform and so also use MongoDB by default.

It’s worth knowing this, as any implementation of Sitecore 7.5 (or an upgrade to it) will need the infrastructure to support a Mongo installation. Reading through the xDB documentation, it isn’t immediately clear what is required of an xDB infrastructure, and it does require quite a bit of background reading. This first post of the series is intended to give you a background in some of the things you’ll need to be aware of and how they relate to Sitecore.

Continue reading

Lucinq Features – Date range searching

Pulling items out of Lucene by date or date range is quite a useful operation. To do it accurately, we need to store the date values in a field that can be parsed appropriately. In Lucinq, our fluent Lucene wrapper, we currently achieve this by converting the DateTime value into a numeric (long) value using the Ticks property. This has a resolution down to 100ns, which should be enough for anyone’s needs!

This allows us to create a range-based search by comparing the tick values of two dates, which for Lucene is a simple numeric comparison (taken directly from our test suite):

LuceneSearch luceneSearch = new LuceneSearch(IndexDirectory);

IQueryBuilder queryBuilder = new QueryBuilder();
DateTime february = DateTime.Parse("01/02/2013"); // 1 February 2013 (dd/MM/yyyy, UK culture)
DateTime end = DateTime.Parse("28/02/2013"); // 28 February 2013

queryBuilder.Setup(
	x => x.WildCard(BBCFields.Description, "food", Matches.Always),
	x => x.Filter(DateRangeFilter.Filter(BBCFields.PublishDateObject, february, end))
);

ILuceneSearchResult result = luceneSearch.Execute(queryBuilder);
List<NewsArticle> data = Mapper.Map<List<Document>, List<NewsArticle>>(result.GetTopItems());

This is similar to the way you’d think about doing it in SQL; behind the scenes, SQL Server also stores datetime values numerically, with very high precision.
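
For completeness, the indexing side of this approach boils down to writing the tick count into a numeric field. The snippet below is only a rough sketch against Lucene.Net directly (not Lucinq’s actual internals) and assumes an IndexWriter and an article object are already in scope:

// Store the publish date as a long (ticks) in a numeric field so that
// range queries become plain numeric comparisons.
Document document = new Document();
document.Add(new Field(BBCFields.Description, article.Description, Field.Store.YES, Field.Index.ANALYZED));
document.Add(new NumericField(BBCFields.PublishDateObject, Field.Store.YES, true).SetLongValue(article.PublishDate.Ticks));
indexWriter.AddDocument(document);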

It’s worth noting that we’ve used the date-based range as a filter in this example. We can also apply a date range directly as part of the query, like so:

LuceneSearch luceneSearch = new LuceneSearch(IndexDirectory);

IQueryBuilder queryBuilder = new QueryBuilder();
DateTime month = DateTime.Parse("01/02/2013"); // 1 February 2013 (dd/MM/yyyy, UK culture)

queryBuilder.Setup(
	x => x.DateRange(BBCFields.PublishDateObject, month, month.AddDays(28)),
	x => x.WildCard(BBCFields.Description, "food")
);

ILuceneSearchResult result = luceneSearch.Execute(queryBuilder);
List<NewsArticle> data = Mapper.Map<List<Document>, List<NewsArticle>>(result.GetTopItems());

I think this is a good example of picking the correct field type for the job. By avoiding a string we gain far more relevant operations, without any issues casting the data. A case study for this would be a project I completed some time ago where cooking times needed to act as filters for a search. We were unable to achieve consistent results until the cooking time was stored as an integer value rather than a string – something you would expect in code but can easily neglect in a search index.
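
As a rough illustration of the difference (the field name is invented and assumes the cooking time was indexed as a numeric field):

// Find recipes that take between 10 and 30 minutes to cook. As a numeric
// range this behaves correctly, whereas a string range would happily sort
// "120" before "30".
Query cookingTimeQuery = NumericRangeQuery.NewIntRange("CookingTime", 10, 30, true, true);
TopDocs hits = indexSearcher.Search(cookingTimeQuery, 20);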

Sitecore conventions – build your own pipelines

This post is a follow-up to the post on using workflow that I wrote a little while ago. You can read it here.

When you look deeper under the hood, you’ll see that Sitecore does quite a lot of things via pipelines. A brief scan of the web.config will show a wide selection, ranging from logging in, beginning a request, rendering fields and so on. These pipelines make it very simple to define a process and to support customisation and patching of that process. As a consequence, it’s something that can be quite useful to include in our own code when we need to perform our own custom steps.

In the previous article, I wrote about how I was talking with another developer about sending newsletters from Sitecore. The original third party had implemented this as a checkbox and a publish step: the editor would check the box and publish, and the custom code would then uncheck the box, save the item and send the email. However, I discussed how there were a number of issues with this approach. In this article, we’ll build on this and see how using pipelines can help.
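
To give a flavour of what’s coming, a custom pipeline boils down to a named sequence of processors registered under <pipelines> in a config patch, each exposing a Process method. The following is only a sketch – the pipeline, args and processor names are illustrative rather than taken from the final implementation:

using Sitecore.Data;
using Sitecore.Pipelines;

// A custom args class carries state through the pipeline.
public class SendNewsletterArgs : PipelineArgs
{
    public ID NewsletterItemId { get; set; }
}

// Each processor performs one focused step and can abort the pipeline.
public class ValidateNewsletter
{
    public void Process(SendNewsletterArgs args)
    {
        if (ID.IsNullOrEmpty(args.NewsletterItemId))
        {
            args.AbortPipeline();
        }
    }
}

// Calling code - the pipeline name must match the entry registered in config,
// with one <processor> element per step.
public class NewsletterSender
{
    public void Send(ID newsletterItemId)
    {
        CorePipeline.Run("sendNewsletter", new SendNewsletterArgs { NewsletterItemId = newsletterItemId });
    }
}

Each step can then be replaced, removed or extended with a config patch, which is exactly the flexibility the checkbox approach lacked.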

Continue reading

Unit Testing – Thoughts from an opinionated developer

Getting started

Unit testing is one of those things that we know we should be doing. However, time and project pressures often mean that many of us don’t.

Having worked and consulted in a variety of teams, I often find myself dealing with situations like this by asking the following questions:

  1. How do you know your software is working?
  2. How do you know if your software is broken?
  3. How do you know that someone else’s change won’t break your hard work?

Generally, these are the three important questions I try to answer when writing my own unit tests, whether for Sitecore or not – the tests exist to prove the answers. Initially, writing a quality unit test can be hard. A quick search on Google will tell you that a unit test should be small, repeatable, focused and relevant.

That’s a useful guideline, but we need a bit more information to get us started. To reinforce this, let’s take an aside and consider a project I worked on for a major UK supermarket retailer.

In this project, the customer mandated 95% code coverage across the entire codebase. So the developers were focused on reaching this percentage and would be testing constructors and the like. More importantly, they were jumping through hoops to test all their conditional branching, magic flags and so on. They were doing this because they had code colouring turned on and could see which lines weren’t covered.

So, what should we take away from this? Well, firstly, they weren’t writing their code to be testable. This should be our first rule:

Write your code to be testable

Next, let’s consider how they were chasing code coverage. They were looking at how to test each individual line rather than considering what the code as a whole was attempting to do. If we stand back, we can see how the code is expected to operate under normal conditions. So that’s our second rule:

Test your code under normal conditions with expected parameters

By now we should have an idea of what our third rule should be! Now we start to think about how our code can break. What if a parameter is null? What if we have an unexpected value somewhere? We need to test the code under unexpected conditions and understand where and how it can break.

Test your code under unexpected conditions and with worst-case scenarios

This is a very useful step – if for nothing else, it’s something that you might have to cover at code review. It also means you can diagnose a production system more easily, because in handling your unexpected conditions you have the opportunity to log, throw custom exceptions, raise alarm bells and so on.

At this point you should have quite a reasonable level of code coverage, but more importantly the tests in place are relevant, focused and repeatable.

So let’s consider the next rule of unit testing:

What am I actually testing?

No code sits in isolation. We access databases, save to files on disk, send messages, create pictures and so on. The list is endless. Because of this, we are always going to be calling third party code and integrating with third party systems.

So should we test them? No. Should we test our integration with them? Possibly. Should we test our calls to the integration layer? Yes – provided we have abstracted our integration behind our own interfaces (such as a repository or service object).

Encapsulate what ‘varies’ behind an interface, code to that interface and then write unit tests for code that depends on that interface according to the rules above. And if you only get 85% code coverage, well, the remaining 15% is a judgement call between you and your customer/employer!
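
As a sketch of what that abstraction can look like in practice (all of the names here are illustrative, and NUnit is assumed purely for the test attributes):

using System.Collections.Generic;
using NUnit.Framework;

public class NewsArticle { }

// Our own contract hides the integration (database, index, web service...).
public interface INewsRepository
{
    IList<NewsArticle> GetLatest(int count);
}

// The class under test depends on the contract, not on the integration itself.
public class NewsFeedService
{
    private readonly INewsRepository repository;

    public NewsFeedService(INewsRepository repository)
    {
        this.repository = repository;
    }

    public IList<NewsArticle> GetHomePageArticles()
    {
        return this.repository.GetLatest(5);
    }
}

[TestFixture]
public class NewsFeedServiceTests
{
    // A hand-rolled fake (a mocking framework works just as well) means the
    // test never touches a real database or index.
    private class FakeNewsRepository : INewsRepository
    {
        public IList<NewsArticle> GetLatest(int count)
        {
            return new List<NewsArticle> { new NewsArticle(), new NewsArticle() };
        }
    }

    [Test]
    public void GetHomePageArticles_ReturnsArticlesFromTheRepository()
    {
        NewsFeedService service = new NewsFeedService(new FakeNewsRepository());

        Assert.AreEqual(2, service.GetHomePageArticles().Count);
    }
}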

So how do we write testable code?

Well-written code is something that takes time and experience to produce. To start with testable code we could go back and cover writing code according to SOLID principles, but let’s put that at the back of our minds for a moment and consider things in more abstract terms.

One method of writing testable code is to write the test first. This puts us into the mindset of thinking about how the contract will be used by calling code: we have an idea of our parameters, return types and the like. In fact, many unit test advocates would argue that this is the only way to do it. Of course, in reality it depends on where you work and how well defined your designs/stories are.

Another tool on our development shelf is to consider how we might build or refactor the code internally. One example is switching on magic strings or enum values. Can we replace this with polymorphic calls and remove the conditionals? If so, we make our test cases easier to produce, because we don’t have to set up our unit test with mocked data that handles both sides of the conditional.

If we don’t do that, our test is inherently harder to write – so it’s slower to produce and, as such, a barrier to producing a unit test at all.
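
As a quick illustration (the names and values are invented for the example):

// Before: a conditional on a magic string forces every test to set up data
// for both branches.
public class DeliveryCalculator
{
    public decimal Calculate(string type, decimal orderTotal)
    {
        if (type == "express")
        {
            return 9.99m;
        }

        return orderTotal > 50m ? 0m : 4.99m;
    }
}

// After: each behaviour sits behind a shared contract and can be tested
// (and mocked) in isolation.
public interface IDeliveryCalculator
{
    decimal Calculate(decimal orderTotal);
}

public class ExpressDeliveryCalculator : IDeliveryCalculator
{
    public decimal Calculate(decimal orderTotal)
    {
        return 9.99m;
    }
}

public class StandardDeliveryCalculator : IDeliveryCalculator
{
    public decimal Calculate(decimal orderTotal)
    {
        return orderTotal > 50m ? 0m : 4.99m;
    }
}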

SOLID code

Part of SOLID design is to create classes that have single responsibilities. We then create a contract for them and depend on that contract, letting us change the behaviour of our application as long as the contract is maintained. In a unit testing scenario, this lets us insert fakes/mocks more easily.

In writing testable code it’s important to strike a balance between breaking the problem up into contracts you depend on and having an explosion of classes. One code smell is a constructor with many dependencies. If you are in this situation it’s not always bad – but for greenfield code it’s certainly something you should look at and review with colleagues.

Not only does this make your unit tests harder to set up, it can also mean your interface contracts are not fit for consumption. At the lowest level in your API this is fine, but as you go up towards the consumer I strongly recommend that the interfaces define key application ‘operations’ rather than lots of smaller methods.

This is obviously project dependent – in some projects you want a simple external interface and in others you don’t. But even so, it’s worth keeping at the back of your mind.

Summary

These are just some of my thoughts on unit testing and how it relates to software design. I’ll follow them up with some thoughts on designing your software, and my take on bottom-up vs top-down design and how that affects your project’s scalability.

Don’t Fight The Framework – Sitecore strategies for multilingual sites

Something I’ve been involved with quite a lot is making multilingual sites work in Sitecore. Building a multilingual site touches on many parts of Sitecore and really pushes your understanding. So let’s start with the basics:

  • Items – versions and specific attention to shared, versioned and unversioned fields
  • Deciding whether to have separate trees per language or blend them together in one tree (or even a mix)
  • Translating content and whether you need to fall back to another language when a version in the current language isn’t available – a minimal version check is sketched after this list
  • Other niggles we’ll come to
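
As a small taste of the version checks involved, here’s a minimal sketch (the item ID and language codes are illustrative) of falling back when an item has no version in the requested language:

using Sitecore.Data.Items;
using Sitecore.Globalization;

// Fetch the item in the requested language, then fall back if there is no
// version in that language.
Language requested = Language.Parse("ja-JP");
Item item = Sitecore.Context.Database.GetItem(itemId, requested);

if (item == null || item.Versions.Count == 0)
{
    item = Sitecore.Context.Database.GetItem(itemId, Language.Parse("en"));
}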

Continue reading

Don’t Fight The Framework – Sitecore Events and Workflow

Nat and I have worked together for quite a while now and Don’t Fight the Framework has become a running meme between us almost since day one.

In this post I thought I’d take a side step and talk about an example of going against best practice. Admittedly, best practice is often convention based, so it can be hard to know the right way from the wrong way – in many cases there is a right way and a better way, which is usually the case in Sitecore.

Before I continue, I want to point out that there is nothing wrong with creating event handlers for the saved and published events. Running code from the rules engine, moving items, creating items and so on are all good examples (and I use them), but you must be aware of the consequences and code accordingly. That said, they can also be used for the wrong reasons, as we will discover.

Continue reading

Lucene – Support complex navigation using Collectors

Building a navigation system for a news or blog site based on years and months (an archive, for example) requires that we know which years and months are available for:

  1. The top level – as in when we open the home page
  2. Expanded – whenever we are viewing a month or article page and need to expand accordingly
  3. Dynamically – when we need to build the navigation in one shot, perhaps as an accordion or similar.

Note – we are assuming the following structure:

Year/
    +-Month
    +-Month/
           Article A
           Article B
    +-Month
Year/  
Year/ 

It’s possible to code around these by making Axes calls on the current item and getting the parents and siblings, but this doesn’t scale well – especially if you need to start dropping into sibling months to hide and show them depending on whether they’re empty. And we haven’t even talked about languages.

An alternative approach is to use your Lucene index: scan it for all of your news items in the current language, group them by year and month, and store counts for each. You can then cache the result for even more speed.
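
To give a rough idea of the shape of this, here’s a simplified collector sketch against Lucene.Net 3.x. The field name is illustrative, and a production version would read values from the field cache rather than loading each document:

using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Counts matching articles per "YearMonth" field value (e.g. "2014-09").
public class YearMonthCollector : Collector
{
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();
    private IndexReader currentReader;

    public IDictionary<string, int> Counts
    {
        get { return this.counts; }
    }

    public override bool AcceptsDocsOutOfOrder
    {
        get { return true; }
    }

    public override void SetScorer(Scorer scorer)
    {
        // Scores aren't needed for counting.
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        this.currentReader = reader;
    }

    public override void Collect(int doc)
    {
        string yearMonth = this.currentReader.Document(doc).Get("YearMonth");

        if (yearMonth == null)
        {
            return;
        }

        int count;
        this.counts.TryGetValue(yearMonth, out count);
        this.counts[yearMonth] = count + 1;
    }
}

The collector is passed to IndexSearcher.Search alongside the news query, and the resulting counts are what you then cache.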
Continue reading

Sitecore – Maintaining SQL Server indexes

In my previous post I talked about the impact of how Sitecore runs and the trade-offs that has when using SQL Server. These aren’t bad points, but they do mean you have to be able to peek under the hood to keep everything running as well as it can.

So now I’m going to share some of the procedures I use to help me maintain the databases I’m responsible for. It’s worth noting that I’m no DBA. I’ve been using SQL Server for years, and a combination of curiosity and my employers needing someone to step in and figure things out has motivated me to pick this topic up. I’m definitely standing on the shoulders of giants in this area, but we all have to start somewhere.
Continue reading

Sitecore – The impact of an object-based CMS on SQL Server

In case you weren’t already aware, Sitecore is built around an object-based database (the item). The result is that everything is expressed in the database as a ‘thing’ with fields, whether shared, unversioned or versioned. Whether we publish, make a package or replicate our data, we need to uniquely identify each one of these items, and so a uniqueidentifier is used as the ID type.

Whilst this has some advantages, when it comes to SQL Server it causes a few performance issues when used as a primary key. We can’t rely on a sequential ID (whether numeric or a pseudo GUID) due to potential collisions between systems, and so we end up with a clustered index on a random data set.
Continue reading