Following on in this series, I found another topic from my travels that I thought was worth covering. I guess the theme is that ‘just because it’s the normal way doesn’t mean it’s the ONLY way’, with a particular focus on choosing the right tooling from the options that Sitecore ships with. The following is an example of a consistent trend I have encountered, and it goes a little something like this.
Tree Traversal vs Lucene (or Lucinq or SC7 Search)
Firstly on this point, I will say this: why are developers so scared of Lucene? It’s an amazingly fast solution to a raft of problems. People are quick to jump up and down about NoSQL solutions (which I also love), but as developers you can solve similar problems with Lucene. It is, in essence, a form of unstructured data store, and a very mature one.
Coming from a background in content management systems in particular, I have seen it used traditionally to provide global site search functionality, but with the right approach it can be so much more. Plus, if you don’t like the verbose (Java-like, surprise!) nature of the API, there are lots of helper libraries to aid Lucene development. Just do a search on NuGet (off the top of my head, Flucene and Lucinq are both pretty straightforward). That said, a copy of the amazing ‘Lucene in Action’ from Manning is a thoroughly worthwhile read: whilst outdated for Java, it is still in advance of the current .NET version of Lucene, and it will give you a reasonable grounding by the time you have read the first few chapters. All this means that even if you are not on Sitecore 7, you can still get great speed out of your Sitecore application.
Dispel the Myth
Ok, so now we have got it out of your head that Lucene is solely a free text search mechanism (especially in the vein of Google / Bing / search engine of choice), and established that it is in fact an ultra fast document (and by this think key value pair) storage solution with inherent analysis capabilities (with a use case not entirely dissimilar to any other NoSQL solution), we need to look at the case of Sitecore. In this setting, it is aware of just about all of your content to some degree, including link structure, template hierarchy, path etc. There are newer solutions we could look at to provide much of this functionality (I am personally looking at graphing & MongoDB in order to give hyper fast content), but that goes somewhat against everything we have been covering so far, in that it is quite a way removed from Sitecore’s shipped functionality, so right now it’s mostly reserved for my own playing. (It is, however, a great example of embracing and extending Sitecore, as I am hooking into ItemProviders, which are out of the box functionality within the platform.)
The traditional Sitecore approach
There are so many Sitecore solutions in the wild that utilise .Axes.GetDescendants() to retrieve content from the Sitecore database and, worse still, .Axes.GetDescendants().Where() or .FirstOrDefault(). Sitecore by its nature only has a basic awareness of linking structure, so when you call GetDescendants() the only reasonable way it can get said descendants is to traverse the tree. It does this by following this kind of idea:
- Get my item’s children
- For each of my item’s children, get their children
- And on… and on…
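The steps above can be sketched in a few lines. This is a language-neutral illustration in Python, not Sitecore code: the `Item` class, `get_descendants` and `build_tree` are hypothetical stand-ins, and the tree sizes are made up. It assumes the whole descendant set is collected before any filter runs, so looking for a single item still touches every item below the start point.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in for a content item (not the Sitecore Item class)."""
    name: str
    template: str
    children: List["Item"] = field(default_factory=list)


def get_descendants(item: Item, visits: List[int]) -> List[Item]:
    """Collect every item below `item`, counting each item that is touched."""
    found: List[Item] = []
    for child in item.children:
        visits[0] += 1                                 # one lookup per item
        found.append(child)
        found.extend(get_descendants(child, visits))   # and on... and on...
    return found


def build_tree(sections: int, articles_per_section: int) -> Item:
    """Build a small news tree: root -> sections -> articles (made-up sizes)."""
    root = Item("news", "folder")
    for s in range(sections):
        section = Item(f"section-{s}", "folder")
        section.children = [
            Item(f"article-{s}-{a}", "article")
            for a in range(articles_per_section)
        ]
        root.children.append(section)
    return root


root = build_tree(sections=10, articles_per_section=200)
visits = [0]
descendants = get_descendants(root, visits)

# Equivalent in spirit to chaining a "first match" filter onto the full list:
# the whole tree has already been walked by the time the filter runs.
first_article = next(i for i in descendants if i.template == "article")
print(first_article.name, "found after touching", visits[0], "items")
```

Even though the first article is the second item encountered, the count of touched items equals the size of the whole subtree, which is the cost that grows with production content.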
Way to kill a Sitecore solution (ok, so they are cached, but that’s not a lot of help to the poor user who is waiting 10 – 15 seconds for Sitecore to mess about with content). I ask you, in future, please just think about the potential impact of any of the Axes methods and use them very sparingly; consider what their final potential payload will be in production. Yes, it’s great that you have fantastic performance in development (where you only have 200 news articles), but what about when the users have entered all their content? It’s very normal to end up with news article structures containing tens of thousands of news articles in multiple languages, with multiple versions each.
So how else can we approach this problem?
You guessed it: Lucene, Lucinq (ok, another shameless plug), SC7 Search, Alex Shyba’s advanced search crawler. Everything in this list works on versions of Sitecore pre v7 (with the obvious exception of Sitecore 7 Search). Lucinq, for example, will go back to 6.6 and, with minor amendments to the source, would even support down to 6.2 (which used an earlier version of Lucene), so the excuse that your solution is too old just isn’t generally a valid one. Add to this that, as per the title of this post, we are in fact utilising and extending what Sitecore already gives us in Lucene.
Remember that, at its core, Database.GetItem() is relatively cheap, particularly when using the ID based variants. By utilising Lucene / Lucinq to search and then fetching items by the ID stored in the returned documents, I have comfortably had Lucene (and Lucinq) querying 200,000+ items worth of content and returning the first 20 results in well under 50 ms on a regular basis. This alone often means that areas where you may previously have had to rely on an output cache can potentially be left alone.
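The search-then-fetch pattern described above can be modelled roughly as follows. Again this is a Python sketch rather than Lucene or Sitecore code: `ITEMS`, `build_index` and `search` are hypothetical names, the dictionary stands in for an ID based item lookup, and the in-memory index is a much simplified stand-in for what a real search index does. The point is only the shape of the approach: the index hands back a small page of IDs, and only those items are then loaded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical item store keyed by ID, standing in for an ID based lookup.
ITEMS: Dict[str, Dict[str, str]] = {
    f"id-{n}": {
        "template": "article" if n % 2 else "folder",
        "title": f"Item {n}",
    }
    for n in range(200_000)
}


def build_index(
    items: Dict[str, Dict[str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Build a tiny inverted index: (field, value) -> list of item IDs.

    In a real setup this work happens at indexing time, not per request.
    """
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for item_id, fields in items.items():
        index[("template", fields["template"])].append(item_id)
    return index


def search(
    index: Dict[Tuple[str, str], List[str]],
    field_name: str,
    value: str,
    skip: int = 0,
    take: int = 20,
) -> List[str]:
    """Return one page of matching IDs without touching the items themselves."""
    return index.get((field_name, value), [])[skip:skip + take]


INDEX = build_index(ITEMS)

# Query the index for the first 20 article IDs, then load only those items.
ids = search(INDEX, "template", "article")
results = [ITEMS[item_id] for item_id in ids]
print(len(results), "items loaded out of", len(ITEMS))
```

Only 20 item lookups happen per request here, regardless of how many items exist, which is why the cost stays flat as content grows.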
To compound the issues, in many cases (in the Page Editor for example) during the editing experience, Sitecore actively (and rightly) ignores the Sitecore data cache. I remember one such instance (a mega nav component, to be exact) that traversed sections of the tree to find the first item with appropriate descendants. Published, this was grand: the cache took care of it nicely. However, the Page Editor took over 60 seconds to load. So in this instance, our end user experience was great, but the content editor’s life was hellishly frustrating! Implement this with Lucene, and the caching becomes unnecessary.
Be prepared to use any of the tools at your disposal – in the case of Sitecore – it ships with a whole bunch, not just the most obvious Lucene & Solr (for later versions of SC) but NVelocity, Log4Net, HtmlAgilityPack, Stimulsoft Reports etc. Sometimes it’s also really useful to leverage the power of these widely used tools directly in order to get the most from your solutions.
For another great example of Lucene being used for a real world Sitecore issue – please also check out Simon’s post on Navigation using collectors.
This article is part of the Don’t Fight The Framework Series