Lucene – Support complex navigation using Collectors

Building a navigation system for a news section or blog based on years and months (an archive, for example) requires that we know which years and months are available for:

  1. The top level – as in when we open the home page
  2. Expanded – whenever we are viewing a month or article page and need to expand accordingly
  3. Dynamically – when we need to build the navigation in one shot, perhaps as an accordion or similar.

Note – we are assuming the following structure:

Year/
    +-Month
    +-Month/
           Article A
           Article B
    +-Month
Year/  
Year/ 

It’s possible to code around these by making Axes calls on the current item to get its parents and siblings, but this doesn’t scale well, especially if you need to drop into sibling months to show or hide them depending on whether they’re empty. And we haven’t even talked about languages.

An alternative approach is to scan your Lucene index for all the news items in the current language, group them by year and month, and store a count for each. You can then cache the result for even more speed.
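The caching step could be sketched roughly as follows. This is only an illustration: `ArchiveCountCache` and its `loader` delegate are hypothetical names, not part of Lucene or Sitecore, and the loader stands in for whatever code scans the index for one language.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

/// <summary>
/// Hypothetical per-language cache of year/month counts (illustration only).
/// </summary>
public class ArchiveCountCache
{
    private class Entry
    {
        public DateTime LoadedUtc;
        public Dictionary<string, Dictionary<string, int>> Dates;
    }

    // Language names are compared case-insensitively, so "en" and "EN" share an entry.
    private readonly ConcurrentDictionary<string, Entry> entries =
        new ConcurrentDictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);

    private readonly TimeSpan lifetime;
    private readonly Func<string, Dictionary<string, Dictionary<string, int>>> loader;

    public ArchiveCountCache(
        TimeSpan lifetime,
        Func<string, Dictionary<string, Dictionary<string, int>>> loader)
    {
        this.lifetime = lifetime;
        this.loader = loader;
    }

    public Dictionary<string, Dictionary<string, int>> Get(string language)
    {
        Entry entry;
        if (entries.TryGetValue(language, out entry)
            && DateTime.UtcNow - entry.LoadedUtc < lifetime)
        {
            return entry.Dates;
        }

        // Missing or stale: run the (assumed) index scan again and replace the entry.
        // Two threads may both reload at the same moment; the last write wins.
        entry = new Entry { LoadedUtc = DateTime.UtcNow, Dates = loader(language) };
        entries[language] = entry;
        return entry.Dates;
    }
}
```

A time-based lifetime is the simplest choice here. If your index is rebuilt on publish, clearing the cache at that point may suit you better.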

In Lucene terms, we are changing the way results are processed and collected, so we need a custom collector. We can do this easily by subclassing ‘Collector’ and providing simple implementations of the methods we have to override.

Let’s skip to the collector code:

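using System;
using System.Collections.Generic;
using System.Globalization;
using Lucene.Net.Index;
using Lucene.Net.Search;

/// <summary>
/// Collects counts for years and months in a Lucene index.
/// </summary>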
public class YearsAndMonthsCollector : Collector
{
    /// <summary>
    /// Standard date time format for loading dates from a Lucene index when stored as a string.
    /// </summary>
    private static readonly String DateTimeFormat = "yyyyMMddHHmmss";

    /// <summary>
    /// The date field values for the current reader, indexed by the reader-relative document id.
    /// </summary>
    private String[] cache;
    
    /// <summary>
    /// The counts, keyed by year ("yyyy") and then by month ("MM"). Both keys are strings.
    /// </summary>
    public Dictionary<String, Dictionary<String, int>> Dates { get; private set; }

    /// <summary>
    /// Total number of items available.
    /// </summary>
    public int Count { get; private set; }

    /// <summary>
    /// Initializes a new instance of the <see cref="YearsAndMonthsCollector"/> class. 
    /// </summary>
    public YearsAndMonthsCollector()
    {
        Dates = new Dictionary<String, Dictionary<String, int>>();
    }

    /// <summary>
    /// Resets the total and clears the collected counts so the collector can be reused.
    /// </summary>
    public void Reset()
    {
        Count = 0;
        Dates.Clear();
    }

    /// <summary>
    /// Called once for each matching document. Parses the document's date and increments
    /// the count for its year and month.
    /// </summary>
    /// <param name="docId">The document id, relative to the current reader</param>
    public override void Collect(int docId)
    {
        Count = Count + 1;

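        // Documents without the date field have a null entry in the field cache.
        String temp = cache[docId];

        if (String.IsNullOrWhiteSpace(temp))
        {
            return;
        }

        // Strip the 't' separator between date and time so the value matches DateTimeFormat.
        temp = temp.Replace("t", "");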

        DateTime result;
        if (!DateTime.TryParseExact(temp, DateTimeFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out result))
        {
            return;
        }

        String year = result.ToString("yyyy");
        String month = result.ToString("MM");
        if (!Dates.ContainsKey(year))
        {
            Dates.Add(year, new Dictionary<String, int>());
        }

        if (!Dates[year].ContainsKey(month))
        {
            Dates[year].Add(month, 1);
        }
        else
        {
            Dates[year][month]++;
        }
    }

    /// <summary>
    /// Scores are not needed for counting, so the scorer is ignored.
    /// </summary>
    public override void SetScorer(Scorer scorer) { }

    /// <summary>
    /// Indexes are split over multiple files, each with its own reader. This is called once
    /// per reader, so we load the field cache for that reader and read in the date data.
    /// </summary>
    public override void SetNextReader(IndexReader reader, int docBase)
    {
        // Change the field name to what is in your index: 
        cache = FieldCache_Fields.DEFAULT.GetStrings(reader, "NAME_OF_DATE_FIELD");
    }

    /// <summary>
    /// Instructs the Lucene engine that we don't care what order we get the data in. This
    /// allows for optimisations.
    /// </summary>
    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }
}

You’ll note that this isn’t doing the query! You still need to execute a Lucene search and pass in your path, language and template restrictions for the collector to work with. Something like:

BooleanQuery query = new BooleanQuery();

// .... clauses here

using (IndexSearcher searcher = new IndexSearcher("DIRECTORY", false))
{
    YearsAndMonthsCollector collector = new YearsAndMonthsCollector();
    searcher.Search(query, collector);
    
    // Do stuff with the collector....
}

From here you can choose to take the results verbatim or selectively hit the Sitecore db to load more info. Note that I’ve ignored security here to keep things simple (and so should you if you can!).
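As a rough sketch of taking the results verbatim, the collector's `Dates` dictionary can be turned into a simple outline. `ArchiveNavigation` and `Render` are hypothetical names used only for illustration, and the code assumes the keys are the "yyyy" and "MM" strings produced by the collector above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class ArchiveNavigation
{
    /// <summary>
    /// Renders year/month counts as an indented text outline, newest first.
    /// </summary>
    public static string Render(Dictionary<string, Dictionary<string, int>> dates)
    {
        StringBuilder output = new StringBuilder();

        // Ordinal ordering works because the keys are fixed-width, zero-padded digits.
        foreach (var year in dates.OrderByDescending(y => y.Key, StringComparer.Ordinal))
        {
            // The year total is the sum of its month counts.
            output.AppendLine(year.Key + " (" + year.Value.Values.Sum() + ")");

            foreach (var month in year.Value.OrderByDescending(m => m.Key, StringComparer.Ordinal))
            {
                int number = int.Parse(month.Key, CultureInfo.InvariantCulture);
                string name = CultureInfo.InvariantCulture.DateTimeFormat.GetMonthName(number);
                output.AppendLine("    " + name + " (" + month.Value + ")");
            }
        }

        return output.ToString();
    }
}
```

Inside the using block above you would call `ArchiveNavigation.Render(collector.Dates)`. A real site would more likely emit links or bind the data to a control, and would use the context language's culture for the month names rather than the invariant one.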

Collectors are supported out of the box in Lucinq. For another example of collectors, the Lucinq library has a demo DateCollector and associated test fixture. See the files:

In fact, you should see that the collector above is a derivative of the one in the Lucinq library.
