Something I’ve been involved with quite a lot is making multi-lingual sites work in Sitecore. Building a multi-lingual site touches a lot of parts of Sitecore and really pushes your understanding. So let’s start with the basics:
- Items – versions and specific attention to shared, versioned and unversioned fields
- Deciding whether to have separate trees per language or blend them together in one tree (or even a mix)
- Translating content and whether you need to fall back to a language if a version in the current language isn’t available
- Other niggles we’ll come to
With the hindsight of having built a few sites, I think the most important decision is whether to build the sites as separate trees or not. Generally, if you can guarantee that the tree is going to be mostly the same for all languages then you can argue that a shared tree is a good start. However, if you think any language will be significantly different, that language should be in its own tree.
My main argument for that is that, without writing custom fields, the link fields (multi-list, tree list etc.) are not language aware. Recall that an item will appear in the tree for all languages, and you then have to test in your API code whether a version exists. This can result in authors linking to items that don’t have a version in the language they are working in. That may be ok if you use language fallback, but it can cause a fair bit of confusion. This in turn drives a few more decisions. One example would be data source items (such as for a carousel or accordion) being in language specific folders – though there are more ways to skin a cat :).
An additional consideration for how you structure your tree is how content is pushed out. Should you go for a combined tree approach then you’ll need to find a way of identifying the language version in the URL. Built into Sitecore, this can be done via the query string parameter ‘sc_lang’, but for SEO you probably need to go for a URL prefix. Examples include ‘en-GB’, ‘fr-FR’ and so on. This would look like http://www.example.com/en-GB/page.
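To make the URL prefix idea concrete, here is a minimal, language-neutral sketch in Python rather than Sitecore code. The `SUPPORTED` set and the `split_language_prefix` name are assumptions made up for illustration, and the match is deliberately case-sensitive to keep it short.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of languages the site serves (not taken from a real site).
SUPPORTED = {"en-GB", "fr-FR", "fr-CA"}

# Matches a leading path segment shaped like "xx-YY", e.g. "/fr-FR/page".
_PREFIX = re.compile(r"^/([a-z]{2}-[A-Z]{2})(/.*)?$")


def split_language_prefix(path: str) -> Tuple[Optional[str], str]:
    """Return (language, remaining_path); language is None without a known prefix."""
    match = _PREFIX.match(path)
    if match and match.group(1) in SUPPORTED:
        # A bare "/fr-FR" has no remainder, so treat it as the site root.
        return match.group(1), match.group(2) or "/"
    return None, path
```

A path with an unsupported or missing prefix is passed through untouched, so the caller can fall back to whatever default language it prefers.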
Some projects require that the second part is omitted so that you have ‘en’ or ‘fr’, but this isn’t strictly correct. French exists as a language in multiple countries, such as Canada and France (to name a few). The language isn’t the only difference either: we will be formatting numbers and dates according to the language as well, and different places have different number formats, date formats and the like. So if you really need the short form, you’ll have to consider adding a custom language resolver to the begin request (httpRequestBegin) pipeline.
It’s worth considering this anyway so that you can choose to ignore the Accept-Language headers passed across by the browser and let the user choose their language instead (perhaps saving it to a cookie).
You’ll also want to consider setting up your link provider to always embed the language code rather than leaving it set to ‘as needed’. For installations where multiple sites are running and have different strategy requirements, you’ll need something a bit more complex. In my own work I’ve had to create a site aware provider factory, which holds settings per website host with a fallback if one isn’t configured. But this is normally for much larger or more complex installations.
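The per-host idea can be sketched roughly as follows. This is Python for illustration only, not the provider factory described above: `LinkSettings`, `SETTINGS_BY_HOST` and `settings_for` are hypothetical names, and the embedding values are only loosely modelled on the always / as needed / never options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSettings:
    language_embedding: str       # "always", "as_needed" or "never"
    use_short_code: bool = False  # e.g. "fr" rather than "fr-FR"


# Used when a host has no entry of its own.
DEFAULT = LinkSettings(language_embedding="always")

# Hypothetical per-host configuration.
SETTINGS_BY_HOST = {
    "www.example.com": LinkSettings("always"),
    "intranet.example.com": LinkSettings("never"),
}


def settings_for(host: str) -> LinkSettings:
    # Compare hosts case-insensitively and ignore any port number.
    key = host.split(":")[0].lower()
    return SETTINGS_BY_HOST.get(key, DEFAULT)
```

Keeping the fallback in one place means a newly added site behaves sensibly before anyone remembers to configure it.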
Custom language codes
One interesting variation on all of this is using custom language codes. Consider the scenario where you would like to present content in a non-English speaking region in English as well as the native tongue. Though a bit of an edge case, it can happen and it’s relatively easy to do. Given that Sitecore is a .NET application, it has access to the language/region settings that .NET does. Great until you look it up and realise that .NET doesn’t have them all :). So edge case or not, this is something you’ll need to know!
In all it’s quite simple: you’ll need to create a new region in Windows and then create an associated language entry in Sitecore based on that. The specifics of the formatting will go in Windows and .NET can pick them up. See resources.
A different approach that I used in a recent MVC project was to read the incoming HTTP header and perform a lookup on the language code against a global dictionary. If it wasn’t supported, I had decided on a fallback region code. This can be a suitable halfway house if you don’t have the time, the budget or even access to the server.
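A rough sketch of that lookup, assuming the header in question is Accept-Language. It is written in Python purely for illustration; the `LANGUAGE_MAP` table, the `FALLBACK` value and the function names are all invented for the example and are not from the project mentioned above.

```python
from typing import List

# Hypothetical lookup table: requested code (lower-case) -> language we serve.
LANGUAGE_MAP = {
    "en": "en-GB",
    "en-gb": "en-GB",
    "en-us": "en-GB",
    "fr": "fr-FR",
    "fr-fr": "fr-FR",
}
FALLBACK = "en-GB"  # assumed fallback region code


def requested_codes(header: str) -> List[str]:
    """Parse an Accept-Language value into codes, highest q-value first."""
    weighted = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        code = pieces[0].strip().lower()
        if not code or code == "*":
            continue
        quality = 1.0  # a missing q parameter means full weight
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((quality, code))
    # sorted() is stable, so codes with equal weight keep their header order.
    return [code for _, code in sorted(weighted, key=lambda w: -w[0])]


def resolve_language(header: str) -> str:
    for code in requested_codes(header):
        if code in LANGUAGE_MAP:
            return LANGUAGE_MAP[code]
    return FALLBACK
```

For a header such as `fr-CA,fr;q=0.8,en;q=0.5` the first code is not in the table, so the lookup moves on to `fr` and settles on `fr-FR`.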
Talking about versions, you’ll need to factor in making sure you are always checking that you are loading versions in the current language. That’s ok if you’re dealing with the context item but if you’re doing any axis based query or even just loading the children you’ll need to update your code (or write more!). With that in mind it’s also worth bringing up fast queries at this point. You won’t be able to use them to query between languages so if you’re using them, you’ll still need to do version checks. This alone can introduce performance problems if the application is always hitting the database to get the latest version.
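The version check itself boils down to a filter like the one below. This is a toy Python model, not the Sitecore API: the `Item` class, its `versions` dictionary and `children_in_language` are stand-ins invented for illustration. In Sitecore the equivalent test is usually made against the version count of the item fetched in the language you care about.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    name: str
    # language code -> number of versions that exist in that language
    versions: Dict[str, int] = field(default_factory=dict)
    children: List["Item"] = field(default_factory=list)

    def has_version(self, language: str) -> bool:
        return self.versions.get(language, 0) > 0


def children_in_language(parent: Item, language: str) -> List[Item]:
    """Keep only the children that have at least one version in `language`."""
    return [child for child in parent.children if child.has_version(language)]
```

Wrapping the check in one helper keeps it out of every rendering and makes it harder to forget when loading children or running a query.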
In the projects I’ve worked on, I’ve alleviated a lot of this by using the Lucene index to help load items. It’s not a total solution, but for the more complex loading parts of the application, throwing in a template and language restriction as part of a query can, once you’re used to the syntax, be a real life saver.
With all of that, you’ll want to be thinking about the content. Hopefully you’ll have native language speakers for each language but if not there are a number of translation modules for Sitecore commercially available.
Regardless, I’ve made use of a simple dictionary system which helps with the translation of non-content items such as labels, field text (placeholders), button text and the like. Though Sitecore ships with its own dictionary, I much prefer to have dedicated folders in Sitecore that hold content items with a single value field, and then to read them through custom controls or Html helpers (WebForms and MVC respectively). This is a lot simpler to maintain, as it’s easier to hand over responsibility to content editors without worrying about overly complex security and publishing roles. Just ensure the item can’t be deleted and you can treat it as you would any other content.
That’s certainly the case if you’ve decided to have two core databases, one for authoring and one for live.
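As a rough illustration of what such a helper does behind the scenes, here is a small Python sketch. The keys, the in-memory `DICTIONARY` and the fall back to a default language are all assumptions for the example; in a real site the values would come from the single value field on each dictionary item.

```python
from typing import Dict, Tuple

# (key, language) -> text, standing in for the dictionary items in the tree.
DICTIONARY: Dict[Tuple[str, str], str] = {
    ("search.button", "en-GB"): "Search",
    ("search.button", "fr-FR"): "Rechercher",
    ("search.placeholder", "en-GB"): "Enter a search term",
}


def translate(key: str, language: str, default_language: str = "en-GB") -> str:
    """Look up `key` in `language`, then in the default language."""
    for lang in (language, default_language):
        text = DICTIONARY.get((key, lang))
        if text:
            return text
    # Returning the key itself makes missing entries easy to spot on the page.
    return key
```

Whether a missing translation should fall back to another language or show nothing at all is a decision worth making with the content editors.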
In fact I will be posting about this later this week in the ‘don’t fight the framework’ series where we’ll expand on this a lot further and also consider how we can use a dependency container (Autofac) to help bring these translations into our code.
Almost every Sitecore site I have built makes use of modules from the marketplace. If you are making use of any, you will need to review the source code to ensure language support is ok, assuming you don’t do this by default. Needless to say, if any aren’t language aware then you’ll want to make sure that they are, which is also a great way of contributing back to the community.
There’s a lot to languages and I’ve barely scratched the surface so check out the resources below to give yourself a head start:
- Language fallback module
- Custom cultures in Windows and .NET
- Custom language registration module (I have never used this)
- The SDN 🙂
Quick fire checklist
Here’s a checklist of things you’ll need to understand. Don’t be put off!
- Versioned, unversioned and shared fields
- How to get the latest version of an item in the context language
- How to register a site in the web config
- How to modify the link provider settings both in the API and in the web config
- Creating standard values and remembering to do this for each language!
- Creating help text for fields (and considering doing this in each language!)
Feel free to reply with your own opinions and recommendations. Or better yet share some of the sites you’ve worked on and how you’ve approached this :).
This article is part of the Don’t Fight The Framework Series