About the SEO-friendly URL resolver issue, StripLanguage pre-processor and the AlwaysStripLanguage setting...

14 March 2014 at 00:00 by Ruud van Falier

I recently discovered a serious issue with our SEO-friendly URL module (which was fixed the next morning).
Solving that issue taught me something new about the way Sitecore handles incoming requests, showed me that Sitecore has recently changed this behaviour, and revealed a combination of configuration settings that you shouldn't use.

Enough reasons to share this information with others!

Affected instances

Before I continue, I would like to point out that the issue only affected instances that have version 1.0.4 or later of the SEO-friendly URL module installed and have the LinkProvider configured with languageEmbedding="always" or languageEmbedding="asNeeded", in combination with languageLocation="filePath" and forceFriendlyUrl="true".
If your instance matches those criteria, you must upgrade to the latest version of the module as soon as possible!
In any case, upgrading to the latest version is recommended.

The issue

So what was the issue exactly?
We were using a version of the SEO-friendly URL module that was compiled in our own solution and hadn't been updated in a while.
When I upgraded our site to Sitecore version 7.2 last week, I also changed to the latest version of our module.
All of a sudden all requests to our front-end pages triggered an infinite redirect loop.
I started looking for the cause by debugging our ItemResolver processor to see how the request was being handled.

Keep in mind that all our URLs are prefixed with a language code.
For example, the URL for the /sitecore/Content/Home/Blog item is either /nl/blog or /en/blog, depending on the context language.
So we have our LinkProvider configured to use languageEmbedding="always".

Breaking into the ItemResolver.Process method when requesting http://www.partechit.nl/en/blog gave me these HttpRequestArgs values:

args.Url.FilePath = "/en/blog"
args.Url.ItemPath = "/sitecore/Content/Home/blog"
args.Context.Request.Url = "http://www.partechit.nl/blog"

That was not the data that I expected; surely my request URL was http://www.partechit.nl/en/blog (language code embedded) and not http://www.partechit.nl/blog?

The item path is correctly resolved, so you would think there is no problem here, but our SEO-friendly URL module can be configured in such a way that it forces all items to be requested using their friendly form.
So if I request /Blog.aspx, for example, it is redirected to /en/blog.

Now take another look at the debug data and see what happens:

  1. The args.Context.Request.Url is set to http://www.partechit.nl/blog...
  2. ...and is resolved to /sitecore/Content/Home/Blog,
  3. but actually the friendly URL for that item is http://www.partechit.nl/en/blog (because that is what our LinkManager returns due to the languageEmbedding="always" setting).
  4. So our module redirects the request to the friendly URL...
  5. ...and the request hits the ItemResolver again, where the args.Context.Request.Url value has once again already been changed to http://www.partechit.nl/blog.
  6. The module again forces the friendly URL...
  7. and there we have our infinite redirect loop!
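
The steps above can be replayed with a small model. This is a minimal sketch in plain Python, not Sitecore or module code: internal_rewrite is a hypothetical stand-in for whatever changes the request URL, FRIENDLY_URL is assumed to be what the LinkManager returns for the Blog item, and max_hops only exists to stop the simulation.

```python
# Hypothetical model of the redirect loop; none of these names exist in Sitecore.
FRIENDLY_URL = "/en/blog"  # assumed LinkManager result with languageEmbedding="always"
LANGUAGE_PREFIXES = ("/en", "/nl")  # the two languages used in this post


def internal_rewrite(path):
    """Stand-in for the rewrite that removes the language prefix."""
    for prefix in LANGUAGE_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return path[len(prefix):] or "/"
    return path


def trace_requests(requested_path, max_hops=3):
    """Follow redirects for at most max_hops requests.

    Returns a list of (URL sent by the browser, URL seen by the module).
    """
    trace = []
    path = requested_path
    for _ in range(max_hops):
        seen = internal_rewrite(path)
        trace.append((path, seen))
        if seen == FRIENDLY_URL:
            break  # the module is satisfied, no redirect needed
        path = FRIENDLY_URL  # the module redirects to the friendly URL
    return trace


for browser_url, seen_url in trace_requests("/en/blog"):
    print(browser_url, "->", seen_url)
```

Every simulated request shows the browser asking for /en/blog while the module sees /blog, so the comparison with the friendly URL can never succeed.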

StripLanguage pre-processor

Okay, so it's clear why the infinite loop occurs, but why does the request URL appear to be different from the one that I really requested?
It turns out that the StripLanguage processor in the <preprocessRequest> pipeline is responsible for that.
I will explain a little more about this processor later on in this post, because its logic has recently been changed, but all you need to know for now is that:

  • The pre-processor extracts the language code from the URL,
  • uses the language code to set the context language,
  • then does an internal rewrite to the URL without the language code.

An internal rewrite means that, internally, a different URL is requested, but the user is not redirected.
So the URL in the address bar of the browser will still be /en/blog, but the URL in the args.Context.Request object has been changed to /blog.
That makes sense!
At least now I know why I'm getting the data that I was getting.
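
To make the three steps and the internal rewrite concrete, here is a rough model in plain Python. It is only a sketch under assumptions: RequestModel, its fields and KNOWN_LANGUAGES are hypothetical names, and the real processor works on the ASP.NET request rather than on a simple object like this.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_LANGUAGES = {"en", "nl"}  # assumption: the languages of this site


@dataclass
class RequestModel:
    browser_url: str                        # what stays in the address bar
    internal_path: str                      # what later processors see
    context_language: Optional[str] = None  # set from the URL prefix


def strip_language(request):
    """Model the three steps: extract the code, set the language, rewrite."""
    first, _, rest = request.internal_path.lstrip("/").partition("/")
    language = first.lower()
    if language not in KNOWN_LANGUAGES:
        return request  # no language prefix, nothing to do
    request.context_language = language
    request.internal_path = "/" + rest  # internal rewrite, no redirect
    return request


request = strip_language(RequestModel("/en/blog", "/en/blog"))
print(request.browser_url, request.internal_path, request.context_language)
```

The browser URL is left alone while the internal path and the context language change, which matches the values I saw in the debugger.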

But I still didn't understand why this had never been an issue before I upgraded. At first I was convinced that it was caused by a change in Sitecore, because in my mind we hadn't changed anything in the module.
However, it turns out that we actually did change something in our module!

Since release 1.0.4 of the module, the method that is responsible for forcing the friendly URL to be used when requesting items, ItemResolver.ForceFriendlyUrl, used Request.Url.AbsolutePath from the current HttpContext to get the requested URL.
Before release 1.0.4, it used Request.RawUrl for that.

This was changed because Request.RawUrl contains the raw path including the querystring, which we had to strip before comparing it to the friendly URL from the LinkManager.
It looked to me like Request.Url.AbsolutePath contained the same value, just without the querystring, so that would save the trouble of stripping it.
But actually, Request.RawUrl contains the original URL from before the StripLanguage pre-processor has done its internal rewrite!
So that tiny change had quite a big effect on the way the module works.
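
The effect of that change can be shown with two plain strings. This is a hedged sketch in Python, not the module's code: the function names are hypothetical, the ?page=2 querystring is made up for illustration, and the exact comparison the module performs is an assumption.

```python
def path_from_raw_url(raw_url):
    """Older approach: take the raw URL and strip the querystring by hand."""
    return raw_url.split("?", 1)[0]


def must_redirect(requested_path, friendly_url):
    """Assumed check: redirect when the requested path is not the friendly URL."""
    return requested_path != friendly_url


# Modelled values for a request to /en/blog?page=2 after the internal rewrite.
raw_url = "/en/blog?page=2"  # untouched by the rewrite
absolute_path = "/blog"      # language prefix removed, no querystring
friendly_url = "/en/blog"    # assumed LinkManager result

print(must_redirect(path_from_raw_url(raw_url), friendly_url))  # False
print(must_redirect(absolute_path, friendly_url))               # True
```

With the raw URL the request already matches the friendly URL, so nothing happens. With the rewritten path it never matches, so every request is redirected.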

Again, this is only an issue for instances that use our SEO-friendly URL module with languageEmbedding="always" or "asNeeded", in combination with languageLocation="filePath" and forceFriendlyUrl="true" configured on the LinkProvider.

The issue has been fixed in release 1.0.7, but the implementation has been changed a few times up until release 1.0.9, because it took a while before I figured out that the bug was introduced when we changed to using the AbsolutePath.

I initially fixed it by using the value from Sitecore.Context.Item["Sitecore.Web.RewriteModule.OriginalUrl"], which always contains the original URL without the querystring, as long as the Sitecore.Web.RewriteModule has been configured in the web.config.
In the latest release I changed back to using Request.RawUrl because that has always proven to work in the past.

If anyone thinks it should be solved in a different way, I would love to hear it!

Changed StripLanguage logic

To get back to the subject of the StripLanguage pre-processor: Sitecore changed the logic of that processor in version 6.6 Update-6.
That same change has been applied in versions 7.0 Update-1, 7.1 Initial release, 7.2 Initial release and all later versions.

Here is a description of the change:

After setting the LinkProvider's languageEmbedding property to "never", existing URLs containing an embedded language would stop working and return the "404 Not Found" error page. (334710)
To avoid such problems, the StripLanguage processor in the <preprocessRequest> pipeline will now parse languages from the URL, even when languageEmbedding is set to "never".

That change apparently caused issues for people who were using language embedding, because Sitecore applied another change in the next update:

Starting with 6.6 Update-7, you can revert back to the earlier behaviour by setting Languages.AlwaysStripLanguage to false (ref. no. 390434).

So let's have a look at what they actually changed in the StripLanguage pre-processor.
This is roughly the logic of the old version of the pre-processor:

  • Check if languageEmbedding != "never" (i.e. there might be a language prefix in the URL).
  • Extract the language from the URL and use it to set the context language.
  • Remove the language prefix from the URL.
  • Perform an internal URL rewrite to new URL (without language prefix).

And this is roughly the logic of the new version:

  • Check if the setting Languages.AlwaysStripLanguage = "true" (i.e. completely ignore the LinkProvider's languageEmbedding setting).
  • Extract the language from the URL and use it to set the context language.
  • Remove the language prefix from the URL.
  • Perform an internal URL rewrite to new URL (without language prefix).

I'm showing you this because it illustrates why the following warning applies:

If you are using languageEmbedding="always" or "asNeeded" on your LinkProvider,
you must never use Languages.AlwaysStripLanguage="false"!

Although the release notes state that you can revert to the earlier behaviour by setting AlwaysStripLanguage to false, that's not exactly true.
Setting it to false skips StripLanguage completely, so the internal rewrite is not performed and item URLs with a language prefix are no longer resolved.
All requests will then return a 404 Page Not Found status.
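
The difference between the two versions comes down to which value decides whether the processor runs. The following is a minimal Python model of the two checks as I described them above, not Sitecore's code; both function names are hypothetical.

```python
def old_processor_runs(language_embedding):
    """Old version: run unless languageEmbedding is "never"."""
    return language_embedding != "never"


def new_processor_runs(always_strip_language):
    """New version: only the AlwaysStripLanguage setting decides."""
    return always_strip_language


# Print every combination side by side: embedding, setting, old, new.
for embedding in ("always", "asNeeded", "never"):
    for always_strip in (True, False):
        print(
            embedding,
            always_strip,
            old_processor_runs(embedding),
            new_processor_runs(always_strip),
        )
```

For "always" and "asNeeded" the old version runs, but the new version with AlwaysStripLanguage set to false does not. That is exactly the combination the warning is about.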

I'm just pointing this out because it caused a lot of confusion for me while I was working on resolving the SEO-friendly URL module issue.

In my opinion, Sitecore should change the logic of the StripLanguage pre-processor so that, when AlwaysStripLanguage is false, it falls back to the old languageEmbedding check and skips the processor only if languageEmbedding="never".
That would also validate their release notes statement: "you can revert back to the earlier behaviour by setting Languages.AlwaysStripLanguage to false".
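
As a sketch of what I mean, the combined check could look like this in plain Python. It only models my suggestion; the function name is hypothetical and this is not how Sitecore implements the processor.

```python
def proposed_processor_runs(always_strip_language, language_embedding):
    """Suggested logic: the setting wins, otherwise fall back to the old check."""
    if always_strip_language:
        return True  # always parse the language from the URL
    return language_embedding != "never"  # earlier behaviour


print(proposed_processor_runs(False, "always"))  # True
print(proposed_processor_runs(False, "never"))   # False
```

With this logic, setting AlwaysStripLanguage to false would behave like the old version for every languageEmbedding value.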

Latest SEO-friendly URL module release

You can always find the latest version of the module, including the source code, on GitHub.
Use the Sitecore Installation Package to install it on your environment.

Don't hesitate to contact us with any questions, comments or suggestions!