BlogEngine 2.5 doesn't support encoding Unicode slugs (old issue link included)

Topics: Business Logic Layer
Jan 22, 2012 at 8:29 PM

Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
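To make "encoded" concrete, here is a minimal sketch in Python (not BlogEngine's own code) of percent-encoding a Unicode slug as UTF-8 so that only ASCII characters remain. The helper name `encode_slug` and the sample slug are assumptions made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_slug(slug: str) -> str:
    """Percent-encode a slug as UTF-8 so the result contains only ASCII.

    Hypothetical helper for illustration; not part of BlogEngine.
    """
    # safe="" means "/" is encoded too; letters, digits and "_.-~"
    # are always left unencoded by quote().
    return quote(slug, safe="")


encoded = encode_slug("café")
print(encoded)           # caf%C3%A9
print(unquote(encoded))  # café
```

Decoding the encoded form gives back the original slug, which is what makes the % character essential.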

 

I had forked BlogEngine to support encoded Unicode slugs in the database.

But now I plan to upgrade my site.

I see that version 2.5 now supports Unicode slugs, but doesn't encode them.

I am in doubt whether I should once again change the code to support Unicode slug encoding, or leave it as is.

The attachment on this issue helped me implement that support: http://blogengine.codeplex.com/workitem/10386

Could someone please tell me what is planned for the future? Will it support encoded Unicode slugs, or will it stay as it is?

Coordinator
Jan 23, 2012 at 10:56 AM

It is odd that the % character is removed within RemoveIllegalCharacters.  I agree that it would be better not to remove it.  The % character needs to be there to support URL encoding, as you and that thread have pointed out (removing the % breaks the URL encoding).  The code that removes the % sign is pretty old, and I'm not sure why it was originally added.
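A rough illustration of why stripping % breaks URL encoding, sketched in Python. The function below is a hypothetical stand-in that models only the % removal discussed here; it is not the actual RemoveIllegalCharacters implementation.

```python
from urllib.parse import quote, unquote


def strip_percent(text: str) -> str:
    """Hypothetical stand-in that models only the '%' removal."""
    return text.replace("%", "")


encoded = quote("café", safe="")   # percent-encoded, ASCII-only slug
stripped = strip_percent(encoded)  # the escape markers are gone

print(unquote(encoded))   # café
print(unquote(stripped))  # cafC3A9
```

Once the % signs are gone, the hex digits are indistinguishable from ordinary slug text, so the original characters cannot be recovered.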

The only reason I can think of why we might not want to change this is that it would break links for people who have blog posts (created through BE 2.5) whose slugs have the % sign removed and whose links are already published on the web.  We would probably need to support the old "% removed" URLs for backwards compatibility and also support the "% not removed" URLs.  I'll explore this some more and see if we can change it before the next release.
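One possible shape for that backwards compatibility, sketched in Python under stated assumptions: slugs are stored in their percent-encoded form, and a request that fails an exact match is retried against the legacy "% removed" form of each stored slug. The names `legacy_slug` and `find_post` and the dictionary of posts are hypothetical, not BlogEngine APIs.

```python
from typing import Dict, Optional
from urllib.parse import quote


def legacy_slug(encoded_slug: str) -> str:
    """Form an old link would have had after '%' was stripped (assumption)."""
    return encoded_slug.replace("%", "")


def find_post(requested: str, posts_by_slug: Dict[str, str]) -> Optional[str]:
    """Look up a post by slug, accepting both new and legacy URL forms."""
    # 1. Exact match against the stored, percent-encoded slug.
    if requested in posts_by_slug:
        return posts_by_slug[requested]
    # 2. Fallback: match against the legacy form of each stored slug.
    for slug, post in posts_by_slug.items():
        if legacy_slug(slug) == requested:
            return post
    return None


posts = {quote("café", safe=""): "post-1"}  # stored as "caf%C3%A9"
print(find_post("caf%C3%A9", posts))  # post-1
print(find_post("cafC3A9", posts))    # post-1
```

The fallback scan is linear and could in principle match the wrong post if two slugs share a legacy form, so a real implementation would need to decide how to handle such collisions.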

Jan 23, 2012 at 12:57 PM

If BE doesn't support Unicode URL encoding, how can someone have a slug with the % sign removed?

I think that without proper support until now, nearly everyone has used ASCII characters in slugs. (There must be some easy solution for the small minority affected.)

As for BE 2.5, it doesn't encode Unicode URLs, so it stores them as they are in the database. No encoding happens, and no % sign is removed.