How to get rid of HTML entities?

Topics: ASP.NET 2.0
Aug 29, 2007 at 8:06 PM
I live in the Czech Republic and, unfortunately, we use many national characters, accents, etc. I was quite happy that BlogEngine.NET supports UTF-8 natively. I am a C# beginner, so I would like to ask this community for help. I have just downloaded the newest source (4244).

I would like to avoid the encoding of national characters in the XML output files. Here is what I have solved successfully so far:
1) In the tiny_mce.js and tiny_mce_src.js files, I changed
this._def("entity_encoding", "named");
to
this._def("entity_encoding", "raw");
so I am able to generate correct output files from the BlogEngine.NET Control Panel.

2) I modified the client-side JavaScript function GetSlug() in Add_entry.aspx so that the slug function always strips accents from the title characters (IMO, a slug full of encoded characters is useless for non-English titles).

3) I modified the RemoveIllegalCharacters function in Utils.cs in the same way, so that all diacritics are removed when the optional slug is not used.
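
Items 2 and 3 both come down to turning accented title characters into plain ASCII for the slug. The actual changes to GetSlug() and RemoveIllegalCharacters are not shown here, so this is only a minimal JavaScript sketch of the idea. It assumes a modern engine with String.prototype.normalize (browsers of 2007 lacked it and would need a lookup table instead), and the function name makeSlug is hypothetical:

```javascript
// Hypothetical helper, not BlogEngine.NET's actual GetSlug():
// builds an ASCII slug from a title by stripping diacritics.
function makeSlug(title) {
  return title
    .normalize("NFD")                 // split e.g. "ř" into "r" + combining caron
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // every other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
}

console.log(makeSlug("Příliš žluťoučký kůň")); // prilis-zlutoucky-kun
```

NFD decomposition covers all Czech letters (á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž), but letters without a canonical decomposition, such as ł or ß, would be dropped or turned into a hyphen by this sketch.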

Adding posts via the Control Panel now works as intended.

However, I would like to use Windows Live Writer for publishing posts, and that is where I need your help.

When I open the XML output file of a post uploaded via Live Writer, I see HTML entities in the <title></title>, <content></content> and, of course, <slug></slug> tags. Live Writer is configured to use UTF-8, and I think it behaves as I need: when I switch to the HTML code view in its menu, the markup does not contain any HTML entities.

Which component is responsible for HTML-encoding posts uploaded by Live Writer? And is it possible to change its behaviour so that it works like the TinyMCE switch ("entity_encoding", "raw")? As the wiki says:
All characters will be stored in non entity form except these XML root entities: & < > " '
It would be very nice to have human-readable output files.
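
The "raw" behaviour from that quote can also be approximated after the fact. The following JavaScript sketch is a hypothetical post-processing step, not part of BlogEngine.NET or Live Writer: it turns numeric character references such as &#345; back into characters while leaving references to the XML special characters untouched. It assumes the entities are numeric; named ones such as &scaron; would need an additional lookup table.

```javascript
// Hypothetical post-processing helper (not part of BlogEngine.NET or Live Writer).
// Decodes numeric character references (&#345; or &#x159;) back to characters,
// but keeps references to the XML special characters & < > " ' untouched,
// similar in spirit to TinyMCE's entity_encoding: "raw".
const XML_SPECIAL = new Set(["&", "<", ">", "\"", "'"]);

function decodeNumericEntities(text) {
  return text.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, (ref, num) => {
    const codePoint = /^[xX]/.test(num)
      ? parseInt(num.slice(1), 16)
      : parseInt(num, 10);
    if (codePoint > 0x10ffff) {
      return ref; // outside the Unicode range: leave the reference as it was
    }
    const ch = String.fromCodePoint(codePoint);
    return XML_SPECIAL.has(ch) ? ref : ch;
  });
}

console.log(decodeNumericEntities("P&#345;&#237;li&#353; &amp; &#38;")); // Příliš &amp; &#38;
```

Named references such as &amp; are not matched by the pattern at all, so they pass through unchanged.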

Many thanks

Radek Dolezel
Jul 4, 2008 at 5:53 PM
Well, I will reply myself after almost one year :)

I was using 1.2.0.0 for many months, skipped 1.3.0.0 and started setting up 1.4.0.0 two hours ago. Unfortunately, the unwanted Live Writer behaviour has reappeared. But this time I found a solution at http://forums.community.microsoft.com/en-US/writerbeta/thread/c3285a01-2e9f-4637-8ea3-d9961ae754cd/.

If you want to eliminate HTML entities from <title> and <slug> while using WLW to post your articles, modify the wlwmanifest.xml file in your BE.NET installation: add this new line anywhere inside the <options> element.
<requiresHtmlTitles>No</requiresHtmlTitles>
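
If you maintain several installations, this edit can also be scripted. The following is a hypothetical JavaScript helper that treats the manifest as a plain string (no XML parser), assuming the file has exactly one <options> element; if a <requiresHtmlTitles> entry already exists, its value is set to No instead of adding a duplicate:

```javascript
// Hypothetical helper (not part of BlogEngine.NET or Windows Live Writer):
// makes sure the <options> element of a wlwmanifest.xml string contains
// <requiresHtmlTitles>No</requiresHtmlTitles>.
// Plain string handling; assumes exactly one <options> ... </options> element.
function addRequiresHtmlTitles(manifestXml) {
  const entry = "<requiresHtmlTitles>No</requiresHtmlTitles>";
  const existing = /<requiresHtmlTitles>[^<]*<\/requiresHtmlTitles>/;

  // An entry is already present (possibly "Yes"): overwrite its value.
  if (existing.test(manifestXml)) {
    return manifestXml.replace(existing, entry);
  }

  // Otherwise insert the entry just before the closing </options> tag.
  const closing = "</options>";
  const index = manifestXml.indexOf(closing);
  if (index === -1) {
    throw new Error("No </options> element found in the manifest");
  }
  return manifestXml.slice(0, index) + entry + manifestXml.slice(index);
}
```

In Node.js you would read wlwmanifest.xml with fs.readFileSync(path, "utf8"), pass the text through the function, and write the result back with fs.writeFileSync.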

Then delete your blog account in WLW and recreate it. If you use the latest WLW release (12.0.1370.325), do not forget to change your weblog settings manually: on the Advanced tab, change Markup Type from Default (XHTML) to HTML. Alternatively, you can download the WLW tech preview (14.0.3913.522) at http://windowslivewriter.spaces.live.com/blog/cns!D85741BB5E0BE8AA!1508.entry. Many new features are coming to WLW soon; there is a nice article about them at http://personal.battleangel.org/2008/06/04/windows-live-writer-tech-preview/.

Radek Dolezel