Import Tool Source Code

Jun 28, 2007 at 8:03 PM
Is the import tool source code available? I would like to update it to support importing from my blog (Serendipity).
Thanks
Mark
Coordinator
Jun 28, 2007 at 9:06 PM
It is currently not available. Mads might make it available sometime, but as he is on vacation, I don't know how quickly you'll get an answer to that.

Can I ask what type of blog you will be converting from? I'm working on BlogML support and wonder if that will help you or not?
Jun 29, 2007 at 6:37 AM

It's called Serendipity: www.s9y.org
They have talked about implementing BlogML sometime in the future, but I don't think it's coming any time soon.

Mark
Jul 19, 2007 at 7:25 AM
The Import Tool doesn't work for my dasBlog with 410 posts. I found a dasBlog BlogML exporter, so I could export all my posts. Is there a beta version of the BlogML implementation in BlogEngine so that I can import my posts into BlogEngine.NET?
Jul 19, 2007 at 8:21 AM
I uploaded a patch to provide basic BlogML import support. It was against a previous version of the codebase, so I don't quite know how well it'll merge into the current codebase - it should be pretty painless though. You can get it from the patch list on the source code page.

Unfortunately, it is quite basic - it doesn't import attachments, and it doesn't fix up links within blog content to point to the new posts. I'm working on a better version that fixes this. I hope to spend some time on it over the next few days. Once I get it working, I'll upload another patch.

Cheers
Matt
Jul 26, 2007 at 2:34 AM
Hey Matt,

Would be awesome if you do get it working... I've got a BlogML export of my old CommunityServer blog I need to find a home for ;)

features I'd love in the importer:
1/ ignore approved="false" comments (for some reason the export drags the spam along with it!)
2/ flag images that were on the "old" server (so they'll break when the hosting stops) so I can go through and correct them
3/ Fix internal links to other posts on the same blog (might be hard, so #4 might solve this!)
4/ build a 404 page that redirects from the old blog links (in the blogML file) to the new URL (with a 301 so Google doesn't sulk)
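
For illustration, request 1/ could be sketched like this in Python. This is only a rough sketch, not the importer's actual code: the trimmed SAMPLE document is made up, the helper names are hypothetical, and it assumes the export marks each comment element with an approved attribute as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BlogML-style fragment. Real exports carry a
# namespace and many more attributes and elements.
SAMPLE = """<blog>
  <posts>
    <post id="1">
      <comments>
        <comment id="c1" approved="true"><content>Nice post</content></comment>
        <comment id="c2" approved="false"><content>Buy pills</content></comment>
      </comments>
    </post>
  </posts>
</blog>"""


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def approved_comment_ids(xml_text):
    """Return ids of comments that are not explicitly marked approved="false"."""
    root = ET.fromstring(xml_text)
    kept = []
    for element in root.iter():
        if local_name(element.tag) != "comment":
            continue
        # Assumption: a missing attribute is treated as approved.
        if element.get("approved", "true").lower() == "false":
            continue
        kept.append(element.get("id"))
    return kept
```

Matching on the local tag name keeps the sketch working whether or not the export declares an XML namespace.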

Happy to give it a try... I was going to look at dasBlog but any platform I can get my content into is a winner ;)
Coordinator
Jul 26, 2007 at 1:27 PM
We plan on adding full BlogML support, both import and export, for the 1.2 release.
Jul 26, 2007 at 2:15 PM
This is actually quite tricky to get just right. In an ideal world, I'd be able to import all data with no data loss and no broken links, either externally or internally.

That's not going to happen.

Here's what it does so far, complete with holes and assumptions.

(Firstly, deletes all content - posts, pages, comments, but not attachments)

1. Imports title and subtitle
2. Creates a mapping between the old and new urls of each post (including articles) and all attachments
3. Imports authors
   a. Do we want to create users in the asp.net membership system? If so, how do we handle their passwords?
   b. Create a mapping between each author in BlogML and BlogEngine.
      i. Match on email (thinking that you get a new alias when you get a new blog, but your email may stay the same)
      ii. Then username (thinking that you keep the same username, but your email may change)
      iii. Otherwise, use the default author in the blog settings
4. Import categories.
   a. BlogML category IDs are simple strings, but BlogEngine requires guids, so I also create a mapping
   b. Nested categories are simply flattened.
5. Import posts (not articles).
   a. Again, maps string IDs to guids.
   b. For each post, the content (and excerpt) has links fixed. All href/src attributes of "a"/"img" elements are:
      i. Checked against the mapping made in #2 above, and replaced if found (all links to posts and attachments are now to the new site, at the new urls)
      ii. Made relative to the site root (all links to the new site are in the form "/page.aspx" instead of "http://site.com/page.aspx")
   c. Import post settings such as comments enabled/disabled/closed after x days, published, author based on author mapping, categories
6. Import comments and trackbacks.
   a. Only approved comments are imported.
   b. BlogML's support for trackbacks is a bit poor: we don't get the blog name or the excerpt, so we just do the best we can.
7. Import attachments
   a. BlogML doesn't provide the previous site's internal file name, so we have to make one up. I base this on the URL.
      i. If there is a query string parameter, try and use it (e.g. attachment.ashx?id=filename.zip gives filename.zip).
      ii. If there is more than one query string parameter, use the last one (hey, your guess is as good as mine).
      iii. No query string parameter - use the path part of the url (hopefully /whatever/filename.zip)
   b. If the attachment is embedded in the blogml file, then save the file.
   c. Otherwise, try and make a web request to get the file. This could cause exceptions and timeouts. Don't know how to handle these error conditions...
8. Do the same for articles.
9. Save the mapping from old urls to new urls to be used in a url rewriter.
   a. Don't know what format to use - asp.net's url rewriter is the ideal choice except it does server side redirects, and I'd rather do a 30x permanent redirect
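
To make the attachment naming rules in the list above concrete (query string value first, last parameter wins, path as the fallback), here is a rough Python sketch. The function name and the example.com URLs are made up for illustration, and the patch itself may well handle edge cases differently.

```python
import posixpath
from urllib.parse import parse_qsl, unquote, urlsplit


def attachment_file_name(url):
    """Guess a local file name for an attachment from its old URL."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    if params:
        # One or more query string parameters: use the value of the last one.
        candidate = params[-1][1]
    else:
        # No query string: fall back to the path part of the URL.
        candidate = unquote(parts.path)
    # Keep only the final segment so the result is a bare file name.
    return posixpath.basename(candidate.replace("\\", "/"))
```

A URL whose path ends in a slash and has no query string yields an empty name here, so a real importer would still need a last-resort default.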

And after all that, you're going to have missed:

1. Images that aren't included in the blogml as attachments
2. Mapping of urls from old posts to new posts needs to be set up manually
3. Mapping of urls from old posts to new posts if you have more than one way to get to the post (e.g. by friendly title or by id)
4. Mapping of urls for category views
5. Mapping of urls for archive/date views
6. Mapping of urls for authors views
7. Mapping of urls for tag views.
8. Exceptions are not handled!

To help with this, I want to generate a report at the end, listing the url mappings and giving a check list of things that haven't been attempted. This report could include information about the images that weren't included as attachments, but you would have to download them, upload them and fix up the posts/pages manually.

I was thinking about a 404 page, but as a separate thing to this (I just want to get it finished!). I was thinking of a page that would take the title, url decode it and feed it into the search functionality. I hadn't thought about it doing the url mapping from old to new, that might be a very nice idea. It could even do a regex search for the post/article/comment id and redirect based on that. Would this be better served in the url rewriting module, or on a 404 page?
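
As a very rough sketch of how such a 404 handler might decide where to send a request, the lookup order discussed here (saved url mapping, then an id found by regex, then the search page) could look like the following in Python. Everything named here is an assumption for illustration: URL_MAP and ID_MAP stand in for whatever the import saves, the regex only covers two made-up url shapes, and "/search.aspx?q=" is a hypothetical search address.

```python
import re
from urllib.parse import quote_plus, unquote, urlsplit

# Hypothetical data that an import run might have saved.
URL_MAP = {"/blogs/mark/archive/2007/06/28/hello-world.aspx": "/post/hello-world.aspx"}
ID_MAP = {"42": "/post/hello-world.aspx"}
SEARCH_URL = "/search.aspx?q="  # assumed location of the search page

# Matches urls ending in "/42", "/42.aspx" or "?id=42" (illustrative only).
ID_PATTERN = re.compile(r"(?:[?&]id=|/)(\d+)(?:\.aspx)?$", re.IGNORECASE)


def resolve_missing(url):
    """Return (status, location) for a request that hit the 404 page."""
    parts = urlsplit(url)
    path = parts.path.lower()
    # 1. Exact match in the saved old-to-new mapping: permanent redirect.
    if path in URL_MAP:
        return 301, URL_MAP[path]
    # 2. Look for an old post id in the url.
    target = path + ("?" + parts.query if parts.query else "")
    match = ID_PATTERN.search(target)
    if match and match.group(1) in ID_MAP:
        return 301, ID_MAP[match.group(1)]
    # 3. Fall back to searching for the url-decoded title.
    slug = unquote(path.rstrip("/").rsplit("/", 1)[-1])
    title = re.sub(r"\.\w+$", "", slug).replace("-", " ").replace("_", " ").strip()
    return 302, SEARCH_URL + quote_plus(title)
```

The search fallback uses a temporary redirect because it is only a guess, while the two mapped cases use 301 since the content really has moved.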

Is there anything I've missed?

Cheers
Matt
Coordinator
Jul 26, 2007 at 2:58 PM
Thanks Matt, that's quite a list. RazorAnt is already part of the BlogML team on codeplex and is in dialog with Keyvan and the rest of the bunch. I'm sure he'll figure it out somehow with their help.
Jul 26, 2007 at 3:38 PM
Wow Matt, when you said it was bare bones I wasn't expecting a list quite like that!

If you need someone to give it a quick test point me in the right direction ;)
Jul 27, 2007 at 7:14 AM


madskristensen wrote:
We plan on adding full BlogML support for the 1.2 release. Both import and export


Doh! That could have saved me some trouble :-)

I'll still upload my code as a patch, hopefully in the next few days.

Jul 31, 2007 at 7:02 AM
Is there an ETA on 1.2? Or even a version I can test?
I tried an upgrade on CommunityServer today and the whole thing went down in flames and I don't want to have to go through the whole pain of rebuilding.
I've got the blogml export of the blog ... so if someone has a tool that's not prime-time but can convert to BlogEngine from that and just give me back the app_data/posts directory that would be cool as well (and gives you some test data....)
Aug 14, 2007 at 8:50 AM
I've just uploaded a new patch containing an updated version of the BlogML import. It does everything listed above.

I've pulled as much as I can into a reusable base class that I'm going to submit to the BlogML project.

Hope it works for you!

Cheers
Matt
Aug 14, 2007 at 2:20 PM
sweet... but... is it something I can make use of at this point (what do I do with the patch XML?) or do I have to wait for it to roll into 1.2 ?

Aug 14, 2007 at 8:27 PM
Yep, the import itself all works fine. All being well, you should have a blog with loads of content.

You don't have to do anything with the report if you don't want to. But it's there to give you a chance to set up a url rewriter (of your choice, there isn't one by default in asp.net or BlogEngine) to redirect requests for your old content to the new content. It also reports any links in the imported content that aren't a part of the new content, again allowing you to either add redirections, or fix up links. In short, it's up to you what you do with the report. If you're not bothered about 404s, then you'll be fine.
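
For anyone curious what "fixed up links plus a report" amounts to, here is a minimal Python sketch of the idea. It is not the code in the patch: the function name, the regex-based attribute matching and the example.com hosts are all assumptions, and it skips the step of making links root-relative.

```python
import re
from urllib.parse import urlsplit

# Matches href="..." and src="..." attributes (single or double quoted).
ATTR = re.compile(
    r'''(?P<attr>\b(?:href|src))\s*=\s*(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def fix_links(html, url_map, old_host):
    """Rewrite mapped links and collect old-site links that have no mapping."""
    unmapped = []

    def replace(match):
        attr, quote, url = match.group("attr"), match.group("q"), match.group("url")
        new_url = url_map.get(url)
        if new_url is None:
            # Unknown link: leave it alone, but report it if it points at the old site.
            if urlsplit(url).netloc.lower() == old_host.lower():
                unmapped.append(url)
            new_url = url
        return f"{attr}={quote}{new_url}{quote}"

    return ATTR.sub(replace, html), unmapped
```

The second return value is the raw material for the report: each entry is a link that will break once the old host goes away unless you add a redirect or edit the post.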

And I've just noticed that Mads has applied my previous patch, with a comment that RazorAnt is working on BlogML import and export for 1.2, so I don't know what the plans are there. But at the least, this new patch gives you a working import with fixed up links, and a fighting chance of fixing broken links.

Cheers
Matt
Aug 14, 2007 at 11:16 PM
Hey Matt,
Sorry I'm probably missing something really obvious here... when I look at the list of patches I see a patch.xml file that I guess I can download... is that something I can make use of somehow (copy it to a directory and have it magically applied) or do I need to run it through a processor to actually extract the files and apply the patches?
Coordinator
Aug 15, 2007 at 2:00 AM
OffBeatMammal, no, that is a bug in CodePlex. For some reason it sometimes fails if the patch is not zipped into a .zip file. I can't get to the code either.
Aug 15, 2007 at 8:23 AM
Sorry guys, probably should have explained.

I've been using the CodePlex Client (http://www.codeplex.com/CodePlexClient) to get at the source code in a more manageable way than just zip files. I followed the instructions at http://www.codeplex.com/CodePlexClient/Wiki/View.aspx?title=HowToContribute&referringTitle=Home to create the xml patch file I uploaded. Once you've downloaded a copy, you can follow the instructions at http://www.codeplex.com/CodePlexClient/Wiki/View.aspx?title=HowToAcceptContributions&referringTitle=Home to apply the patch to a copy of the source tree.

Let me know if this works out - I can always upload the changes again as a zip.

Cheers
Matt
Aug 21, 2007 at 7:52 PM
I'm obviously having a stupid week here.... CodePlex is obviously more than my addled brain can cope with!
Do you have a version of this I can drop into my current working version (i.e. the changes for the live site that are delta from the baseline) so I can try the import? I thought I'd run the download/patch process correctly, but my test site seemed to run into some other problems (are there any prerequisites for the patches?).
... or is it easier/quicker to wait for 1.2 (and news on a beta?)


Sep 28, 2007 at 9:46 AM
Well, one new baby and several weeks later, here's a more standalone patch for blogml import and even export. It's a simple zip file, and just requires copying a few files around (and a quick change to web.sitemap)

It's uploaded as patch 335, and the direct download link is:

http://www.codeplex.com/blogengine/Project/FileDownload.aspx?DownloadId=19398

Full instructions are in the readme.txt file.

Cheers
Matt
Sep 28, 2007 at 10:40 AM
Although I forgot to say:

  1. Make sure your new user is set up, and is added to the administrator/editors role before importing!
  2. Also make sure that your default author name (owner of blog) is set correctly - if your blogml file doesn't contain an author, that name will be used... (authorname element in ~/app_data/settings.xml)
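
To show why the default author matters, here is a rough Python sketch of the author matching order described earlier in the thread (email first, then username, then the default author). The function name and the dictionary shapes are hypothetical, and the real import code may differ.

```python
def resolve_author(blogml_author, known_users, default_author):
    """Pick the user name to credit for an imported post.

    blogml_author: dict with optional 'email' and 'username', or None.
    known_users: list of dicts with 'email' and 'username' (assumed shape).
    default_author: the blog's configured default author name.
    """
    if blogml_author:
        email = (blogml_author.get("email") or "").strip().lower()
        name = (blogml_author.get("username") or "").strip().lower()
        # First try the email address, which may outlive a change of alias.
        if email:
            for user in known_users:
                if user["email"].lower() == email:
                    return user["username"]
        # Then try the user name, in case the email address changed.
        if name:
            for user in known_users:
                if user["username"].lower() == name:
                    return user["username"]
    # No author in the file, or no match: use the default author.
    return default_author
```

With no author in the file, every post ends up under default_author, which is why it is worth checking that setting before running the import.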