Is this the wrong "Engine" for a high-traffic, clustered environment?

Topics: ASP.NET 2.0, Business Logic Layer
May 13, 2009 at 2:32 PM

I'm researching BLOG engines (open source) that will need to be highly customized.  My biggest concern is not modifying the existing code base, but the underlying architecture.  I would like some insight (if possible) into the performance of BlogEngine.

We will use server cache and proprietary caching technology coupled with SQL 2000/2005.

Traffic will be upwards of 30-40 million page views per month hosted on 3-4 web servers (depending on need).

Will “BE” handle this kind of traffic without re-writing most of the core engine?  I say/ask this without looking at any of the code, so please don’t take offense (to those who wrote/contributed to it).

Thanks in advance, Do

Coordinator
May 13, 2009 at 9:19 PM

I don't think BE.NET would perform any better or poorer than any other application with 30 - 40 million page views a month.  I don't personally have any benchmarks, however.

One area which could (or could not) be an issue is if you have a lot of blog posts.  Blog posts are cached in memory.  A few thousand blog posts shouldn't be any problem.  If you get into the tens of thousands of blog posts, scalability testing for this number of posts would be a good idea.

May 14, 2009 at 12:07 AM

To avoid having the application load all posts into memory, how much effort would it be to change this to an on-demand system?  I would rather cache pieces of data and utilize a cache utility than store everything in memory.

I'm just trying to gauge the effort involved and/or determine if it will basically be a re-write of all the core logic.

The site will typically get around 150+ posts per day (multiple authors) and upwards of 100+ comments per post.  So loading all of this in memory will not work.  I have about a 6-week window and I'm just trying to gauge what the effort would be to get all this done.
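For scale, the figures quoted so far in this thread can be turned into rough numbers.  This is only a back-of-envelope sketch: the 30-day month and perfectly even traffic are assumptions, and the post and comment counts are treated as exact even though they were given as lower bounds.

```python
# Back-of-envelope sizing from the figures quoted in this thread.
# Assumptions (not from the thread): a 30-day month and evenly spread traffic.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def avg_views_per_second(monthly_page_views: int) -> float:
    """Average request rate if page views were spread evenly over the month."""
    return monthly_page_views / SECONDS_PER_MONTH


posts_per_day = 150        # "150+ posts per day"
comments_per_post = 100    # "100+ comments per post"

comments_per_day = posts_per_day * comments_per_post
posts_per_year = posts_per_day * 365
comments_per_year = comments_per_day * 365

low_rate = avg_views_per_second(30_000_000)
high_rate = avg_views_per_second(40_000_000)

print(f"comments per day:  {comments_per_day:,}")
print(f"posts per year:    {posts_per_year:,}")
print(f"comments per year: {comments_per_year:,}")
print(f"average page views per second: {low_rate:.1f} to {high_rate:.1f}")
```

On these assumptions the site passes the "tens of thousands of blog posts" mark mentioned earlier within its first year, and peak traffic will be well above the computed average.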

May 14, 2009 at 12:31 AM

It's not just posts.  Comments, pages, tags, categories: they're all loaded into static variables.  BlogEngine.NET simply isn't going to fit well in a clustered environment.  Updates aren't going to propagate across your cluster unless you reset the application (and cause a reload of all content).
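The propagation problem described above can be shown with a toy model.  This is not BlogEngine.NET code: the `WebServer` class, the `database` dict standing in for SQL Server, and every method name are hypothetical, and the sketch only assumes that each server loads all content once and then serves it from its own memory.

```python
# Toy model of per-process in-memory content on a web farm.
# All names here are hypothetical; `database` stands in for the shared SQL store.

database = {"post-1": "Hello world"}


class WebServer:
    def __init__(self, name):
        self.name = name
        self.posts = {}
        self.reload()

    def reload(self):
        # Plays the role of an application reset: re-read all content.
        self.posts = dict(database)

    def add_post(self, post_id, body):
        database[post_id] = body     # persisted for every server
        self.posts[post_id] = body   # but only this server's memory is updated


server_a = WebServer("A")
server_b = WebServer("B")

server_a.add_post("post-2", "Written via server A")

visible_on_a = "post-2" in server_a.posts
visible_on_b_before_reload = "post-2" in server_b.posts

server_b.reload()
visible_on_b_after_reload = "post-2" in server_b.posts

print(visible_on_a, visible_on_b_before_reload, visible_on_b_after_reload)
```

Server B only sees the new post after its reload, which matches the behavior reported for web farms in this thread.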

Coordinator
May 14, 2009 at 12:51 AM

Proffitt makes a good point about clusters and such.  One of the problems people with web farms report here is how new data on one server (new post, new comment, etc) doesn't show up on the other servers.

If you rewrote the engine to not cache this data and to look it up in the DB, then this wouldn't really be a problem.

I haven't looked into it, but rewriting BE so it queries the DB for the information each component needs (instead of using in-memory data) would be a pretty big undertaking.  So much of the code now relies on the object oriented behavior where any widget, page, extension, etc. can easily just reference all the posts, comments, categories etc in memory and use what's already there.  If you removed the data in memory, each one of these places in code would probably need to be responsible for obtaining its own set of data via a DB query.

May 14, 2009 at 2:21 AM

Thanks for the insight, I'll have to weigh my options - but knowing all of this ahead of time was a huge help.

I appreciate the comments.

-Do

May 14, 2009 at 2:37 PM
Edited May 15, 2009 at 7:12 PM

Hmmm - I have been modifying BlogEngine for our web farm ... I am adding a caching model to it for the application with a configurable cache expiration.  I think nearly all of the static variables can be retrieved and written to the application cache so that any one server would not be stale longer than the amount of time configured.  I think the real problem now is that an update of a post/comments uses what is in memory to delete and repopulate the data.  This is why everyone is having trouble ... because the methods in the provider first delete all comments and then reinsert them.  I have written a new MsSqlProvider (for Microsoft SQL Server 2005+) and am using stored procedures with an XML update/insert to update a post and its comments.  That way one user doesn't overwrite another user on a different farm member or thread.  I then wrote a separate method to delete comments that deletes by GUID and removed the delete PostComments line from the UpdateComments method.
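The "configurable cache expiration" idea can be sketched as a small load-on-demand cache.  This is an illustration only, not the poster's implementation or the ASP.NET cache API: `ExpiringCache`, `get_or_load`, and the injectable clock are hypothetical names chosen so the example can run without waiting for real time to pass.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpiringCache:
    """Load-on-demand cache whose entries expire after a configurable time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                      # still fresh: serve from memory
        value = loader(key)                      # missing or expired: go to the store
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Usage with a fake clock and a dict standing in for the database.
fake_now = [0.0]
db = {"post-1": "v1"}
cache = ExpiringCache(ttl_seconds=60, clock=lambda: fake_now[0])

first_read = cache.get_or_load("post-1", db.__getitem__)

db["post-1"] = "v2"      # another farm member updates the post
fake_now[0] = 30.0
read_within_ttl = cache.get_or_load("post-1", db.__getitem__)

fake_now[0] = 61.0
read_after_ttl = cache.get_or_load("post-1", db.__getitem__)

print(first_read, read_within_ttl, read_after_ttl)
```

A server can serve the old value for at most the configured lifetime, so the expiration setting is a direct trade-off between staleness and database load.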

Do you think that will take care of the issues?  We are about to deploy these mods to QA for testing in a load-balanced environment ... Did I forget something?
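The delete-then-reinsert problem described in this post can be reproduced in a few lines.  The sketch below uses an in-memory SQLite table purely for illustration; it is not the MsSqlProvider or the stored procedures mentioned above, and the table layout and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id TEXT PRIMARY KEY, post_id TEXT, body TEXT)"
)


def save_all_from_memory(post_id, comments_in_memory):
    """Delete-then-reinsert: the table ends up matching one server's memory."""
    with conn:
        conn.execute("DELETE FROM comments WHERE post_id = ?", (post_id,))
        conn.executemany(
            "INSERT INTO comments VALUES (?, ?, ?)",
            [(cid, post_id, body) for cid, body in comments_in_memory.items()],
        )


def add_comment(post_id, comment_id, body):
    """Row-level write: touches only the comment being added."""
    with conn:
        conn.execute(
            "INSERT INTO comments VALUES (?, ?, ?)", (comment_id, post_id, body)
        )


def comment_ids(post_id):
    rows = conn.execute(
        "SELECT id FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
    )
    return [row[0] for row in rows]


# Servers A and B both cached the post when it had a single comment.
memory_a = {"c1": "first"}
memory_b = {"c1": "first"}
save_all_from_memory("p1", memory_a)

memory_a["c2"] = "from A"
save_all_from_memory("p1", memory_a)   # A writes its view: c1, c2

memory_b["c3"] = "from B"
save_all_from_memory("p1", memory_b)   # B writes its stale view: c2 is wiped
after_bulk_saves = comment_ids("p1")

add_comment("p1", "c2", "from A")      # row-level insert leaves c1 and c3 alone
after_row_level_insert = comment_ids("p1")

print(after_bulk_saves, after_row_level_insert)
```

The bulk save from server B silently drops the comment written through server A, while the row-level insert does not disturb other rows.  Deleting by ID works the same way, since it removes only the targeted comment.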

 

May 23, 2009 at 3:01 AM
Edited May 23, 2009 at 3:33 AM

I have to throw my opinion in here.  We have a site with nearly 50k articles on a dual-processor server with 4 GB of RAM and a dual-processor SQL server, also with 4 GB of RAM.  BE is running just OK, but only after Ben Amada made some serious changes for us, and it still seems to have regular problems with that amount of data.  We now recycle the app pool every 30 minutes and have increased virtual memory to min 5768, max 12000.  This and Ben's tweaks have improved the situation immensely since the initial switch to BE 1.5 'out of box', but things are still worrying, with the server falling over and lots of "System.OutOfMemoryException" was thrown messages in the event log.  I'd suggest that this platform would be awesome for large sites if the caching of data could be turned on or off according to user choice, with SQL Server taking on the processing work that the cache currently handles.  Incidentally, we are serving around 500k unique visitors per month according to Google Analytics, so for the "30-40 million page views per month" in the initial post, I'd have serious reservations.

May 26, 2009 at 10:06 AM

I love BE.NET, but the caching thing is really annoying. Especially on a hosted environment.

If it could be made optional, that would be awesome.

May 26, 2009 at 3:06 PM

I think if you're using XML as your data store, the caching makes perfect sense.  Either way though, the caching of all posts/comments effectively puts a cap on its capabilities.

If I had the time, I would help with a full port to SQL: implement on-demand comments, web services, etc.  This would require most of the engine to be rebuilt, though.

May 28, 2009 at 3:52 AM

For farms or load-balanced instances, why not refactor the DB layer into a web service, and deploy it as a separate site with caching at this layer?  Then consume the service from the application's middle tier.  Use separate IPs for each site that hosts an instance of the DB layer and the app.  Then load balance the application and the DB layer by those IPs.  You would still have to somehow sync up the writes to the DB.