BE.NET on a Webfarm

Jan 8, 2009 at 6:08 PM
We are currently running BE.NET on a webfarm and running into many cache issues. Does anyone have suggestions on how to properly run BE.NET on a webfarm or is there a way to disable the ASP.NET cache?
Jan 20, 2009 at 6:41 PM
I have a similar question to yours.  I don't necessarily want to disable the cache since that really helps keep the load off our DB, but I would like to find out where you can change the amount of time something is cached.  Let me know if you find that option.  Here's a link to my question in case you know something about this:
http://www.codeplex.com/blogengine/Thread/View.aspx?ThreadId=44687
Jan 20, 2009 at 7:38 PM
I have yet to find anything. I have been very discouraged due to the lack of help from the community while using BE.NET--I think we are going to have to migrate to a different product due to the issues running on a webfarm and not being able to get any assistance.
Jan 20, 2009 at 9:55 PM
How ya doin', Madkidd!

First, there probably aren't many BE.NET deployments on a webfarm.

Second, this is Open Source, an all Volunteer Army doing it for fun and passion after hours.

Third, have you tried to get assistance from a certain commercial company you and I are acquainted with on how to set up their app on a web farm?  I didn't think so.

To be fair, web farming any app is not for the uninitiated.  I hope you don't get discouraged and are super successful on the BE.NET farm!

-Dave

Jan 20, 2009 at 10:16 PM
Hi Dave!

I hope I didn't sound too down on BE.NET--I didn't mean to be. I am using it in various places and am very happy with it. I had high hopes for it in this particular launch and am just bummed it hasn't worked out because of the issues running on a webfarm. I have made a couple of posts asking questions here with no response and thought a few more guys like yourself might have been able to help me out--the fact that I haven't gotten help isn't a negative on anyone's part, as that is just life when dealing with open source products. :)

Since you're reading, do you happen to know how to disable (if it is even possible) caching in BE.NET? That would fix most of our issues with the webfarm deployment.

Love what you are doing with Sueetie!
Jan 22, 2009 at 2:09 AM
Madkidd,

Sent you an email on this.

btw, when I mentioned a forums thread I was following, I inadvertently was referring to the "One Code Base, Multiple Clients" thread, which is also very interesting but not exactly what you're looking for.  My point was that you're probably absolutely right about the cache being the key.

Thanks again for the Sueetie comment.  I wish I could be spending more time with it...and BlogEngine.NET!

Dave
Jan 22, 2009 at 10:30 AM
Hi all,

Recently I had some experience with deploying BE.Net in a web farm environment. To my knowledge, with the current BE.Net (1.4.5) code architecture, it is not possible to successfully host a single instance on 2 or more servers.

The problem is indeed somewhat related to caching. It lies in the object layer consisting of classes like BlogEngine.Core.AuthorProfile, BlogEngine.Core.Post or BlogEngine.Core.Page. Those classes use a private static generic list to store all objects of the given class. This list is filled from the data provider at first access. Since the list is static, it persists on each server independently (until the application or server is restarted) and reflects only the changes caused by HTTP requests to the server it resides on.

Although the changes from each server are written to the data provider, BE.Net does not check whether there were changes from other sources until the application is restarted.

As far as I can see there is no easy way of countering this problem, though repopulating those private static lists at the beginning of every request might be the easiest workaround - at the expense of performance.
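To make the failure mode concrete, here is a minimal sketch in Python rather than BE.Net's own C#. It is not BlogEngine code: `DATA_STORE`, `Server` and the `reload_each_request` flag are hypothetical names invented for illustration. Each `Server` object stands in for one farm node holding its own in-memory list, and the last two lines contrast the stale read with the reload-on-every-request workaround.

```python
# Hypothetical stand-in for the shared data provider (database or XML files).
DATA_STORE = ["first-post"]


class Server:
    """One web farm node with its own in-memory copy of the posts."""

    def __init__(self, name):
        self.name = name
        self._posts = None  # filled from the data store on first access

    def posts(self, reload_each_request=False):
        if self._posts is None or reload_each_request:
            # Copy from the shared store, like filling a per-process static list.
            self._posts = list(DATA_STORE)
        return self._posts

    def add_post(self, slug):
        self.posts()              # make sure the local list is loaded
        DATA_STORE.append(slug)   # the change reaches the shared store...
        self._posts.append(slug)  # ...and this node's own list only


server_a = Server("A")
server_b = Server("B")
server_b.posts()                  # B loads its list before the new post exists
server_a.add_post("second-post")  # the request happens to land on A

stale = server_b.posts()                          # B has not noticed the change
fresh = server_b.posts(reload_each_request=True)  # workaround: reload every time
```

The workaround pays for a full read of the store on every request, which is the performance cost mentioned above.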

Regards


Coordinator
Jan 23, 2009 at 4:48 AM
I think Failedtestcase's description of the issue is quite accurate.  I haven't looked into the author profiles or pages, but the Post class does have a Reload() method you can call at any time to reload all the posts from the data store.  I just posted a message about Post.Reload() the other day.  I'd imagine in most blogs, posts are the most updated part of a blog, so this could definitely be helpful for those in web farms.
Jan 23, 2009 at 8:23 PM
Ben, I took a closer look at the code that failedtestcase mentioned as well as the code that you wrote the other day to use Post.Reload().  Page doesn't have the same corresponding fix, though you could call FillPages() I suppose.  I host several be.net instances on Mosso which is a "cloud" hosting environment.  It is basically a web farm, but I don't have control over the servers.  The code you provided is a really good help to that, but the problem is that lots of stuff that is in memory can change like settings, users, pages, etc.  It'd be nice if we had one method we could call that would basically flush everything out of memory.  Right now, we have to delete the web.config file and recopy it in order to make Mosso's servers replicate it and make the application go out of memory.

Now, on a side note that is related to all this: doesn't it seem like BlogEngine.NET would have scaling issues as a blog got bigger and bigger?  I know it may take a while, but it seems like loading all the data from the database is not a good idea once you get beyond a certain point.  I'm guessing it would be a pretty big deal to change to more of a cached system where, if I was looking for a particular page/post and didn't have it, I could get it from the db and say it was valid for 10 minutes (or whatever).  That way old posts would only be queried when necessary and new/popular posts would only have to be queried once every 10 minutes.  This would improve startup times for bigger blogs when they first start up on a server (or get reloaded for whatever reason).
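A rough sketch of that on-demand idea, written in Python as a neutral illustration and not taken from the BlogEngine.NET code base. `TtlCache`, `loader`, `ttl_seconds` and `clock` are all hypothetical names; `loader` stands for whatever fetches a single post or page from the db.

```python
import time


class TtlCache:
    """Load items on demand and keep each one for ttl_seconds."""

    def __init__(self, loader, ttl_seconds=600, clock=time.monotonic):
        self._loader = loader      # hypothetical: fetches one item from the db
        self._ttl = ttl_seconds    # 600 seconds = the 10 minutes suggested above
        self._clock = clock        # injectable so the expiry can be tested
        self._items = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # still valid, no db query
        value = self._loader(key)  # missing or expired: query the db
        self._items[key] = (now + self._ttl, value)
        return value
```

With something like this, nothing is loaded at startup, and a server in a farm would pick up an edited post after at most `ttl_seconds`, at the cost of serving a stale copy until then.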

I hope all this makes sense.  I'd be glad to help with code.  I'm new to the be.net community though and didn't want to step on toes if this had already been discussed.

Thanks,
Micky
Coordinator
Jan 24, 2009 at 1:14 AM
@Micky, you're right, Post.Reload() alone isn't enough as it doesn't cover pages, profiles, settings, etc.  The best solution would clearly be a "web farm" setting built into BE that could re-read from the data store on all servers in a web farm when something's changed.

One way this could be accomplished would be to keep a timestamp value in the data store for each major category (posts, pages, settings, profiles, etc).  When data in one of these categories is changed, the timestamp value linked to that category is updated in the data store.  An HttpModule in BE would then check the 4 or 5 timestamps on every page request.  If the timestamp for a category is newer than the one the server recorded when it last read the data, the server would clear and re-read the data for that category.  Since the HttpModule is only reading 4 or 5 timestamp values on every request (rather than re-reading all the data), this would be a fairly resource UN-intensive activity.
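Here is a small sketch of that timestamp check, in Python for brevity rather than as an actual HttpModule. Everything in it is assumed for illustration: `STORE`, `save`, `Node` and `begin_request` are hypothetical names, and a simple counter stands in for real timestamps so the example does not depend on clock resolution.

```python
import itertools

# Hypothetical shared store: data plus one "last updated" stamp per category.
_clock = itertools.count(1)  # monotonic counter standing in for a timestamp
STORE = {
    "posts": {"data": ["hello"], "stamp": next(_clock)},
    "pages": {"data": ["about"], "stamp": next(_clock)},
}


def save(category, item):
    """Write to the shared store and bump that category's stamp."""
    STORE[category]["data"].append(item)
    STORE[category]["stamp"] = next(_clock)


class Node:
    """One server: keeps a local copy and the stamp each copy was loaded at."""

    def __init__(self):
        self.cache = {}
        self.seen = {}
        self.reloads = 0

    def begin_request(self):
        # Cheap check: compare a few stamps, reload only the stale categories.
        for category, entry in STORE.items():
            if self.seen.get(category) != entry["stamp"]:
                self.cache[category] = list(entry["data"])
                self.seen[category] = entry["stamp"]
                self.reloads += 1


node = Node()
node.begin_request()        # first request loads both categories
save("posts", "new-post")   # another server in the farm saves a post
node.begin_request()        # only "posts" is reloaded
node.begin_request()        # nothing changed, so nothing is reloaded
```

The per-request cost is one comparison per category; a full re-read happens only for the category whose stamp moved.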

I never thought about the scaling issues you brought up.  I don't yet have a very large blog so haven't run into this, but I could definitely see there being some issues if the numbers of posts, comments, etc. started getting up there.  Getting data on-demand and doing caching like you suggest does sound very optimized.  I've only been around BE for a few months now, but haven't seen anyone discuss improvements to how BE scales.  Not that it's anyone's job to contribute to projects like this, but the great thing about BlogEngine is that when there are contributions, many many people benefit!
Jan 24, 2009 at 8:20 PM
An HttpModule would be a good solution.  That way it is something you can add in without affecting the core code.  Instead of having it check on every single request, I would almost try to limit it similar to the way you did your code on the other post related to this.  It should probably be shorter than 10 minutes of course, but even better would be to have the timeout as a setting or something.  I'll try to work on this in the coming week.  Is there some place where I can post it for others to have access?  Also, where do you think a good place to store the "lastUpdated" timestamp for the different parts of the system would be?  Settings? 
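Limiting how often the check runs could look roughly like the sketch below. It is Python and purely illustrative: `ThrottledCheck`, `check` and `interval_seconds` are made-up names, with `interval_seconds` playing the role of the configurable timeout and `check` being whatever compares the "lastUpdated" values.

```python
import time


class ThrottledCheck:
    """Run check() at most once per interval_seconds."""

    def __init__(self, check, interval_seconds=60, clock=time.monotonic):
        self._check = check                # hypothetical staleness check
        self._interval = interval_seconds  # would come from a setting
        self._clock = clock                # injectable for testing
        self._next_due = None              # None means it has never run

    def on_request(self):
        """Call on every request; returns True when the check actually ran."""
        now = self._clock()
        if self._next_due is None or now >= self._next_due:
            self._next_due = now + self._interval
            self._check()
            return True
        return False
```

The trade-off is that a server can stay stale for up to `interval_seconds` after a change on another server.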

I guess to truly separate it, I'll need to tap into the events to save these "lastUpdated" values so that the core code isn't touched there either.

Micky
Coordinator
Jan 25, 2009 at 12:34 AM
If it were me, I'd save the lastUpdated timestamps in my own SQL table or in my own XML file within the App_Data folder.

For capturing when changes are made, BE does allow you to add your own event handlers for certain events, like the Post.Saved event when a new post is created, edited or deleted.  There's also Page.Saved.  But I'm not sure if there are equivalent ways to subscribe to events when settings are changed.  It may be necessary to modify the BE core code to expose such events.  Another approach that wouldn't require modifying the BE core code: if data is being stored in SQL tables, you could add triggers to the SQL tables so you can update the lastUpdated timestamps when the triggers fire.  And if data is being stored on the file server in the App_Data folder, you could either monitor for changes to certain files/directories, or just periodically check the last modified timestamps of files in the App_Data folder to determine if something's changed.

Then once it's been determined that something's changed, you can reload posts with Post.Reload(), but there may not be equivalent "reload" methods exposed for other pieces of data like pages or settings.  So again here, modifying the BE core code may be necessary ... although I wouldn't expect exposing reload methods in the BE core code would be very involved.  Or you could go the brute force route of reloading the BE app so all cached data is cleared out.  This could be done by programmatically modifying the web.config file -- like maybe just adding a space to the end of the file.  Not a very elegant solution of course.  Having reload() methods would be better, I think.
Apr 28, 2009 at 5:57 AM
I tried setting up a common .NET State server. But, I am not sure if it stores the cached items in the common state server. Could this be a possible solution?

I also tried getting the posts from the database each time the page loads. It works, but it makes the site slower.
Coordinator
Apr 28, 2009 at 6:56 AM
Most of the data cached in BE, like Posts, are stored in static lists.  Static list data isn't session data that would be stored in a state server or other session storage location.

Even if BE used the .NET Cache instead of static lists, I'm thinking (but could be wrong) that items in the Cache are also not included in session storage.  Session storage locations (like state server) I think just store session data which is per-user data.  It doesn't store static data, Application or Cache data.

The most efficient method of getting the latest data would probably be what I explained here in this thread on January 24th with polling the data storage location to check for changes.  Polling would retrieve just a single piece of data when nothing has changed rather than unnecessarily re-retrieving all the data.  You would need to come up with some mechanism to record when data has changed and a mechanism to poll for that change.

You could also go the route of re-retrieving posts (and optionally other data too) every 10 minutes or so.  There's some code I posted for this in this other post.  This is not as ideal as polling, especially since someone might have a link to a new post, but then when they click on the link, they get a 404 if they're hitting one of the servers on the farm that hasn't yet re-retrieved the latest data.  To solve that issue, you could modify the UrlRewriter to re-load posts when an unknown post is asked for.  I posted code for this in this other thread back on March 16th.
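The "reload when an unknown post is asked for" idea can be sketched like this. It is a Python illustration only, not the UrlRewriter change itself; `find_post`, `local_posts` and `load_all_posts` are hypothetical names, with `local_posts` standing for one server's in-memory posts and `load_all_posts` for a full read from the data store.

```python
def find_post(slug, local_posts, load_all_posts):
    """Look up a post; on a miss, reload once from the store and retry."""
    if slug not in local_posts:
        fresh = dict(load_all_posts())  # one full reload per miss
        local_posts.clear()
        local_posts.update(fresh)
    return local_posts.get(slug)        # None means a genuine 404
```

One thing to watch with this approach: every request for a URL that really doesn't exist also triggers a full reload, so it may be worth limiting how often the reload can run.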

Just some ideas ...
May 8, 2009 at 12:02 PM
Edited May 8, 2009 at 1:40 PM

Hmmm I've just discovered this limitation!

We have a blog on a web farm and there were a couple of weird gremlins when a user was posting an article.  I now suspect our load balancer had shifted a request to another box. 

I finally realised about these static lists when I was researching into getting BlogEngine.Net's search engine integrated with the rest of our site.

We have spent quite a bit of time getting BlogEngine integrated with our site (such as ASP.Net profile management integration), so it would be a real shame and a pain to move away now.....

If anyone has got any further in their investigations or support for a web farm then do let me know.

Thanks

 

Coordinator
May 10, 2009 at 8:19 AM

mercsd: That's a good tip, sticky IPs.  Hopefully it helps some people here.

Another problem some people have brought up, however, is when they create a post and the Newsletter widget sends emails out to the subscribers.  Subscribers get the email and click on the link to view the latest blog post, but they get a 404 error.  The sticky IP option may not help here since the person is coming in fresh from their email program.

Coordinator
May 10, 2009 at 8:24 AM

The web farm difficulties are a challenge.  In addition to some of the ideas that have been brought up in this discussion thread, I just created a Web Farm Extension.  It doesn't solve all the caching issues, and may not work for everyone.  Whether it works for you will depend on how the web farm is configured.  I want to say that I haven't been involved in configuring a web farm, so I created this extension making a couple of assumptions.

Web Farm Extension 1.0
http://allben.net/post/2009/05/10/Web-Farm-Extension-10.aspx

Jul 25, 2009 at 9:58 AM
In our experience, the native state server and even SQL Server for sessions are both very scary scenarios, as both have issues (mainly performance). By the way, we are also using sticky sessions. As for SQL Server, your server will die very soon if you have enough hits coming in (I believe you already have some hits, which led you to set up a web farm, or you are doing it just for the sake of redundancy). Bottom line: we are evaluating Velocity because NCache is really expensive. However, the advantages are huge.
Apr 22, 2010 at 3:25 PM

Posted this on another thread but I thought it might help here also (sorry for the repost, but I know how frustrated I was when I moved it to a farm).

Just started using BlogEngine and it is pretty good, but I too am having a lot of issues on the web farm.  So, I've posted some info on some ideas I've been experimenting with here:

http://aewnet.com/post/2010/04/21/BlogEngineNET-and-a-Webfarm.aspx

Ben's code above works great for Posts, so I've added some additional reloads to some other sections and tweaked some code in the settings to make some other areas get the data across the farm when it's posted.  Hope this helps.