BlogEngine.NET can't handle ~50,000 posts?

Aug 21, 2009 at 4:08 AM
Edited Aug 21, 2009 at 1:22 PM

Hello,

I recently created a totally custom skinned blog.  When there are no posts, or very few posts, the site works fine.  Once you get up to around a few hundred or a few thousand, performance really starts to drag, but seems to get better over time (with caching?).  But when I imported all ~50k of my posts, the site completely died.  It just goes to an IIS error page every time.  If I delete ~40k posts from the be_posts table, the site starts working again.

What's up with that?  Any tips? I am using DB for data storage...

Thank you!!!!

Aug 21, 2009 at 2:19 PM

I did a test and at 10k posts it works (slowly till caching catches up then it works great),

20k posts it works (slowly till caching catches up then it works great),

at 30k posts in the be_posts table, it breaks completely... no slow loading & eventual catching up, simply "page cannot be displayed" type errors after 5-10 minutes of the page trying to load.

I think the solution is what Mads calls "lazy loading".  I need the post content for each post to not load or be cached until it is requested...  I am not a really advanced web programmer, I don't know how to implement this myself.

Can anyone advise here?  My blog is really bad ass, with mountains of helpful info that is going to help the world, if I can just get it to work :(

Coordinator
Aug 22, 2009 at 8:05 AM

One thing I would try doing is removing the "SearchOnSearch" control in the site.master file (in your theme folder).  The markup for this control will look something like:

<blog:SearchOnSearch runat="server" MaxResults="3" Headline="You searched for" Text="Here are some results for the search term on this website" />

Removing it could help a bit.  I would still expect a 4 to 5 minute initial load time for all the data to get loaded into memory.

Aug 22, 2009 at 1:08 PM

Wow thanks Ben!!!

I have not tried to import past 20k again yet, but making that change shaved about 20 seconds off the site loading for the 20k test... I will proceed to load many more thousands of posts and see how far I can get :)

Thanks again!

Aug 22, 2009 at 1:59 PM

this tip took me to the 30k posts mark... proceeding up to 50k

Aug 22, 2009 at 2:58 PM

at ~54k posts the site does this:

Internet Explorer cannot display the webpage

If I reduce it back down to 30k posts, it works again.... any tips?

Aug 23, 2009 at 2:01 PM

Even if you don't have tips about what I can change with BE to make this work - do you have anything I can tell my host (it's a shared host)?

For example - could adding more memory fix the issue?  My guess is the problem is that at 50k rows in the BE_Posts table, the server does not have enough memory to cache them all?  Does this sound reasonable?  I need something to suggest to my host so that they can try fixing it...

Thanks!

Aug 23, 2009 at 2:53 PM

Quick note... I wrote a little macro to refresh the page every 5 min, and about 1 out of 100 times the site will load.

Do you know what could cause it not to load the other 99 times?  Does the web app just need more memory thrown at it?

Aug 23, 2009 at 4:33 PM

Just to confirm my suspicions, this only happens on the default.aspx page right? If you type in the permalink for any particular post, does the same thing happen?

I suspect one thing that might choke on 50k posts, is inside Default.aspx here : 

  else if (Request.QueryString.Count == 0 || !string.IsNullOrEmpty(Request.QueryString["page"]) || !string.IsNullOrEmpty(Request.QueryString["theme"]) || !string.IsNullOrEmpty(Request.QueryString["blog"]))
  {
   PostList1.Posts = Post.Posts.ConvertAll(new Converter<Post, IPublishable>(delegate(Post p) { return p as IPublishable; }));
   if (!BlogSettings.Instance.UseBlogNameInPageTitles)
    Page.Title = BlogSettings.Instance.Name + " | ";

   if (!string.IsNullOrEmpty(BlogSettings.Instance.Description))
    Page.Title += Server.HtmlEncode(BlogSettings.Instance.Description);
  }

Later on this calls dataProvider.FillPosts(), which runs a SQL statement (in your case) to grab every post. That list then sits in the cache so that it saves on subsequent calls to the DB.

So the simple answer is: change default.aspx to do something else, maybe paginate it instead of caching the whole list. Caching the whole list makes sense up to a certain point, and once you reach that point you need to switch to pagination. I think 50,000 posts is roughly that point (just a guess).
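
A minimal sketch of the pagination idea, not BlogEngine.NET's actual code: rather than loading every row, ask the database only for the rows on the requested page. The class name PostPager, its helper methods, and the column list (PostID, Title, DateCreated, IsPublished) are assumptions for illustration. The ROW_NUMBER() form is used because it is available on SQL Server 2005 and later.

```csharp
using System;

// Hypothetical helper, not part of BlogEngine.NET.
public static class PostPager
{
    // Query returning a single page of posts. Table and column names are
    // assumptions about the be_Posts schema; adjust them to your database.
    public const string PageQuery = @"
SELECT PostID, Title, DateCreated
FROM (
    SELECT PostID, Title, DateCreated,
           ROW_NUMBER() OVER (ORDER BY DateCreated DESC) AS RowNum
    FROM be_Posts
    WHERE IsPublished = 1
) AS numbered
WHERE RowNum BETWEEN @first AND @last
ORDER BY RowNum;";

    // First row number (1-based) shown on a 1-based page.
    public static int FirstRow(int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (page - 1) * pageSize + 1;
    }

    // Last row number shown on that page.
    public static int LastRow(int page, int pageSize)
    {
        return FirstRow(page, pageSize) + pageSize - 1;
    }

    // Number of pages needed to list totalPosts posts (rounds up).
    public static int PageCount(int totalPosts, int pageSize)
    {
        if (totalPosts < 0) throw new ArgumentOutOfRangeException(nameof(totalPosts));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (totalPosts + pageSize - 1) / pageSize;
    }
}
```

With a page size of 10, page 3 maps to rows 21 through 30, and those two numbers would be passed as the @first and @last parameters. Only 10 rows come back per request, no matter how many posts the table holds.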

Complex answer: figure out a way to get more memory, make sure you are 64-bit everything (windows, sql, etc) and it should operate fine once the cache is populated.

Right now what is probably happening in your scenario, is that the cache is getting filled up with the post list, and then something else comes along and resets the cache for the post list. Once this happens, then you are constantly populating the post list.

Hopefully some of that makes sense.

-Justin

Aug 23, 2009 at 10:03 PM
Edited Aug 23, 2009 at 10:50 PM

I think you are correct... I think I will end up having to re-write some code... what I might end up doing is not using BE, and just go with what I'm familiar with (query strings "?q=some_parameters").  I was hoping to use BE to save time and not have to write a custom front end, but if it don't work it don't work 

To answer your question about default.aspx vs. permalink - it actually is the same problem for both.   The permalinks don't load any better than default.aspx when there are 50k rows in the be_posts table.

Aug 24, 2009 at 11:43 AM

Is this problem still valid with the latest version of BlogEngine.NET (version 1.5 FINAL)?
Which data provider did you use, XML or DB (MSSQL)?
Does this problem also occur with the DB data provider?

Any solutions yet? BlogEngine.NET developers, any ideas/solutions?

Coordinator
Aug 24, 2009 at 12:31 PM

BE 1.5 and the next version are not going to be ideal when you have 50,000 posts (or even 30K or 40K).

Making BE more scalable is a future change that will probably be addressed at some point.

Aug 24, 2009 at 1:32 PM

Actually BE already has many features and extensions, but this issue could do serious "damage" to BE (make it unpopular).
There must be a way to solve this issue, Ben. *** I believe that ***

ASSUMPTION: If I add 10 posts every day, that is 3,650 posts per year.
If I use BE 1.5 for the next 5 years, that is 18,250 posts.
18,250 posts is below the 30,000 posts limit you mentioned.
BE 1.5 is safe for this scenario, and I think this issue will have a solution within the next 5 years (positive thinking mode on).

But the problem now is if I have to migrate my old posts into BE 1.5 and I already have more than 30K posts (accumulated over several years)... :(
BE 1.5 does not support multiple blogs in a single installation yet (we must use multiple installations, for example one virtual directory per blog in a web site). So multiple blogs in a single installation do not come into play for BE 1.5 right now.

CMIIW, thanks.

Is this problem caused by the cache?
Can we just comment it out in the BE 1.5 source code, or turn it off in the web.config file?
Would that solve the issue?

Thanks.

Aug 24, 2009 at 1:43 PM
Edited Aug 24, 2009 at 1:45 PM

Hey Henkyck,

I am using 1.5 and MS DB provider

I didn't even consider that maybe I could get two or three installs going, maybe separate the 50k posts alphanumerically or something.  Other than, like, subdomains or iframes, do you know of a good way to give the illusion of a single install when actually running 2 or 3+ BE instances?

If you know a slick way to make many instances appear as one, I would be very grateful to know about it... subdomains and iframes are not really good from an SEO perspective.  I would love to hear any input that you have!

Thanks much!!!

Coordinator
Aug 25, 2009 at 2:05 AM

Just my opinion, but for 99.9% of the bloggers out there, they are not creating 10 posts per day.  Even the most frequent bloggers might have 1 new blog post per day, on average.  It would take 82 years before 30,000 posts were reached at this rate.
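
The arithmetic behind that estimate, and behind the 10 posts per day figure raised earlier in the thread, can be checked with a small helper. This is only an illustrative sketch; the class and method names are made up for this example.

```csharp
using System;

// Hypothetical helper for checking the posting-rate estimates in this thread.
public static class PostGrowth
{
    // Years needed to reach targetPosts at a steady postsPerDay,
    // counting 365 days per year.
    public static double YearsToReach(int targetPosts, double postsPerDay)
    {
        if (postsPerDay <= 0) throw new ArgumentOutOfRangeException(nameof(postsPerDay));
        return targetPosts / (postsPerDay * 365.0);
    }
}
```

PostGrowth.YearsToReach(30000, 1) comes out at roughly 82.2 years, while PostGrowth.YearsToReach(30000, 10) comes out at roughly 8.2 years.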

The problem is loading ALL the posts into memory at once takes a long time when you have 50K posts.

I recently made some modifications for someone using BE.  Coincidentally, they also have 50,000 posts.  The posts are being stored in a SQL DB.  After making numerous customizations, the blog is now just as fast, if not faster, than a blog with 100 posts.

This was accomplished by (a) not loading and storing all the posts in memory, and (b) changing all the places in BE that rely on the posts being available via Post.Posts and having each one of these places have its own unique query to the DB.  This second part (part B) is what took a long time to do, and it's tailored specifically for MS SQL.  Each one of these queries returns data for JUST the posts that are needed at that point in code.  The data returned is used to populate the normal "Post" class/objects.  So the BE code still accesses all the posts, comments, tags, etc. via normal object-oriented methods/properties on the standard Post objects.

Hopefully sometime in the relatively near future, similar scalability improvements can be made to the main BE codebase.  Just lazy-loading the post content will help a lot.  It'll probably be a little more challenging when BE is changed because we'll need to come up with something that works well for different storage providers -- XML and the different databases (SQL Server, MySql, VistaDb, SqLite).
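
As a rough illustration of what lazy-loading the post content could look like (this is a sketch, not the BE implementation), the object below keeps only the small fields in memory and fetches the body the first time it is asked for. LazyPost and the loadContent delegate are hypothetical names; in a real provider the delegate would run a single-row query. Lazy<T> requires .NET 4 or later.

```csharp
using System;

// Hypothetical stand-in for a post whose body is fetched on first access.
public class LazyPost
{
    // Lazy<T> runs the factory once, on first use of Value, and is
    // thread-safe by default.
    private readonly Lazy<string> _content;

    public LazyPost(Guid id, string title, Func<Guid, string> loadContent)
    {
        if (loadContent == null) throw new ArgumentNullException(nameof(loadContent));
        Id = id;
        Title = title;
        _content = new Lazy<string>(() => loadContent(id));
    }

    public Guid Id { get; }
    public string Title { get; }

    // True once the body has been fetched.
    public bool IsContentLoaded
    {
        get { return _content.IsValueCreated; }
    }

    // The first read triggers the load; later reads reuse the stored value.
    public string Content
    {
        get { return _content.Value; }
    }
}
```

A list of 50K such objects would hold only IDs and titles until someone actually opens a post, which is where most of the memory and load time would be saved.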

Aug 25, 2009 at 2:45 AM

Hi Ben,

I agree with your opinion, but if BE 1.5 is used by multiple authors (say a team blog with 50 users, like the MSDN SharePoint blog or MSDN IE blog, of which only 25 are active), it is possible to reach roughly 10 posts per day (IMHO), right?

Can you point me to the code, and the class, that loads ALL the posts into memory at once?
That way I can keep using BE 1.5 without worrying about this issue again. FYI, I will use MS SQL DB as my data store.

I can modify/customize it myself, and may even be able to give you some ideas for this.

Thanks, Ben.


Aug 25, 2009 at 3:00 AM
Edited Aug 25, 2009 at 3:06 AM

Hi Jones,

You can achieve that using virtual directories, for example:

  1. http://www.yourdomain.com/blogs/woahjones
  2. http://www.yourdomain.com/blogs/benamada
  3. http://www.yourdomain.com/blogs/henkyck

After that you can create a user control that displays all of the blogs within your site, and put this user control in your http://www.yourdomain.com/default.aspx (for example).
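
One way such a user control might build its list is sketched below, under the assumption that each install can hand back a short list of its own recent posts (from its feed or its database; that part is not shown). PostSummary and BlogAggregator are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary of one post from one of the installs.
public class PostSummary
{
    public string BlogPath { get; set; }   // e.g. "/blogs/woahjones"
    public string Title { get; set; }
    public DateTime DateCreated { get; set; }
}

public static class BlogAggregator
{
    // Merge the per-blog lists and keep the newest `count` entries overall.
    public static List<PostSummary> LatestAcrossBlogs(
        IEnumerable<IEnumerable<PostSummary>> blogs, int count)
    {
        if (blogs == null) throw new ArgumentNullException(nameof(blogs));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        return blogs
            .SelectMany(posts => posts)
            .OrderByDescending(p => p.DateCreated)
            .Take(count)
            .ToList();
    }
}
```

The control would then bind the merged list to a repeater and link each entry to its BlogPath, so visitors see one combined front page while each install stays small.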

That's my two cents.

Coordinator
Aug 25, 2009 at 4:06 AM

All the posts get loaded into a static list, _Posts.  The code below is from the Post class in the BE Core.

http://blogengine.codeplex.com/SourceControl/changeset/view/28533#18672

If the posts aren't yet in the static list, it will call FillPosts() to fill the _Posts static list with all the posts.  FillPosts() is a function in each of the providers.  It's in the DbBlogProvider here.  And it's in the XmlProvider here.  Once the posts are in memory, they stay there.  If you try to simply eliminate storing the posts in memory, that would be even worse, because the DB would be re-queried for all 50K posts each time the posts are needed.  This process takes about 4 to 5 minutes each time to load about 50K posts.

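public static List<Post> Posts
{
    get
    {
        // First check without the lock: the common case once the list is loaded.
        if (_Posts == null)
        {
            lock (_SyncRoot)
            {
                // Second check inside the lock, so only one thread fills the list.
                if (_Posts == null)
                {
                    // Loads every post from the configured provider.
                    _Posts = BlogService.FillPosts();
                    // Releases unused capacity in the list.
                    _Posts.TrimExcess();
                    AddRelations();
                }
            }
        }

        return _Posts;
    }
}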

Aug 25, 2009 at 4:24 AM

I'm trying to follow you guys but I'm too tired lol... I did want to get on and thank you for responding... thank you very much. I'm not sure what approach I will try yet (virtual directories with a user control to tie them together, or rewriting the query code based on Ben's comments).  I am a very novice C# coder and right now I am very sleepy from pulling all-nighters the past few nights for work.... time to saw some logs *snore* zzzzzzzzzzzzzzz

Aug 25, 2009 at 10:17 AM

Ben,

I think you should use:

  1. Pagination: for example, show only the 10 most recent posts on the BE home page, and list the rest 10 posts per page (with previous-page and next-page links).
  2. Just-in-time queries keyed by a QueryString value such as PostID, for example to view a single post.

We can use other caching strategies, rather than loading everything into an in-memory collection at once and refreshing it every n minutes/hours.
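
One such strategy, sketched here as an assumption and not as anything BE ships, is a bounded cache that loads a post only when it is requested and drops the least recently used entry once it is full. LruCache and its load delegate are hypothetical; in practice the delegate would run the single-post query from point 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded cache: holds at most `capacity` items, evicts the
// least recently used one, and loads misses through a caller-supplied function.
public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Func<TKey, TValue> _load;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Most recently used entry is at the front, least recently used at the back.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly object _sync = new object();

    public LruCache(int capacity, Func<TKey, TValue> load)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        if (load == null) throw new ArgumentNullException(nameof(load));
        _capacity = capacity;
        _load = load;
    }

    public int Count
    {
        get { lock (_sync) { return _map.Count; } }
    }

    public TValue Get(TKey key)
    {
        // Loading happens under the lock to keep the sketch simple;
        // this serializes misses.
        lock (_sync)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (_map.TryGetValue(key, out node))
            {
                // Hit: move the entry to the front.
                _order.Remove(node);
                _order.AddFirst(node);
                return node.Value.Value;
            }

            // Miss: load the item, then evict the oldest entry if over capacity.
            TValue value = _load(key);
            _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
            if (_map.Count > _capacity)
            {
                LinkedListNode<KeyValuePair<TKey, TValue>> oldest = _order.Last;
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }
            return value;
        }
    }
}
```

With a capacity of a few hundred, popular posts stay in memory while the rest of the 50K are read from the DB one at a time, so there is never a single multi-minute load.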

My two cents, Ben.

Aug 25, 2009 at 10:10 PM

Perhaps there is a way to split/branch the BE.NET code base to use core functionality where practical and focus on NextGen stuff in a new branch.  I don't want to re-write BE.NET for my needs then try to add new features and/or fixes as they become available.  I'd love to see what many of us want such as MultiBlogs and Scalability.  Without real support I don't see this as practical.  If there was a con call or some other discussion to debate architecture, priorities, etc. maybe a branch can unite 2 or more divergent camps so the single, simple, out of the box BlogEngine.NET works one way, and the multi-blog, high capacity, performance conscious camp can have theirs.

I'm using "BlogEngine.Net for SQL Server" and keep updating both to keep in sync.  This requires a SINGLE SQL Server database for all blogs and a Single application folder in IIS.  I still need a new Web application for each blog, but that is due to BE.NET architecture and the need for host headers to determine the blog.  These can both be corrected.  The SQL Server data model fits on top of the DB Provider model, but could just as easily be a direct replacement by adding a column to several tables and a couple additional tables.  This provides multi-blogs but does not address everything.  Caching could be configurable based on performance requirements and resource constraints.  If SQL Server is the DB of choice, so be it.  If support for SQL, Oracle, Access, MySql, SQL Lite, etc. are required, OK.  If nothing else, I would really appreciate a sounding board to find out where people are headed so I know what I can expect and even where I can offer help (grunt work or advice.)

The last thing I want to do is to start over or just grab what everyone else has done and start a new project.  Look where that leads by doing a Search of "BlogEngine" on CodePlex.

ps - What about MVC, WCF, Silverlight?  Also, I'd rather be writing enhancements, widgets and eventually themes, but my BlogEngine/SqlBlogProvider source still needs some fixes.

Thanks for any reply...I'll move this post/reply if necessary.

Aug 26, 2009 at 12:19 AM
Edited Aug 26, 2009 at 2:10 AM
BenAmada wrote:

Just my opinion, but for 99.9% of the bloggers out there, they are not creating 10 posts per day.  Even the most frequent bloggers might have 1 new blog post per day, on average.  It would take 82 years before 30,000 posts were reached at this rate.

The problem is loading ALL the posts into memory at once takes a long time when you have 50K posts.

I recently made some modifications for someone using BE.  Coincidentally, they also have 50,000 posts.  The posts are being stored in a SQL DB.  After making numerous customizations, the blog is now just as fast, if not faster, than a blog with 100 posts.

This was accomplished by (a) not loading and storing all the posts in memory, and (b) changing all the places in BE that rely on the posts being available via Post.Posts and having each one of these places have its own unique query to the DB.  This second part (part B) is what took a long time to do, and it's tailored specifically for MS SQL.  Each one of these queries returns data for JUST the posts that are needed at that point in code.  The data returned is used to populate the normal "Post" class/objects.  So the BE code still accesses all the posts, comments, tags, etc. via normal object-oriented methods/properties on the standard Post objects.

Hopefully sometime in the relatively near future, similar scalability improvements can be made to the main BE codebase.  Just lazy-loading the post content will help a lot.  It'll probably be a little more challenging when BE is changed because we'll need to come up with something that works well for different storage providers -- XML and the different databases (SQL Server, MySql, VistaDb, SqLite).

 

Hi Ben,

I am going to attempt this.  Do you have any documentation on where to find all the places where I will need to write a unique query to the DB to replace the cached post.posts references?  I haven't even looked yet, but could I find all of those places by searching for post.posts within the project?

 Thanks,

-Woah

Coordinator
Aug 26, 2009 at 1:40 AM

I actually simplified this by saying Post.Posts.  It's true that all the posts are stored in Post.Posts.  But all the comments, tags, post notifications, and categories (sort of) are also stored in Post.Posts.  A lot of the code in BE accesses this data not through Post.Posts, but via various properties.  For example, Category.Posts.

I think the best, or only, approach is to visit every place in code that outputs data and see where that data comes from.  This means, going to all the widgets, extensions, handlers, modules, archive page, search page, etc. and writing queries for all this.  It's not at all a light task.

Aug 26, 2009 at 2:27 AM

ah OK... I will probably not try that right now :p

Although I would love the learning experience, by the time I finished you'd probably have a new version out that addresses all these issues LOL

Aug 26, 2009 at 3:00 AM

What I've decided to do is to go live with small sections of my posts at a time... each week I will add 100 posts or so... search engines will like the changing & updating content, and it will take the blog a while to grow... when a version of BE comes out that can handle the 50k posts... I will go full speed, but for now I will just kind of go slow with it, I suppose.