Suggestions to Improve Scalability of DbBlogProvider

Topics: Business Logic Layer
Jan 5, 2011 at 3:27 PM

The FillPages and FillPosts methods in the DbBlogProvider perform many queries unnecessarily. For example, if you have 1000 blog posts, then 1001 database queries are performed to select the data for them. I have a couple of suggestions to improve the DbBlogProvider:

  1. Select all data fields when FillPages or FillPosts are executed, so that only 1 database query is performed. Too many queries will compound any latency between the web server and database server.
  2. Implement Paging of items within the SQL code that selects the data. This way if you are to show only the first page of 10 posts, you don't get ALL posts from the database, then loop through and list only 10.
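The two suggestions above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python with an in-memory SQLite database, not BlogEngine's actual code: the `posts` table, its columns, and the function names are all hypothetical, and `LIMIT`/`OFFSET` is SQLite syntax (SQL Server and others page differently).

```python
import sqlite3


def make_db(n_posts):
    """Build an in-memory database with a hypothetical `posts` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, created INTEGER)"
    )
    conn.executemany(
        "INSERT INTO posts (id, title, created) VALUES (?, ?, ?)",
        [(i, f"Post {i}", i) for i in range(1, n_posts + 1)],
    )
    return conn


def fill_posts_n_plus_one(conn):
    """One query for the ids, then one more query per post (N + 1 round trips)."""
    queries = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM posts")]
    posts = []
    for post_id in ids:
        row = conn.execute(
            "SELECT id, title, created FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        posts.append(row)
        queries += 1
    return posts, queries


def fill_posts_paged(conn, page, page_size):
    """A single query that selects every field and does the paging inside SQL."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id, title, created FROM posts "
        "ORDER BY created DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return rows, 1
```

With 1000 rows, the first function issues 1001 queries and returns every post, while the second issues one query and returns only the 10 rows needed for the requested page.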

Another thing that would drastically improve the data access layer for BlogEngine is to convert it over to using the "Repository Pattern" where the DbBlogProvider object would return IQueryable's instead of Lists. This would allow for Entity Framework to be implemented for the DbBlogProvider, thus supporting both of the above suggestions easily along with pushing all query "where" type parameters down to SQL instead of loops within .NET once data is loaded.

Also, I haven't forgotten about the XmlBlogProvider. The "Repository Pattern" with returning IQueryable's would still work with the XML provider. The only difference is that the XML provider would probably still work the same as it does now by returning List's of objects. It would just perform an "AsQueryable()" on those lists before returning them.
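To make the in-memory case concrete, here is a rough sketch of a repository that hands back a composable, lazily evaluated query instead of a finished list. It is written in Python only for illustration; `Query`, `Post`, and `InMemoryPostRepository` are hypothetical names. Unlike a real IQueryable backed by Entity Framework, nothing here is translated to SQL, so it corresponds to the "AsQueryable() over a list" case only.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Post:
    id: int
    title: str
    published: bool


class Query:
    """A tiny lazy query: each call returns a new Query, and nothing is
    evaluated until the result is iterated."""

    def __init__(self, source: Callable[[], Iterable[Post]]):
        self._source = source

    def where(self, predicate: Callable[[Post], bool]) -> "Query":
        return Query(lambda: (p for p in self._source() if predicate(p)))

    def skip(self, n: int) -> "Query":
        return Query(lambda: islice(self._source(), n, None))

    def take(self, n: int) -> "Query":
        return Query(lambda: islice(self._source(), n))

    def __iter__(self) -> Iterator[Post]:
        return iter(self._source())


class InMemoryPostRepository:
    """Stands in for a provider that keeps its posts in a plain list."""

    def __init__(self, posts: List[Post]):
        self._posts = posts

    def posts(self) -> Query:
        # Rough analogue of calling AsQueryable() on the list before returning it.
        return Query(lambda: self._posts)
```

A caller could then write `list(repo.posts().where(lambda p: p.published).skip(2).take(3))`, and a database-backed provider would be free to satisfy the same calls with a single paged SQL statement.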

I am very interested in discussing this with members of the BlogEngine team. I am also willing to help implement these changes.

Chris Pietschmann
Microsoft MVP - Windows Live Platform

Jan 5, 2011 at 4:27 PM

That sounds very interesting. We are preparing to support multiple blogs on a single install in the next version, which will require a significant redesign of the data access layer. Performance improvements will be much needed, as multiple blogs per install almost by default means scaling up. Chris, if you are willing to help, please contact me or Ben Amada by email and we can discuss this in depth.

Jan 5, 2011 at 8:09 PM

Regarding XmlBlogProvider - I believe it would be in line with Chris's idea if BE used the LINQ to XML classes (XDocument, XElement, XNode, etc.) instead of XML DOM. These classes provide a simpler and cleaner programming interface, and query results come back as IEnumerable<T>, which can be exposed as IQueryable via AsQueryable().

Personally, I find XML DOM clumsy and unintuitive and I would prefer XDocument even as a navigational interface.


Jan 5, 2011 at 10:22 PM

I'd like to see the dependency inversion principle (DIP) put into the code base as well. A good example of where this would provide benefit is the caching implementation, which is currently tightly coupled to the Active Record style classes. There is currently no flexibility for people to push in their own dependency slices to do things like caching via other mechanisms, instrumentation and other extensibility purposes.

For example, changing the caching design would have a massive impact on the current code base. If DIP were used, I could remove the current in-memory cache and push in something like an on-demand caching implementation for posts, comments and pages using a sliding expiration. Such an implementation would be a better balance between memory usage and performance.
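As a rough illustration of that idea (a Python sketch, not BlogEngine code; `PostCache`, `SlidingCache`, and `PostService` are hypothetical names), the service below depends only on a small cache abstraction, so an on-demand cache with sliding expiration can be pushed in without touching the service itself:

```python
import time
from typing import Callable, Dict, Optional, Protocol, Tuple


class PostCache(Protocol):
    """The abstraction the service depends on; any cache can implement it."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class SlidingCache:
    """On-demand cache: every hit pushes the entry's expiry forward by ttl."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        self._items[key] = (value, now + self._ttl)  # slide the expiry window
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (value, self._clock() + self._ttl)


class PostService:
    """Receives both its loader and its cache from the outside."""

    def __init__(self, load_post: Callable[[str], str], cache: PostCache):
        self._load_post = load_post
        self._cache = cache

    def get_post(self, post_id: str) -> str:
        cached = self._cache.get(post_id)
        if cached is not None:
            return cached
        post = self._load_post(post_id)  # loaded only when actually requested
        self._cache.set(post_id, post)
        return post
```

Posts that nobody requests are never loaded, and posts that stop being requested fall out of memory once their window lapses. Swapping in a different cache, or an instrumented wrapper around one, only requires another object with the same `get`/`set` methods.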

Jan 6, 2011 at 1:37 AM

I know a little about Linq to XML, but am not sure if it can, or how it works with multiple XML files.  I'm thinking about the Posts.  For example, if there was a large XML file named Posts.xml, and all the Posts were in that single file, I know you can use Linq to XML to query the <post>s in there.  However, BE stores each post as a separate XML file.  So if you were using Linq or EF, is it capable of querying all the separate Post XML files when running a query for posts?

I definitely agree that not storing all the data in memory at once is the way to go.  At the same time, if EF is able to include multiple XML files while querying posts, it seems like the process would be noticeably slower than what we have now where all the posts are in memory.  I think EF and IQueryable are particularly efficient for databases since DBs are geared for queries, allow indexes, etc.  Querying many XML files (or even just a few XML files) that are not somehow pre-parsed or pre-cached seems expensive.

I remember hearing that EF works for MySQL -- that is good.  And it of course can be used for SQL Server.  But does it work for the other DBs BE currently supports ... SQLite & SQLCE ?

Jan 6, 2011 at 2:54 AM

I understand the concerns about the current way BE stores posts in multiple XML files. That would be why I mentioned that you could leave the XML code largely the same. Also, only smaller sites will use the XML provider, and any larger sites will surely use a database back-end.

By converting the data access layer to return IQueryable's, it would open up a much wider range of options for data providers. For example, you could refactor the current Db provider to return IQueryable's without sacrificing the support for SQLite and SQLCE, even if EF doesn't support those. Then a separate EF based provider could be made to support the databases that Entity Framework supports.

According to the following link, it looks like EF supports many more databases than just SQL Server and SQL Azure.

Jan 6, 2011 at 3:35 PM

Using LINQ to XML would in fact be an incremental change. If you're using LINQ with XML, it seems only natural to use LINQ to XML. It's simpler and it results in more maintainable code.  It can be used with multiple XML files with only small changes. If it helps to unify storage access, it's even better.
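For the multiple-file question, the general shape of the idea is to stream one parsed document per file and run the query over that stream. The sketch below uses Python's standard library rather than LINQ to XML, and the `<title>` and `<ispublished>` element names are assumptions made for the example, not BE's actual post schema.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import Iterator, List


def iter_posts(folder: str) -> Iterator[ET.Element]:
    """Lazily yield the root element of every *.xml file in the folder."""
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        yield ET.parse(path).getroot()


def published_titles(folder: str) -> List[str]:
    """Query across all post files as if they were a single sequence."""
    return [
        post.findtext("title")
        for post in iter_posts(folder)
        if post.findtext("ispublished") == "True"
    ]
```

Each query still has to open and parse every file, which is the cost raised earlier in the thread, so some form of caching would probably still be needed on top of this.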

The idea of a queryable access layer sounds very good.  How about using LINQ to SQL? It seems more like an incremental change than using EF.

Jan 6, 2011 at 9:27 PM

LINQ to SQL only supports SQL Server.  EF supports MySQL and other databases too.

Jan 6, 2011 at 9:58 PM

True, I stand corrected.

Jan 13, 2011 at 2:44 PM

What version would you guys want to implement this type of change? It is a drastic change and would have fit great in the v2 release. I'm not sure you would want to make such a drastic change to the architecture of BE in a v2.1 type release.

What are your thoughts on this?

Jan 13, 2011 at 4:05 PM

IMO, 2.1 should be all about multiple blogs on a single install. If the proposed change is not a huge overhead, it can be built alongside the multi-blog features. If it is too risky, we'd better put it on hold. In any case, we would need to branch it out and build a prototype to prove the concept before jumping in. That would let us examine the new design and evaluate the amount of work needed.

Mar 19, 2011 at 5:11 AM

I just spent some time looking at the code to see what would need to be changed to modify BE to use the repository pattern via IQueryable<> and have found that basically the entire data access layer will need to be rewritten. It doesn't look to be as easy as I originally thought. Here are a couple of design change suggestions that would either be required or extremely useful when implementing the repository pattern:

  • Modify the BlogProviders to return IQueryable<> instead of List<>
  • Modify the "core" object types (such as Post, Referrer, etc.) used throughout to be passed around as Interfaces (IPost, IReferrer, etc.) instead of base classes. This would make the DAL much more pluggable than it currently is.

An additional change that would make everything more flexible would be to return IList<> instead of List<> from methods that couldn't or wouldn't be converted to IQueryable<>. If modified in the current BE, IList<> would allow for custom BlogProviders that may not use List<>. I know it would be a pain to change now, but using IList<> when initially creating the app doesn't take any more effort than List<> and adds the benefit of making things more flexible.

One last thing that would be extremely helpful with BE, as it is with any project, is Unit Tests.

I'm not meaning to imply that the architecture of BE is bad, but rather point out some areas where it could possibly be improved. So please take the above suggestions/comments constructively. :)