Hello there! It looks like this might be your first time to my website. You should...
Subscribe to my
RSS feed
Follow me
on Twitter.
Check out
.NET Dev Buzz

MongoDB vs. SQL Server 2008 Performance Showdown

Thursday, April 29, 2010 10:41:30 AM (Pacific Standard Time, UTC-08:00)


This article is a follow up one I wrote last week entitled “The NoSQL Movement, LINQ, and MongoDB - Oh My!”. In that article I introduced the NoSQL movement, MongoDB, and showed you how to program against it in .NET using LINQ and NoRM.

I highlighted two cornerstone reasons why you might ditch your SQL Server for the NoSQL world of MongoDB. Those were

1. Ease-of-use and deployment
2. Performance

For ease-of-use, you’ll want to read the original article.

This article is about the performance argument for MongoDB over SQL Server (or MySql or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.

“A potentially controversial graph showing MongoDB performing 100 times better than SQL Server”

We’ll see source code, downloadable and executable examples and you can verify all of this for yourselves. But first, here’s a new twist on an old proverb:

“Data is money”

If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.

Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to attempt to plan out your operating costs per user to help guide the pricing of your product then the cost of storing, querying, and managing your data will likely be a significant part of that calculation.

If there is a database that is 100 times faster than SQL Server, free, easy to administer and you program it with LINQ just as you would with SQL Server then that is a very compelling choice.

When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.

“It means you can charge a lot less per user of your application and get the same revenue. Think about it.”

One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:

http://www.ustream.tv/recorded/6146875

[The audio and video isn’t too hot, but the content is. Skip the first minute without audio.]

During this talk, Kristina describes SourceForge’s experience moving from MySql to MongoDB. On MySql, SourceForge was reaching its limits of performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.

The scenario:

Model a data intensive web application aiming to support as many concurrent users as possible. There will be users from the web application itself. But there will also be users from an API and external applications. Users will interact with the data by having nearly as many inserts as they do queries. Their inserts are all small pieces of data and are all independent of each other.

Let me just get this out of the way and I mean the following in the nicest of ways: I don’t care about your scenario or use-case. The scenario above is what I’m trying to model. I’m not trying to do bulk-inserts or loading large files into databases or anything like that. MongoDB may be great for these. SQL Server may have specialized features around your use-case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.

Insert Speed Comparison

It’s the inserts where the differences are most obvious between MongoDB and SQL Server.

These inserts were performed by inserting 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:


MongoDB basic class


SQL Server basic class

I ran five concurrent clients hammering the databases with inserts. Here’s the screenshots for running against MongoDB and against SQL Server. Let’s zoom into the most important result with the output from one of five concurrent clients:

MongoDB:

SQL Server:

That’s right. It’s 2 seconds verses 3 1/2 minutes!

Now to be fair, this was using LINQ to SQL on the SQL side which is slow on the inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.

Query Speed Comparison

Now let’s see about getting the data out using the same objects above on the indexed Id field for each database.

Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.

“If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.”

Complex Data and the Real World

Feel like that was an overly simplified example? Here’s some real world data with foreign keys and joins. Below is the complex data model.

MongoDB:

SQL Server:

It shouldn’t surprise you that MongoDB does even better here without its joins.

The Hardware

All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8

Your Turn

If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:

http://www.michaelckennedy.com/Downloads/sql-vs-mongo.xlsx

You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.

The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.

Download Sample:

http://www.michaelckennedy.com/Samples/SpeedOfSqlVsMongoDBAnddotNetSample.zip

Got feedback? Write a comment or contact me on Twitter: @mkennedy or find me in any of these other ways.

Thanks!

Some thanks are in order for all the help I got bouncing around ideas as well as trying different scenarios. Thanks to

Eric Cain @arcain
Jim Lehmer@dullroar
Karl Seguin @karlseguin


Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

The NoSQL Movement, LINQ, and MongoDB - Oh My!

Thursday, April 22, 2010 1:01:01 PM (Pacific Standard Time, UTC-08:00)

Maybe you’ve heard people talking about ditching their SQL Servers and other RDBMS entirely. There is a movement out in the software development world called the "No SQL" movement and it’s taking the web application world by storm.

“Insanity!” you may cry, “for where will people put their data if not in a database? Flat files? Tell me we aren’t going back to flat files.”

No, but in the relational model, something does has to give. The NoSQL movement is about re-evaluating the constraints and scalability of data storage systems in the light of the way modern web applications generate and consume data.

The outcry about flat files above is meant to highlight an assumption developers often have about building data-driven applications: Data goes in the database (SQL Server, Oracle, or MySql). Just maybe, if we are really cutting-edge, we might consider storing our data in the cloud, but the choices generally stop there.

The NoSQL movement asks the question:

“Is the relational database (RDBMS) always the right tool for data storage and data access?”

Starting from an RDBMS is virtually an axiom of software development. However, those of us who are excited about NoSQL believe that relational databases are not always the answer. I think this highlights one of the reasons this NoSQL thing is called a movement. People are realizing they have a choice where they thought they had none.

The converse is, of course, also true. The NoSQL databases are also not always the right choice either. If you look carefully however, you will find that they are a good choice much of the time. Don’t take my word on it. Ask Facebook, Twitter, Digg, SourceForge, WebEx, Reddit and a bunch of other companies here and here that are using NoSQL databases.

This move towards NoSQL is driven by pressure from two angles in the web application world:

  • Ease-of-use and deployment
  • Performance - especially when there are many writers as compared to the number of readers (think Twitter or Facebook).

Choosing NoSQL for Ease-of-Use and Deployment

I cover the programming model in detail as well as introduce the actual database server below. For some vague motivation, let me just give you a quick look at how you define the data model and maintain it.

  1. Define your classes in C# (largely) without regard to putting them in a database. Related classes? Easy - one has a collection of the others.
  2. Create a simple DataContext-like class which exposes each top-level type that is to be stored in the database. This is only a few lines of code per collection (think of this as a table).
  3. Interact with the database using LINQ. This creates the collections (think tables), sets the schema, etc.
  4. Maintain the database and evolve it by maintaining your classes from step 1. *

Why, in the name of all that is right, do we have to model our system twice? Once in the database and once, in parallel, in code? With NoSQL, you have one place to do that - in your C# classes.

* You may have to run a transformation tool if you’re making radical data changes, but that’s true in SQL systems as well.

Choosing NoSQL for Performance

When the number of concurrent clients using your application - and thus your database - is reasonably small (let’s say 500 users as a baseline) RDBMS can work great. But what if that number grows? And if you are writing a web app, you definitely want that number to grow. At 50,000 users, can you still run on a single instance of SQL Server or MySql? How powerful does your hardware have to be to handle that? What about at 500,000 or 5,000,000 users, still good?

I’m sure there are some of you out there thinking, “What a minute now! There are plenty of systems with tons of users built upon relational databases.”

It’s true, there are. But how much expensive hardware and software do these require? How easy is it to leverage *commodity* hardware and free software? A basic SQL Server cluster might run you $100,000 just to get it up and running on decent hardware. Rather than leveraging crazy scaling-up options, the NoSQL databases let you scale-out. They make this possible (dare I say easy?) by dropping the relational aspects of a database. Some NoSQL systems such as MongoDB get even better scalability by loosening some of the durability guarantees – which they backfill somewhat with redundancy (more on MongoDB shortly).

“Ok, ok. So it’s cheaper and simpler,” you say. “How much faster than the finely tune system that is SQL Server 2008 can these open source NoSQL systems be?”

The answer is: MUCH MUCH FASTER. Here’s a simple comparison of running a bunch of concurrent inserts into SQL Server 2008 and MongoDB on the same computer.

Looks like under heavy load, I’d say it’s about 100 times faster. I’m sure there going to be tons of second guessing this graph and so on. Hold your comments please! I’ll be posting a full performance comparison with source code soon. Let me just say that I think the comparison was fair - I’ll back that up in a later post.

NoSQL and a New Programming Model

If we do not have joins and primary / foreign key relationships, how do we associate related data? In NoSQL, there is a way to mimic foreign keys for certain relationships. However the main answer is that you do not disassociate your data in the first place.

I’m sure that you’ve all heard of the object-relational impedance mismatch. A large part of that mismatch comes from the fact that we normalize the data in our database to the extreme and then use joins to reassemble that data. Not only does that cause this so-called impedance mismatch, but those joins can be really slow and they can be the death of any scale-out solution. The key to many of the NoSQL databases’ scalability is that they do not use joins. You simply save large swaths of your data as a single blob (which in MongoDB’s case, is still deeply queriable).

Shortly we’ll look at an example where we build out a disconnected, offline RSS reader that uses MongoDB and LINQ to store its data. But just think about how you might structure your data storage if you could save entire object graphs and still query them? Your "row" might be a Blog object which has an array of BlogEntries which contain the entry text, link, date, etc. Then your *entire* query to pull all the details of a single blog would hit a single “table” in the database. That might look like this query which has one result:

var blog = 
       (from b in ctx.Blogs 
       where b.Id == requestedBlogId 
       select b).FirstOrDefault();

There are no joins or anything like that because you’re saving objects not columns and those objects contain their collections already (e.g. RssEntries). There is an important distinction to make here. These NoSQL databases generally are *not* the same as object databases. They are what are known as document databases. There’s actually a big difference between the two.

Introducing MongoDB

The NoSQL database we are using in this example is MongoDB. This is free, open-source database which runs on Windows, Linux, and Mac OS X systems. You can access it from many platforms including .NET, Ruby, Java, PHP, and so on.

We’ll be using .NET and C# of course. You have several options when choosing how to access MongoDB from .NET but generally that means using LINQ and a light-weight object-mapper on top of MongoDB itself. Note that common terminology might categorize the object mapper that moves objects into and out of the database as an ORM. While that’s OK, there is technically no "R" in this ORM because MongoDB is not relational. Hence I’m calling simply an Object-Mapper (OM).

In MongoDB nomenclature, theses libraries are called drivers. My favorite .NET driver is called NoRM. It’s being actively developed and was created by Karl Seguin, Andrew Theken, Rob Conery, James Avery, and Jason Alexander. You can find NoRM on GitHub and discuss it in its related Google Group.

If you want to learn more about MongoDB you should listen to these Podcast interviews:

Michael Dirolf also has a great book in the works. You can catch a preview of it on Safari Books Online. Here’s the amazon page:

MongoDB: The Definitive Guide.

NoSQL in Action

Let’s write some code. The first step typically in a data-driven application is to spec out the database. Then we’d use LINQ to SQL or Entity Framework to generate the ORM classes. MongoDB is different. MongoDB has no schema or rather its schema is flexible and defined via usage rather than being predefined in the database. So our first step is to define the classes we’d be storing in the DB via NoRM.

We’re going to define 3 classes: Blog, RssEntry, and RssDetail. The Blog object will contain a collection of RssEntry objects. In practice you might just go with the Blog and RssEntry classes. But I wanted to model both the embedded case (Blog + RssEntry) and the loosely defined foreign key style relationship that mimic joins (RssEntry + RssDetail). That way we can demonstrate both use-cases.

Here’s a taste of the Blog class:

public class Blog
{
	public ObjectId _id { get; set; }
	public string Name { get; set; }
	public string Url { get; set; }
	public string RssUrl { get; set; }
	public List<RssEntry> Entries { get; set; }
      // ...
}

Notice that it contains a collection (List<T> really) of RssEntry objects. That’s the relationship supported by nesting. The Blog class just has this collection as part of its data model.

The RssEntry class has the summary info for a blog entry:

public class RssEntry
{
	public ObjectId _id { get; set; }
		
	public Guid UniqueId { get; set; }
	public DateTime PostedDate { get; set; }
	public string Title { get; set; }
	public string RssGuid { get; set; }
}

And the larger data is stored in the RssDetails class (for example the text of the post):

public class RssDetails
{
	public ObjectId _id { get; set; }

	// this is kinda like the foreign key.
	public Guid RssEntryId { get; set; }

	public List<string> Categories { get; set; }
	public string Link { get; set; }
	public string Text { get; set; }
	// ...
}

Let’s see how we insert an entire set of Blog data into the database. We begin by generating the objects (Blog, RssEntry, etc) in memory and then serializing them via NoRM to MongoDB much as you would in LINQ to SQL. The difference is this will actually generate the collections (analogous to tables) if they don’t already exist and it will define the implicit schema to match our objects:

void SaveBlogToMongoDb(
	string rssUrl, XElement root, RssDataContext ctx)
{
	Blog blog = new Blog();
	blog.RssUrl = rssUrl;
	blog.Name = GetBlogName(root);
	blog.Url = GetBlogUrl(root);

	blog.Entries = ParseEntries(root);
	IEnumerable<RssDetails> details 
		= GetDetails(blog.Entries, root);
			
	foreach (RssDetails detail in details)
	{
		ctx.Add(detail);
	}

	ctx.Add(blog);
}

Here we are using a class called RssDataContext which we wrote manually. It is very similar to what LINQ to SQL and Entity Framework use to do the object-relational mapping. Want to do a query? Do you know LINQ? Well then you’re all set:

var results = 
    from b in ctx.Blog 
    where b.Name.Contains( "MongoDB" ) 
    select b;

How do you add a new entry to an existing blog and update it in the database?

void AddEntry(Blog blog, RssEntry entry)
{
	blog.Entries.Add(entry);
	ctx.Save(blog);
}

We leverage the fact that the blog.Entries collection is a List and just add to it. Then save will update the record in the DB.

All this works great and is highly performant. But do be careful as not all the LINQ operations are fully implemented yet in NoRM and some (like join) may never be added because MongoDB doesn’t support it.

To get started, download MongoDB the tools and server here:

http://www.mongodb.org

You unzip the zip file and run the mongod.exe program. Be sure that you have created the C:\data\db folder. It appears at first that you have to run MongoDB in a console window. But you can register it as a Windows Service:

Here’s some helpful advice on installing MongoDB as a Windows Service (there is a small bug you have to work around):

http://www.deltasdevelopers.com/post/Running-MongoDB-as-a-Windows-Service.aspx

There’s also a management console (and I mean "console"):

It’s a little different. You’ll get used to it. The means of interaction with the server is through JavaScript rather than T-SQL and the storage format is a binary form of JSON as you can see.

For a project I’m working on I’ve built a Windows Forms UI that lets me manage the database easily by just adding an object data source and doing some drag-drop magic in Visual Studio. Generally I look down upon that sort of development, but for an admin tool it’s just fine.

Now It’s Your Turn!

Try it out for yourself. Download MongoDB and the NoRM driver and build some apps. You may also want to check out the source code for my demo app:

Download Sample: RssMongoSample-Kennedy.zip

Got feedback? Write a comment or contact me on Twitter where I'm @mkennedy or find me in any of these other ways.

Recommended Reading:

Here are some other blogs on this subject.


 
Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

Handy Web Development Technique

Thursday, February 25, 2010 9:32:15 PM (Pacific Standard Time, UTC-08:00)
I'm working on a fantastic website that I hope will have significant impact when it's ready. I'm planning on launching in roughly one month.

I came across what I think is an awesome technique for seeing how your web page will look as you edit it. This is WAY beyond WYSIWIG:
  1. Load the page you're working on in ALL the browser you care about. I'm using Chrome 4, FireFox 3.6, and IE 8.
  2. If you have the monitor space, cascade these browsers side-by-side.
  3. Add a meta-refresh tag to the header of that HTML file you're working on (or which consumes the CSS you're building)

       <meta http-equiv="refresh" content="5" />

  4. Now here's the sweet part:

    Edit the page in Visual Studio, notepad, whatever. When you press save all browsers reload their view in a few seconds!

  5. Now you get real WYSIWIG on real browsers.
That's it. The technique is totally low tech and would have worked for years. But I found it really helpful. Hope you do too.

Be sure to keep watching here. I promise a cool site will be announced soon!

Cheers!
Michael
@mkennedy

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

ASP.NET WebForms + Routing Video and Downloads

Wednesday, December 09, 2009 11:38:45 AM (Pacific Standard Time, UTC-08:00)

I recently did a webcast for DevelopMentor on using the routing framework introduced in ASP.NET MVC within ASP.NET WebForms based applications to build more modern websites without a major rewrite of existing web applications. The talk was called "Building Modern Websites with ASP.NET WebForms".

Here's all the related downloads. We had some microphone troubles so I want to apologize in advance for the sub-optimal sound quality.

    Watch streaming video (WMV HQ)

   Watch streaming video (WMV HQ)    Download WMV Video Listen to MP3 Streaming Download MP3

You can also download the slides and peepleocity.com sample website built during the presentation.

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

Webcast: Building Modern Apps in ASP.NET WebForms

Thursday, November 05, 2009 8:20:00 PM (Pacific Standard Time, UTC-08:00)

At DevelopMentor we have been running a bunch of free webcasts. Last month it was TDD and Agile. This month we are running 4 webcasts celebrating the announcements around .NET 4.0, Visual Studio 2010, and PDC 2009.

Join me Monday, November 23rd and register here:

     http://bit.ly/aspwebforms

We’ll talk about integrating ASP.NET’s routing infrastructure into existing an ASP.NET WebForms application. This allows you to build SEO websites with URLs like

      http://dotnet.ubbuzz.com/tag/.NET+4.0

while still taking advantage of all the productivity features of WebForms such as post-backs, controls, UpdatePanel, and so on.

We have room for a couple hundred more attendees so please register and be part of the fun. I promise lots of demos and somedisdainful comments about PowerPoint!

Share it with your friends (social, virtual, real, and other types) using the widgets below!

ASP.NET MVC: What’s that, you’d rather hear about ASP.NET MVC, not this creaky old WebForms stuff? That’s Brock Allen’s talk: http://bit.ly/intromvc

WF 4: Is WF 4 and visual programming your thing? Check out Maurice de Beijer’s WF 4 talk: http://bit.ly/meetwf4

New Parallel Extensions your thing: Check out Andy Clymer’s PFX talk. (link to follow soon).

Cheers,
Michael

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

ASP.NET Routing in Windows Azure Using WebForms

Wednesday, May 27, 2009 1:53:18 PM (Pacific Standard Time, UTC-08:00)

I'm a huge fan of ASP.NET Routing. It gained popularity as the part of ASP.NET MVC which channels requests for a given URL to the right controller action. In a wise move, Microsoft moved the routing infrastructure out of ASP.NET MVC and into its own assembly with the release of .NET 3.5 SP1.

With ASP.NET Routing you can construct search engine optimized and human friendly URLs such as these:


Here part of the URL (tag or user) selects the page and part of the URL (everything or codinghorror) are effectively query parameters to the page.

This is well documented in the ASP.NET MVC world running on your server - you can't get anything done without it in MVC. But what about Windows Azure? What if you don't want ASP.NET MVC? What if you're a traditional type of person and want all the goodness that comes with what is now called ASP.NET WebForms (aka "normal ASP.NET")?

In this brief post, I'll cover how to use ASP.NET routing and ASP.NET WebForms in Azure. The sample project can be downloaded if you want to follow along. Phil Haack has written a good post on using routing alongside ASP.NET WebForms so I won't cover too much background information.


How does this change for Azure?

The short answer is that it doesn't. If you get routing working for IIS 7 in your web app, you can effectively deploy it to Azure. But the steps always felt convoluted to me when reading others' write-ups on this. So let's run through converting a Windows Azure Web Role (essentially a "stock" ASP.NET WebForms app) to use routing in Azure.

First you'll need the Azure SDK and Visual Studio tools:


  1. Next, create a new solution in Visual Studio by choosing Cloud Service->Web and Worker Cloud Service.

  2. Add a new Global.asax file to your web role project.

  3. Add a reference to System.Web.Routing and System.Web.Abstractions in your web role project.

  4. Define a custom class that derives from IRouteHandler which will map URL parameters into the HttpContext for use in your pages:
    internal class CustomRoute : IRouteHandler
    {
        public CustomRoute(string virtualPath)
        {
            VirtualPath = virtualPath;
        }
    
        public string VirtualPath { get; private set; }
    
        public IHttpHandler GetHttpHandler(RequestContext requestContext)
        {
            foreach ( var aux in requestContext.RouteData.Values )
            {
                HttpContext.Current.Items[aux.Key] = aux.Value;
            }
    
            return BuildManager.CreateInstanceFromVirtualPath(
                       VirtualPath, typeof( Page ) ) as IHttpHandler;
        }
    }
    
  5. Register these routes in the Application_Start method of your Global.asax:
    protected void Application_Start(object sender, EventArgs e)
    {
        RouteTable.Routes.Add( "ShowName",
               new Route(
                "naming/show/{name}",
                new CustomRoute( "~/ShowName.aspx" )
                ) );
    
        RouteTable.Routes.Add( "CreateAccount",
               new Route(
                "account/begin",
                new CustomRoute( "~/Account.aspx" )
                ) );
    
        RouteTable.Routes.Add( "Home",
               new Route(
                "home",
                new CustomRoute( "~/Default.aspx" )
                ) );
    }
    
    Now if you run your app, you might expect the routing infrastructure to work. Inside the ASP.NET Dev Server (aka cassini) this will likely work. But in the Azure Development Fabric you'll see this:



    The problem is you need to tell IIS 7.5 to get out of the way and let the request get to ASP.NET.

  6. We'll define a class to short-circuit the IIS validation
    class Iis7RoutingHandler : UrlRoutingHandler
    {
        protected override void VerifyAndProcessRequest(
            IHttpHandler httpHandler, HttpContextBase httpContext)
        {
        }
    }
  7. Modify the web.config by adding a handler and module to the system.webServer section:
    
    
    ...
      
    
    
      
    
    
    
  8. Finally, we need to recover the data passed to the page. For example, in the sample project we have:

    route: /naming/show/{name}
    example: /naming/show/michael-kennedy

    How will our page access the value of name? Recall that our custom route stashes the values in HttpContext.Current.Items. We'll just pull them back out as follows in our Page_Load method of our ASPX class:
    LabelName.Text = (string)HttpContext.Current.Items["name"];
That's it! You can see our routes working in our WebForms app running in Azure (well, technically the screenshot is the dev fabric - but it works in the cloud as well):

Download the source and try it for yourself: AzureRoutingSample.zip (136 KB)

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

Article: Azure Storage

Wednesday, April 08, 2009 12:03:23 PM (Pacific Standard Time, UTC-08:00)

I recently wrote an article for DevelopMentor's Developments newsletter entitled Azure Storage. Read it at the DevelopMentor website here:

http://www.develop.com/content/newsletters/aprilazure

I've republished here for my readers. Enjoy!


Developments: Azure Storage

by Michael Kennedy

[Listen to this article as a podcast: Azure-Storage-Article-Kennedy.mp3]

October 27th 2008, Los Angeles CA - It's 9 AM and Microsoft is hosting PDC (their most forward looking developer conference). Ray Ozzie and company are introducing Windows Azure: A new platform which is their first foray into the nascent world of large-scale utility computing. This scalable and reliable platform-as-a-service functionality is commonly referred to as "Cloud Computing" because it runs somewhere out there on the Internet.

Computing platforms that rival the reliability of the utility grids (e.g. electric and gas) which we daily take for granted have long been the stuff of dreams.

A few companies have realized this dream - Google and Amazon come to mind as a couple of the rare exceptions who have accomplished this goal. These companies' web properties seem to handle unbounded amounts of traffic with zero down time. The data centers, redundancies, software engineering and operations know-how required to make this happen are exceedingly expensive. Some reports have Google spending over $2.4 billion (that's 2,400 million dollars) on data centers in 2007 alone.

Prior to large-scale cloud computing efforts (circa 2005), most of us could only dream of such scalability and reliability. Today we have at least three highly reputable companies offering some kind of pay as you go cloud computing platform - Microsoft, Amazon, and Google.

Microsoft's Azure is a new comer to the industry. But for .NET developers, it is not to be ignored. Azure allows you to use your existing skills to build essentially the same .NET applications you are familiar with and "deploy them to the cloud."

These scalable, reliable, and geographically-replicated applications that run on Azure depend on data of course. Virtually all applications we write will be nothing without their underlying data. But if we simply use the tried and true methods of data storage such as the file system or a (single) database server our data is not all that scalable or reliable. Because we cannot have a scalable and reliable application without data, we need a new mechanism for storing and accessing data from our Azure applications.

Enter Azure Storage

Azure storage is the storage component of the Azure platform. It is actually three data services in one:

  • Blob Storage - stores unstructured data essentially as a file, limited to 50 GB of data per blob.
  • Table Storage - stores structured data that is somewhat like a database. For full database capabilities there is a high level feature called Sql Data Services (SDS).
  • Queues - provides interprocess communication functionality between various web and worker roles in your hosted services or even applications running outside of Azure. Queues can pass small xml or binary messages - less than 64 kb per message.

In this article, we will cover just the basics of the three storage services of Windows Azure. I want to give you a sense for what it's like to program against Azure Storage. At the base level all access to Azure Storage uses pure REST APIs. This means that you can access it from any HTTP enabled platform / language. For example, to download the blob data called "config.xml" in the container called "settings" for the Azure project "kennedy" you would simply issue a GET to the Uri:

   http://kennedy.blob.core.windows.net/settings/config.xml

To save data in a blob you do HTTP POSTs and PUTs in a similar fashion. However, real life is full of edge cases, error handling, security, and serialization which makes the pure HTTP model error prone. Thus, a sample library serves as the de facto .NET API to Azure Storage and ships with the Azure SDK. It is called StorageClient and can be found in default installs here:

   C:\Program Files\Windows Azure SDK\v1.0\samples\StorageClient

We will examine working with each of the storage services from the perspective of the StorageClient library - but keep in mind that ultimately this library is a wrapper around a basic and open RESTful API.

Setting The Stage: The Sample Application

To explore Azure Storage I have written a simple photo sharing distributed application. These set of applications allow users to upload photos to a photo sharing site. These photos must be reviewed and approved by moderators of the site. Once approved, the general public can view and interact with the photos. For a concrete example, you could imagine writing a distributed version of the wallpaper sharing site InterfaceLift and deploying it on Azure in this fashion.

You can download the sample application and follow along if you want to see the full source code and try it out yourself. Just be sure to start the Development Storage utility that comes with the Azure SDK before running the application.

Our distributed application consists of three parts.

  • The Uploader: A Windows Forms application that lets contributors upload images to the site.
  • The Reviewer: A Windows Forms application that lets moderators view image submissions and either approve or reject them.
  • The Website: An ASP.NET website for viewing the photos - this is our public facing application.

A typical use case might be as follows (see diagram below).

  1. We upload a photo submission with our uploader application. The photo is uploaded to Azure blob storage and a message is sent via an Azure Message Queue to all available reviewer applications. Additional information about the submitter is associated with the photo in Azure table storage.
  2. The reviewer application watches the message queue for new messages. When one arrives, the photo is added to a list of pending submissions. The reviewer can either reject (delete) the submission or approve it - move it to a permanent blob storage location where it will be publicly viewable.
  3. Users visit our website and can view all approved photos. This list will change in real-time because it is driven by the reviewer application. The web application simply pulls all photos from the approved photo container in Azure blob storage.

Saving Data: Creating Azure Blobs

To save data to Azure Blob Storage, you must realize blob storage follows the ACE pattern (Authority, Container, Entity) to describe a blob. Authority is simply your Azure solution name. Containers are analogous to folders. And entities are analogous to files.

The listing below is essentially the code that runs when the uploader application uploads a pending image submission to blob storage.

Listing 1:

Sending Notifications: Azure Queuing

In addition to uploading the image to the pending images container in blob storage, we will send a message to a message queue to notify any active or future reviewers of the new submission.

Listing 2.

Saving (More) Data: Structured Storage and Azure Tables

Finally, for the upload application, we must also save some information about the contributor. In Azure Storage we have two reasonable places to store this information.

First, we could save this information directly in blob storage as meta-data associated with the blob itself. This is straightforward and easy. But there is a big limitation: information in this meta-data is not queryable. Suppose I want get all images associated with a single contributor. There is no way in Azure Blob Storage to say give me all the blobs with this filter on the meta-data. You would have to pull the properties of every blob and do the comparison client-side. That's tantamount to filling a DataSet with "SELECT * FROM PendingImages" and it's a bad idea.

Instead we will use the third type of Azure Storage: Azure Table Storage. Table Storage allows us to store data with up to 256 properties and query this data as if it were a database. It is exactly what we need for the contributor information. However, you must realize this is not a database. A better mental picture is a durable collection of Dictionary object (as in Dictionary from System.Collections.Generics) with querying built on top. I say this because there is no schema or relational constructs in Azure Table Storage. If you need that, then you'll want Sql Data Services - a service on top of the core Azure platform.

The code to add an entry to Azure Table Storage does not fit into a single method as it's driven through the interaction of several classes we must define. Azure Table Storage can be accessed via ADO.NET Data Services (client-side) and this is the method we will use.

First we'll define a client-side schema for our entry by creating a class called Contributor which derives from the class TableStorageEntity (from the StorageClient library).

Listing 3.

Additionally we must define the tables and queries available to ADO.NET Data Services by created a class derived from TableStorageDataServiceContext and we do that below. We simply have one table called Contributors.

Listing 4.

With those two items in place, we can insert a "row" into Azure Table Storage as follows:

Listing 5.

As for querying Azure Table Storage that is very straight-forward. Because we are using ADO.NET Data Services, querying can be done via LINQ as in "from c in svc.Contributors select c.Name". Ultimately ADO.NET Data Services is also built on a RESTful API so this translates to the underlying HTTP REST calls. Alternatively, you can use that REST API directly from .NET or any other platform.

Waiting on Queues: The Reviewer Application's Code

Next, let's look at how we monitor and pull messages from Azure Queuing. Ultimately we must poll the queue using a RESTful HTTP request. But the StorageClient resurfaces this to us as simple events.

Listing 6.

We won't cover how we move a blob from the pendingImages blob container to the approvedImages blob container which happens when a reviewer approves an image. You can look at the sample to see how that is done.

Ultimately It's About the Website

Finally, let's look at the web application that actually displays the approved images. We don't do anything fancy such as paging or error handling that you'd see in a real application. But this will give you a good idea how to work with the blob data as a collection.

Here we'll create a BlobStorage object and access the BlobContainer approvedImageContainer as we have been in most of the listings. But then instead of saving or reading blobs, we use the ListBlobs method to simply list all the approve images in that container. In order to show the images on our webpage, we just use the BlobProperties.Uri and directly reference that in our HTML. Our ASP.NET application does not touch the data. Rather the consumers (IE, Firefox, Chrome, etc) of the HTML pull the image data directly from blob storage as they would from any web server.

Listing 7.

Now you have a good idea of the concepts and motivation behind Azure Storage. You have seen some typical usages of each of the three storage features: blob storage, table storage, and queuing. Our samples made use of the sample storage API library called StorageClient. Underlying this library we saw that Azure Storage is entirely accessed via RESTful APIs.

Want to get started? Visit http://www.azure.com and choose "Try It Now" to register for a CTP Azure account. You'll need to download the various SDK's listed on that same page. They will install the Visual Studio projects required for working with Azure as well as the Development Storage and Development Fabric so you can develop and debug your applications before deploying them to the cloud.

If you want some intensive, expert-lead training on Azure and associated .NET 4.0 topics be sure to contact DevelopMentor. Or call 800.699.1932 to find out what classes we have available today.

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post

dotNetDevBuzz on Channel 9 Last Week

Sunday, March 08, 2009 8:28:19 PM (Pacific Standard Time, UTC-08:00)
Any of my technically savvy friends know that I'm a big fan of Channel 9. If you want the raw, inside view of Microsoft's developer world, it's a great place to start. That's why it was a big honor for me that my .NET Developer community website

   http://dotnet.ubbuzz.com

was featured on "This Week on Channel 9" last week.


  (see segment 7:45 - 10:25)

Thank you Dan Fernandez for covering our site! If you're unfamiliar with .NET Dev Buzz, then what are you waiting for? You'd better check it out!

Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post


Just a site note: I'm doing my part to rid the world of IE 6. Visit this site with IE 6 and you'll get a shameful message telling you to "Stop Living in the Past".