Hello there! It looks like this might be your first time to my website. You should...
Check out ChatPast
 Read and search you
 chat history in the cloud
Subscribe to my
RSS feed
Follow @mkennedy
on Twitter.

MongoDB vs. SQL Server 2008 Performance Showdown

Thursday, April 29, 2010 10:41:30 AM (Pacific Standard Time, UTC-08:00)


This article is a follow up one I wrote last week entitled “The NoSQL Movement, LINQ, and MongoDB - Oh My!”. In that article I introduced the NoSQL movement, MongoDB, and showed you how to program against it in .NET using LINQ and NoRM.

I highlighted two cornerstone reasons why you might ditch your SQL Server for the NoSQL world of MongoDB. Those were

1. Ease-of-use and deployment
2. Performance

For ease-of-use, you’ll want to read the original article.

This article is about the performance argument for MongoDB over SQL Server (or MySql or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.

“A potentially controversial graph showing MongoDB performing 100 times better than SQL Server”

We’ll see source code, downloadable and executable examples and you can verify all of this for yourselves. But first, here’s a new twist on an old proverb:

“Data is money”

If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.

Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to attempt to plan out your operating costs per user to help guide the pricing of your product then the cost of storing, querying, and managing your data will likely be a significant part of that calculation.

If there is a database that is 100 times faster than SQL Server, free, easy to administer and you program it with LINQ just as you would with SQL Server then that is a very compelling choice.

When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.

“It means you can charge a lot less per user of your application and get the same revenue. Think about it.”

One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:

http://www.ustream.tv/recorded/6146875

[The audio and video isn’t too hot, but the content is. Skip the first minute without audio.]

During this talk, Kristina describes SourceForge’s experience moving from MySql to MongoDB. On MySql, SourceForge was reaching its limits of performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.

The scenario:

Model a data intensive web application aiming to support as many concurrent users as possible. There will be users from the web application itself. But there will also be users from an API and external applications. Users will interact with the data by having nearly as many inserts as they do queries. Their inserts are all small pieces of data and are all independent of each other.

Let me just get this out of the way and I mean the following in the nicest of ways: I don’t care about your scenario or use-case. The scenario above is what I’m trying to model. I’m not trying to do bulk-inserts or loading large files into databases or anything like that. MongoDB may be great for these. SQL Server may have specialized features around your use-case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.

Insert Speed Comparison

It’s the inserts where the differences are most obvious between MongoDB and SQL Server.

These inserts were performed by inserting 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:


MongoDB basic class


SQL Server basic class

I ran five concurrent clients hammering the databases with inserts. Here’s the screenshots for running against MongoDB and against SQL Server. Let’s zoom into the most important result with the output from one of five concurrent clients:

MongoDB:

SQL Server:

That’s right. It’s 2 seconds verses 3 1/2 minutes!

Now to be fair, this was using LINQ to SQL on the SQL side which is slow on the inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.

Query Speed Comparison

Now let’s see about getting the data out using the same objects above on the indexed Id field for each database.

Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.

“If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.”

Complex Data and the Real World

Feel like that was an overly simplified example? Here’s some real world data with foreign keys and joins. Below is the complex data model.

MongoDB:

SQL Server:

It shouldn’t surprise you that MongoDB does even better here without its joins.

The Hardware

All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8

Your Turn

If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:

http://www.michaelckennedy.com/Downloads/sql-vs-mongo.xlsx

You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.

The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.

Download Sample:

http://www.michaelckennedy.com/Samples/SpeedOfSqlVsMongoDBAnddotNetSample.zip

Got feedback? Write a comment or contact me on Twitter: @mkennedy or find me in any of these other ways.

Thanks!

Some thanks are in order for all the help I got bouncing around ideas as well as trying different scenarios. Thanks to

Eric Cain @arcain
Jim Lehmer@dullroar
Karl Seguin @karlseguin


Tweet this Follow me on Twitter Post this to dotnetshoutout.com Digg this Submit this to Stumbleupon email this post
Thursday, April 29, 2010 2:24:28 PM (Pacific Standard Time, UTC-08:00)
Intersting stuff mate, I've been hoping to see some hard data especially on mongodb vs linq to sql.

Just wondering what mongodb driver did you use?
Thursday, April 29, 2010 2:28:28 PM (Pacific Standard Time, UTC-08:00)
Thanks Luke! I'm glad you liked the article. I'm sure people will disagree with parts of it. But they have the source code to try it themselves. :)

-- Michael
Thursday, April 29, 2010 6:51:10 PM (Pacific Standard Time, UTC-08:00)
Thank you very much,I have read it now.
Do you like Timberland Shoes? it includes Timberland Shoes,Timberland Store,Timberland Footwear,Timberland

Mens Boots,Timberland Chukka Boots,Timberland Roll-Top Boots,Timberland Womens Boots,Timberland High Top

Boots,imberland Roll-Top Boots.
you can read:www.fashiontimberland.com.
Thursday, April 29, 2010 7:01:53 PM (Pacific Standard Time, UTC-08:00)
got curious and wanted to play. I didn't have Mongo, but I do have SQLServer 2008 sitting here.
I kicked off 10000 inserts 5 times in parallel, my results:

SQL Server Client - ADO.NET Style
Warming up ...
Building insert data...
Running!
Finished with 10000 SQL inserts in 102.984 sec.
Done

Something to note is that we actually inserted 50,000 rows in that 102 seconds, because all 5 tasks finished within a few Ms of each other.
However, I wanted to know what would happen if we actually did something a little more real world, and used a stored procedure. I moved your query to a sproc, and changed the code to invoke the sproc instead of parsing a manual query...Lets take a look:

SQL Server Client - StoredProcs
Warming up ...
Building insert data...
Running!
Finished with 10000 SQL inserts in 70.259 sec.
Done

Look at that, a very tiny change inserted 50,000 rows (10000 x 5) and shaved 30 seconds off the job.
Another interesting fact is that you used a CHAR(24) for your ID, but are storing a GUID in it? Did you purposely use a data type that makes a horrible key for your tests instead of using the native UNIQUEIDENTIFIER type? Lets find out.
Fixing the table to use the proper type as its ID results in:

SQL Server Client - Stored proc using the proper datatype for a ID
Warming up ...
Building insert data...
Running!
Finished with 10000 SQL inserts in 10.904 sec.
Done

Hot damn, so far, in a span of 5 minutes, I've taken the original run of 102 seconds from running your benchmark code on my machine, and made it insert 50,000 rows in 10.904 seconds, all because I knew what I am doing with SQL.

However, relational databases are going to be slower at insertions than a glorified hashtable, its because of that pesky ACID thing, but here is what is interesting....Why are we launching 5 processes? Lets totally saturate my laptop (Which oddly enough, is has much less power than yours, yet performs better...hmmmmm..... Dell D630 with 2 gigs of ram).
I'm going to run 20 inserts jobs at the same time.

SQL Server Client - Stored proc using the proper datatype for a ID
Warming up ...
Building insert data...
Running!
Finished with 10000 SQL inserts in 62.281 sec.
Done

So, despite adding 15 more clients in parallel blasting 10,000 queries each, the load did not increase linearly at all.... In fact, despite the fact that I'm inserting 200,000 rows, it still finished a full minute before your benchmark, and 40 seconds before your benchmark running on my machine.

All of this aside, this is not a benchmark the reflects reality at all. You rarely have a single client waiting for a database to do 200,000 inserts, instead you have 200,000 clients all doing a single insert at the same time. And that situation, each client is going to get a response from the DB in a matter of milliseconds. In the real world, if you do need to write 200,000 rows, you use a bulkloader and a single query, and you'll see that load take less than a second.

So the TLDR here is, make smart decisions with your database, see decent performance, do really stupid things, see really stupid performance.
FlySwat
Friday, April 30, 2010 6:19:07 AM (Pacific Standard Time, UTC-08:00)
1. Where's DB2 in this mix?
2. SQL Server isn't a DB Engine. It's a platform built around data for integrating, reporting, analyzing, HA, DR etc...
3. This is great for a very small business, or a guy writing software in his garage
4. The context for that example as about as narrow as a hair width as compared to context of data with a business
5. Support?
6. Now what counters can I use in windows to gauge my Mongo's performance?
7. The orders example was overly simple for mongo and one table too many for SQL Server - that showed bias
8. I'd tell him to go get Linux and use nothing but open source
9. When's the next release of Mongo? Who is writing it?
10. I can make a better burger in my kitchen than McDonalds. But I don't have to support an enterprise, stand up kitchens all over the world and be utterly standardized in my approach.
11. How much money does Mike stand to make if Mongo is more prevalent.
12 How much of this is driven by plain old pissed off at MS attitude?

Seems like it might have it's niche - though.
Troy Thompson
Saturday, July 10, 2010 10:00:54 AM (Pacific Standard Time, UTC-08:00)
So, if you don't want to use MongoDB you don't have to. At the end of the day it seems like people get really offended and passionate about defending SQL Server or other RDBMS systems more because they have so much of their personal time invested in them than for any hard facts. I think relational systems are awesome at doing relational things that require rock-solid ACID compliance...but they are pretty terrible at doing other things like storing document style data and handling massive amounts of asynchronous writes. Anyway, I say use the right tool for the job and sometimes it's SQL Server or MySQL, or Oracle...and sometimes it's MongoDB, CouchDB, Reddis, or a plain old text file.
Comments are closed.