Thursday, April 29, 2010 10:41:30 AM (Pacific Standard Time, UTC-08:00)
This article is a follow-up to one I wrote last week entitled
“The NoSQL Movement, LINQ, and MongoDB - Oh My!”. In that article I introduced
the NoSQL movement, MongoDB, and showed you how to program against it in .NET using
LINQ and NoRM.
I highlighted two cornerstone reasons why you might ditch your SQL Server for the
NoSQL world of MongoDB. Those were
1. Ease-of-use and deployment
2. Performance
For ease-of-use, you’ll want to
read the original article.
This article is about the performance argument for MongoDB over SQL Server (or MySQL
or Oracle). In the first article, I threw out a potentially controversial graph
showing MongoDB performing 100 *times* better than SQL Server for inserts.
“A potentially controversial graph showing MongoDB performing 100 times better than
SQL Server”

We’ll see source code and downloadable, executable examples, so you can verify all
of this for yourself. But first, here’s a new twist on an old proverb:
“Data is money”
If your application is data intensive and stores lots of data, queries lots of data,
and generally lives and breathes by its data, then you’d better do that efficiently
or have resources (i.e. money) to burn.
Let’s imagine you’re creating a website that is for-pay and data intensive. If you
were to attempt to plan out your operating costs per user to help guide the pricing
of your product then the cost of storing, querying, and managing your data will
likely be a significant part of that calculation.
If there is a database that is 100 times faster than SQL Server, free, easy to administer
and you program it with LINQ just as you would with SQL Server then that is a very
compelling choice.
When you have such a database, it means you can run your system on commodity hardware
rather than high-end servers. It means you can have fewer servers to maintain and
purchase or lease. It means you can charge a lot less per user of your application
and get the same revenue. Think about it.
One more story before we see the statistics. Kristina Chodorow from 10Gen gave a
talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with
MongoDB”. You can watch the recording here:
http://www.ustream.tv/recorded/6146875
[The audio and video aren’t too hot, but the content is. Skip the first minute without
audio.]
During this talk, Kristina describes SourceForge’s experience moving from MySQL
to MongoDB. On MySQL, SourceForge was reaching its limits of performance at its
current user load. Using some of the easy scale-out options in MongoDB, they fully
replaced MySQL and found MongoDB could handle the current user load easily. In fact,
after some testing, they found their site can now handle 100 times the number of
users it currently supports.
Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats,
and some code.
The scenario:
Model a data intensive web application aiming to support as many concurrent users
as possible. There will be users from the web application itself. But there will
also be users from an API and external applications. Users will interact with the
data by having nearly as many inserts as they do queries. Their inserts are all
small pieces of data and are all independent of each other.
Let me just get this out of the way and I mean the following in the nicest of ways:
I don’t care about your scenario or use-case. The scenario above is what I’m trying
to model. I’m not trying to do bulk-inserts or loading large files into databases
or anything like that. MongoDB may be great for these. SQL Server may have specialized
features around your use-case, etc. They don’t apply in my scenario. So please don’t
wonder why I’m not using bulk inserts or anything like that in the examples below.
Insert Speed Comparison
It’s the inserts where the differences are most obvious between MongoDB and SQL
Server.


These inserts were performed by inserting 50,000 independent objects using NoRM
for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:

MongoDB basic class

SQL Server basic class
I ran five concurrent clients hammering the databases with inserts. Here are the screenshots
for running against MongoDB and against SQL Server. Let’s zoom into the most important
result with the output from one of five concurrent clients:
MongoDB:

SQL Server:

That’s right. It’s 2 seconds versus 3 1/2 minutes!
Now to be fair, this was using LINQ to SQL on the SQL side which is slow on the
inserts. After discussing these results with some friends, I re-ran the tests using
raw ADO.NET style programming and saw a 1.5x-3x performance improvement for SQL.
That still leaves MongoDB 30x-50x faster than SQL.
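If you want to reproduce the shape of this test without the download, here is a minimal sketch of a harness that runs several concurrent clients and times each one. It is not the code from the sample: InsertBenchmark, RunClients, and the insertOne delegate are hypothetical names, and the delegate stands in for whatever single-insert call (NoRM, LINQ to SQL, or raw ADO.NET) you want to measure.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical harness for illustration; not part of the downloadable sample.
public static class InsertBenchmark
{
    // Runs `clients` concurrent workers. Each one calls insertOne(clientId, i)
    // `insertsPerClient` times and reports its own elapsed wall-clock time.
    public static TimeSpan[] RunClients(
        int clients, int insertsPerClient, Action<int, int> insertOne)
    {
        Task<TimeSpan>[] tasks = Enumerable.Range(0, clients)
            .Select(clientId => Task.Factory.StartNew(() =>
            {
                Stopwatch timer = Stopwatch.StartNew();
                for (int i = 0; i < insertsPerClient; i++)
                {
                    insertOne(clientId, i);
                }
                timer.Stop();
                return timer.Elapsed;
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Passing 5 clients and 50,000 inserts per client would mirror the setup described above; the numbers you get back depend entirely on what the delegate does and on your hardware.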
Query Speed Comparison
Now let’s see about getting the data out using the same objects above on the indexed
Id field for each database.


Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to
leverage the mad scale-out options that MongoDB affords then we could kick that
up to many times more.
Complex Data and the Real World
Feel like that was an overly simplified example? Here’s some real world data with
foreign keys and joins. Below is the complex data model.
MongoDB:

SQL Server:

It shouldn’t surprise you that MongoDB does even better here without its joins.


The Hardware
All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core
2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and
MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8
Your Turn
If you want to see the entire set of data above as an Excel spreadsheet, you can
download that here:
http://www.michaelckennedy.com/Downloads/sql-vs-mongo.xlsx
You can also download the sample code. Before you do, realize I haven’t done a bunch
of work to make it super easy to run. But you should be able to figure it out. Just
turn the knobs on the PerfConstants class for the number of inserts and queries.
Then comment or uncomment sections of the code in the clients for your scenarios.
The expected use is that you’ll start the launcher application then use it to launch
five concurrent clients at exactly the same time.
Download Sample:
http://www.michaelckennedy.com/Samples/SpeedOfSqlVsMongoDBAnddotNetSample.zip
Got feedback? Write a comment or contact me on Twitter:
@mkennedy or find me in
any of these other ways.
Thanks!
Some thanks are in order for all the help I got bouncing around ideas as well as trying different scenarios.
Thanks to
Eric Cain @arcain
Jim Lehmer @dullroar
Karl Seguin @karlseguin
Posted in Articles | ASP.NET | NoSQL | Open Source | web2.0 |
Thursday, April 22, 2010 1:01:01 PM (Pacific Standard Time, UTC-08:00)
Maybe you’ve heard people talking about ditching their SQL Servers and other RDBMS
entirely. There is a movement out in the software development world called
the "NoSQL" movement and it’s taking the web application world by storm.
“Insanity!” you may cry, “for where will people put their data if not in a database?
Flat files? Tell me we aren’t going back to flat files.”
No, but in the relational model, something does have to give. The NoSQL movement
is about re-evaluating the constraints and scalability of data storage systems in
the light of the way modern web applications generate and consume data.
The outcry about flat files above is meant to highlight an assumption developers
often have about building data-driven applications: Data goes in the database (SQL
Server, Oracle, or MySql). Just maybe, if we are really cutting-edge, we might consider
storing our data in the cloud, but the choices generally stop there.
The NoSQL movement asks the question:
“Is the relational database (RDBMS) always the right tool for data storage and data
access?”
Starting from an RDBMS is virtually an
axiom of software development. However, those of us who are excited about
NoSQL believe that relational databases are not always the answer. I think this
highlights one of the reasons this NoSQL thing is called a movement. People are
realizing they have a choice where they thought they had none.
The converse is, of course, also true. The NoSQL databases are not always the
right choice either. If you look carefully, however, you will find that they are
a good choice much of the time. Don’t take my word for it. Ask Facebook, Twitter,
Digg, SourceForge, WebEx, Reddit and a bunch of other companies (listed here and
here) that are using NoSQL databases.
This move towards NoSQL is driven by pressure from two angles in the web application
world:
- Ease-of-use and deployment
- Performance - especially when there are many writers as compared to the number of
readers (think Twitter or Facebook).
Choosing NoSQL for Ease-of-Use and Deployment
I cover the programming model in detail as well as introduce the actual database
server below. For some vague motivation, let me just give you a quick look at how
you define the data model and maintain it.
1. Define your classes in C# (largely) without regard to putting them in a database.
Related classes? Easy - one has a collection of the others.
2. Create a simple DataContext-like class which exposes each top-level type that is
to be stored in the database. This is only a few lines of code per collection (think
of this as a table).
3. Interact with the database using LINQ. This creates the collections (think tables),
sets the schema, etc.
4. Maintain the database and evolve it by maintaining your classes from step 1. *
Why, in the name of all that is right, do we have to model our system twice? Once
in the database and once, in parallel, in code? With NoSQL, you have one place to
do that - in your C# classes.
* You may have to run a transformation tool if you’re making radical data changes,
but that’s true in SQL systems as well.
Choosing NoSQL for Performance
When the number of concurrent clients using your application - and thus your database
- is reasonably small (let’s say 500 users as a baseline) RDBMS can work great.
But what if that number grows? And if you are writing a web app, you definitely
want that number to grow. At 50,000 users, can you still run on a single instance
of SQL Server or MySql? How powerful does your hardware have to be to handle that?
What about at 500,000 or 5,000,000 users, still good?
I’m sure there are some of you out there thinking, “Wait a minute now! There are
plenty of systems with tons of users built upon relational databases.”
It’s true, there are. But how much expensive hardware and software do these require?
How easy is it to leverage *commodity* hardware and free software? A basic SQL Server
cluster might run you $100,000 just to get it up and running on decent hardware.
Rather than leveraging crazy scaling-up options, the NoSQL databases let you scale-out.
They make this possible (dare I say easy?) by dropping the relational aspects of
a database. Some NoSQL systems such as MongoDB get even better scalability by loosening
some of the durability guarantees – which they backfill somewhat with redundancy
(more on MongoDB shortly).
“Ok, ok. So it’s cheaper and simpler,” you say. “How much faster than the finely
tuned system that is SQL Server 2008 can these open source NoSQL systems be?”
The answer is: MUCH MUCH FASTER. Here’s a simple comparison of running a bunch of
concurrent inserts into SQL Server 2008 and MongoDB on the same computer.
Looks like under heavy load, I’d say it’s about 100 times faster. I’m sure there is
going to be tons of second-guessing of this graph and so on. Hold your comments please!
I’ll be posting a full performance comparison with source code soon. Let me just
say that I think the comparison was fair - I’ll back that up in a later post.
NoSQL and a New Programming Model
If we do not have joins and primary / foreign key relationships, how do we associate
related data? In NoSQL, there is a way to mimic foreign keys for certain relationships.
However the main answer is that you do not disassociate your data in the first place.
I’m sure that you’ve all heard of the object-relational impedance mismatch.
A large part of that mismatch comes from the fact that we normalize the data in our
database to the extreme and then use joins to reassemble that data. Not only does that
cause this so-called impedance mismatch, but those joins can be really slow and they can
be the death of any scale-out solution. The key to many of the NoSQL databases’
scalability is that they do not use joins. You simply save large swaths of your
data as a single blob (which, in MongoDB’s case, is still deeply queryable).
Shortly we’ll look at an example where we build out a disconnected, offline RSS
reader that uses MongoDB and LINQ to store its data. But just think about how you
might structure your data storage if you could save entire object graphs and still
query them? Your "row" might be a Blog object which has an array of BlogEntries
which contain the entry text, link, date, etc. Then your *entire* query to pull
all the details of a single blog would hit a single “table” in the database. That
might look like this query which has one result:
var blog =
    (from b in ctx.Blogs
     where b.Id == requestedBlogId
     select b).FirstOrDefault();
There are no joins or anything like that because you’re saving objects not columns
and those objects contain their collections already (e.g. RssEntries). There is
an important distinction to make here. These NoSQL databases generally are *not*
the same as object databases. They are what are known as document databases. There’s
actually a big difference between the two.
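To make the "one query, whole object graph" idea concrete without a running server, here is a small sketch that uses LINQ to Objects over an in-memory list. BlogDoc, EntryDoc, and EmbeddedQueryDemo are simplified stand-ins invented for this illustration; they are not the classes from the sample, and the list only plays the role of a MongoDB collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for illustration only; not the classes from the sample.
public class EntryDoc
{
    public string Title { get; set; }
    public string Link { get; set; }
    public DateTime PostedDate { get; set; }
}

public class BlogDoc
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<EntryDoc> Entries { get; set; }
}

public static class EmbeddedQueryDemo
{
    // One lookup returns the blog together with all of its entries,
    // because the entries live inside the blog "document" itself.
    public static BlogDoc FindBlog(IEnumerable<BlogDoc> blogs, int requestedBlogId)
    {
        return (from b in blogs
                where b.Id == requestedBlogId
                select b).FirstOrDefault();
    }

    // A tiny in-memory "collection" to query against.
    public static List<BlogDoc> SampleData()
    {
        return new List<BlogDoc>
        {
            new BlogDoc
            {
                Id = 1,
                Name = "First blog",
                Entries = new List<EntryDoc>
                {
                    new EntryDoc { Title = "Hello", Link = "http://example.com/1",
                                   PostedDate = new DateTime(2010, 4, 22) },
                    new EntryDoc { Title = "Again", Link = "http://example.com/2",
                                   PostedDate = new DateTime(2010, 4, 29) }
                }
            },
            new BlogDoc { Id = 2, Name = "Second blog", Entries = new List<EntryDoc>() }
        };
    }
}
```

The point of the sketch is the shape of the data: the entries travel with their blog, so nothing has to be reassembled after the lookup.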
Introducing MongoDB
The NoSQL database we are using in this example is
MongoDB.
This is a free, open-source database which runs on Windows, Linux, and Mac OS X
systems. You can access it from many platforms including .NET, Ruby, Java, PHP,
and so on.
We’ll be using .NET and C# of course. You have several options when choosing how
to access MongoDB from .NET but generally that means using LINQ and a light-weight
object-mapper on top of MongoDB itself. Note that common terminology might categorize
the object mapper that moves objects into and out of the database as an ORM. While
that’s OK, there is technically no "R" in this ORM because MongoDB is not relational.
Hence I’m calling it simply an Object-Mapper (OM).
In MongoDB nomenclature, these libraries are called drivers. My favorite .NET driver
is called NoRM. It’s being actively developed and was created by Karl Seguin,
Andrew Theken, Rob Conery, James Avery, and Jason Alexander.
You can find NoRM on GitHub and discuss it in its related Google Group.
If you want to learn more about MongoDB you should listen to these Podcast interviews:
Michael Dirolf also has a great book in the works. You can catch a preview of it
on Safari Books Online.
Here’s the Amazon page: MongoDB: The Definitive Guide.
NoSQL in Action
Let’s write some code. The first step typically in a data-driven application is
to spec out the database. Then we’d use LINQ to SQL or Entity Framework to generate
the ORM classes. MongoDB is different. MongoDB has no schema or rather its schema
is flexible and defined via usage rather than being predefined in the database.
So our first step is to define the classes we’d be storing in the DB via NoRM.
We’re going to define 3 classes: Blog, RssEntry, and RssDetails. The Blog object
will contain a collection of RssEntry objects. In practice you might just go with
the Blog and RssEntry classes. But I wanted to model both the embedded case (Blog
+ RssEntry) and the loosely defined foreign key style relationship that mimics joins
(RssEntry + RssDetails). That way we can demonstrate both use-cases.
Here’s a taste of the Blog class:
public class Blog
{
    public ObjectId _id { get; set; }
    public string Name { get; set; }
    public string Url { get; set; }
    public string RssUrl { get; set; }
    public List<RssEntry> Entries { get; set; }
    // ...
}
Notice that it contains a collection (List<T> really) of RssEntry objects.
That’s the relationship supported by nesting. The Blog class just has this collection
as part of its data model.
The RssEntry class has the summary info for a blog entry:
public class RssEntry
{
    public ObjectId _id { get; set; }
    public Guid UniqueId { get; set; }
    public DateTime PostedDate { get; set; }
    public string Title { get; set; }
    public string RssGuid { get; set; }
}
And the larger data is stored in the RssDetails class (for example the text of the
post):
public class RssDetails
{
    public ObjectId _id { get; set; }
    // this is kinda like the foreign key.
    public Guid RssEntryId { get; set; }
    public List<string> Categories { get; set; }
    public string Link { get; set; }
    public string Text { get; set; }
    // ...
}
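Since there is no join, following that "kinda like the foreign key" link means running a second query. Here is a rough sketch of what that lookup looks like, again using LINQ to Objects over an in-memory list rather than a live NoRM context. RssEntryLite and RssDetailsLite are trimmed copies made up for this sketch (no ObjectId) so it compiles on its own, and it assumes that RssEntryId holds the UniqueId of the matching entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed, hypothetical copies of RssEntry / RssDetails so the sketch stands alone.
public class RssEntryLite
{
    public Guid UniqueId { get; set; }
    public string Title { get; set; }
}

public class RssDetailsLite
{
    // Assumption: this holds the UniqueId of the entry it belongs to.
    public Guid RssEntryId { get; set; }
    public string Text { get; set; }
}

public static class DetailLookup
{
    // Instead of a join, run a second query keyed on the entry's UniqueId.
    // Returns null when no details were stored for the entry.
    public static RssDetailsLite DetailsFor(
        RssEntryLite entry, IEnumerable<RssDetailsLite> allDetails)
    {
        return (from d in allDetails
                where d.RssEntryId == entry.UniqueId
                select d).FirstOrDefault();
    }
}
```

The trade-off is one extra round trip when you actually need the large text, in exchange for keeping the Blog documents small.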
Let’s see how we insert an entire set of Blog data into the database. We begin by
generating the objects (Blog, RssEntry, etc) in memory and then serializing them
via NoRM to MongoDB much as you would in LINQ to SQL. The difference is this will
actually generate the collections (analogous to tables) if they don’t already exist
and it will define the implicit schema to match our objects:
void SaveBlogToMongoDb(
    string rssUrl, XElement root, RssDataContext ctx)
{
    Blog blog = new Blog();
    blog.RssUrl = rssUrl;
    blog.Name = GetBlogName(root);
    blog.Url = GetBlogUrl(root);
    blog.Entries = ParseEntries(root);
    IEnumerable<RssDetails> details
        = GetDetails(blog.Entries, root);
    foreach (RssDetails detail in details)
    {
        ctx.Add(detail);
    }
    ctx.Add(blog);
}
Here we are using a class called RssDataContext which we wrote manually. It is very
similar to what LINQ to SQL and Entity Framework use to do the object-relational
mapping. Want to do a query? Do you know LINQ? Well then you’re all set:
var results =
    from b in ctx.Blog
    where b.Name.Contains( "MongoDB" )
    select b;
How do you add a new entry to an existing blog and update it in the database?
void AddEntry(Blog blog, RssEntry entry)
{
    // Assumes ctx is an RssDataContext that is already in scope (e.g. a field).
    blog.Entries.Add(entry);
    ctx.Save(blog);
}
We leverage the fact that the blog.Entries collection is a List and just add to
it. Then save will update the record in the DB.
All this works great and is highly performant. But do be careful as not all the
LINQ operations are fully implemented yet in NoRM and some (like join) may never
be added because MongoDB doesn’t support it.
To get started, download the MongoDB tools and server here:
http://www.mongodb.org
You unzip the zip file and run the mongod.exe program. Be sure that you have created
the C:\data\db folder. It appears at first that you have to run MongoDB in a console
window. But you can register it as a Windows Service:
Here’s some helpful advice on installing MongoDB as a Windows Service (there is
a small bug you have to work around):
http://www.deltasdevelopers.com/post/Running-MongoDB-as-a-Windows-Service.aspx
There’s also a management console (and I mean "console"):
It’s a little different. You’ll get used to it. The means of interaction with the
server is through JavaScript rather than T-SQL and the storage format is a binary
form of JSON as you can see.
For a project I’m working on I’ve built a Windows Forms UI that lets me manage the
database easily by just adding an object data source and doing some drag-drop magic
in Visual Studio. Generally I look down upon that sort of development, but for an
admin tool it’s just fine.
Now It’s Your Turn!
Try it out for yourself. Download MongoDB and the NoRM driver and build some apps.
You may also want to check out the source code for my demo app:
Download Sample: RssMongoSample-Kennedy.zip
Got feedback? Write a comment or
contact me on Twitter where I'm @mkennedy or find me in
any of these other ways.
Recommended Reading:
Here are some other blogs on this subject.
Posted in Articles | ASP.NET | NoSQL | Open Source | Talks | Tools | web2.0 |
Saturday, January 30, 2010 9:46:04 PM (Pacific Standard Time, UTC-08:00)
In this article I'm going to give you a simple, step-by-step overview of how to create
a Windows 2008 server image in Amazon's Elastic Compute Cloud (EC2)
infrastructure. Now I must admit I'd rather have found a good tutorial on The Internets
or even in a book. Feel free to send me any I missed. My experience is they are either
dated or about Linux and so on...
First, briefly: why does one care about EC2? Well, maybe you are buying into the whole
cloud computing story which lets you out-source your computer hardware for amazingly
cheap prices (starting at around $0.20 / hour for a dedicated machine). That's a great
reason, and Microsoft and Google have interesting plays there too.
Personally I just want a simpler way to create virtual machines. We'll
have full admin access over remote desktop to our system to install whatever we want.
I'm putting Visual Studio 2010 Beta on mine to play around with that software without
'polluting' my real system.
Here we go. If you don't delay I suspect this would take you about 20 minutes
from start to login! Subsequent virtual machines are much faster to create
and launch because they can be based on pre-configured images.
1. Create an Account
Register for an Amazon Web Services account at http://aws.amazon.com/.
2. Enable EC2 Features
Enable Elastic Compute Cloud for your AWS account at http://aws.amazon.com/ec2/.
3. Launch a New Instance
Use the AWS Management Console to launch and manage your virtual images:
http://aws.amazon.com/ec2/home.
As the console says, choose "Launch Instance" under the "Getting Started" section. You
will be presented with a list of pre-configured images. We'll start with a stock
Amazon Windows 2008 server image.
4. Choose a Base Image
Now you'll be presented with a list of pre-configured virtual disk images.
This time we'll set up a 64-bit Windows 2008 Server (Data Center Edition).
Just choose "select" out of the list below:
5. Use the Request Instances Wizard
Use the Request Instances Wizard to configure the newly created instance
which includes configuring the security, choosing an encryption key,
opening ports in the firewall, and kicking off the new instance. Below you'll
see the encryption key step - be sure to download the key pair as you'll
need it for retrieving the administrator password.
6. Launch!
Here's what you can expect for the review screen of the Request Instances Wizard.
Press launch and you're almost there.
7. Launching... (AKA Wait 5 Minutes)
After you launch your instance you'll get a confirmation screen to
show you it's being prepared and allow you to configure durable storage
and IP addresses (both entirely optional).
8. Back to the Management Console
Now if you choose "View your instances..." you'll see that your instance
is being prepared - it has a yellow pending status. This screen doesn't
always refresh on its own so use the refresh button in the upper right
of the console (rather than your browser's refresh button).
9. Running!
After a few minutes your instance with the yellow icon will turn
green and be in the running state. Note that at first this really
means booting up so you can't get to it right away. Give it another
minute or two...
10. Login Part 1: Getting the Credentials
Now you'll want to login. Of course, the system was created with
an administrator account which has a strong password. You'll need to
retrieve that password using the "Instance Actions -> Get Windows
Admin Password" option.
11. Login Part 2: A Little Hasty
You're probably excited to get this thing running and
if you try right away you'll get another message telling you
to be patient and try again in a few minutes. Just keep trying.
12. Login Part 3: Using Private Key
Eventually the new system is up and running and you can get the
password. The first step here is to pass in your encryption key
from the wizard step before.
13. Login Part 4: Administrator Account and Password
Pass in the encryption keys and you'll see the username and password
(don't get excited, I already changed the password!).
14. Login Part 5: Finding the Machine Address
When your instance starts, it'll be given an Internet visible DNS name
that you can use to connect via Remote Desktop. You'll find it in
several places. One of them is highlighted below. Note that this address
changes as you start and stop your instance.
15. Connected!
Now just fire up Remote Desktop, use the Administrator account and password
from step 13 to log in. Now you have full access to your Windows 2008 machine.
You can do with it what you will, install software, start serving web pages,
etc.
16. A Word of Caution
If your intent is to run a web server, then let it run. But if you are just
using this for your own purposes and don't need it when you're not logged in to
the machine, be sure to return to the Management Console and stop the instance. You can
alternatively do that by choosing "Shutdown" instead of logging out of your
Remote Desktop instance.
I hope you found this walk-through helpful. I just learned most of this myself
so I figured I'd blog it and everyone can learn from it.
Cheers!
Michael
Posted in Articles | Tools |
Monday, January 04, 2010 10:49:40 AM (Pacific Standard Time, UTC-08:00)
Rob Barry and Jack Vaughan interviewed me for their article on SearchCloudComputing.com entitled "Azure cloud on horizon: The devil is in the data architecture details".
Here's a small excerpt. If you're interested in Windows Azure and Cloud Computing, read on...
Microsoft did a good job when they designed Azure, according to Kennedy. "The company encourages you to build scalable reliable systems by basically making it really hard to do the stuff that makes systems unreliable," he said.
There are many developers curious about cloud computing, but most are being rather cautious. Directions on Microsoft's Sanfilippo said he's talked to more developers that are concerned about building on top of their existing work than re-coding everything to work in the cloud.
"There's still an education bit that has to happen about what kind of applications are appropriate for Azure. But I think there's a lot of curiosity about Azure," Kennedy said.
Still, he continued, "I don't know many projects that are betting the bank on Azure yet."
Note: This is a little dated as it was published in July 2009 - somehow I missed the original publication - but it's still an interesting read. Thanks Rob and Jack for the article and conversation.
Posted in Articles | Azure |
Wednesday, December 16, 2009 2:36:23 PM (Pacific Standard Time, UTC-08:00)
I recently wrote another article for DevelopMentor's Developments newsletter (not subscribed yet? see top-right of this page). This one is entitled
10 Features in .NET 4.0 that made Me Smile
Read it on the DevelopMentor website: http://www.develop.com/tenfeaturesdotnet4
I am republishing it below for you all to enjoy on your RSS readers.
Cheers, Michael
10 Features in .NET 4.0 that made Me Smile
I have been reviewing some of our upcoming classes at DevelopMentor this week. One
of those classes, What’s New in .NET 4.0,
left me excited for things to come.
There are a bunch of small but
wonderful features discussed in that class. I thought I’d take this opportunity
to write a few of them up and share the joy. I bet some of them make you smile too.
- The Parallel Extensions for The .NET Framework will be built into mscorlib.dll.
The fact that PFx will be part of the core .NET library says a lot about how much
faith and support it’s getting within Microsoft. BTW, here are some really great
demos for PFx in .NET 4.0.
- PFx introduces a new threading construct: Barrier.
Barrier lets you define
rendezvous points in your code where multiple concurrent operations can automatically
sync-up. Here’s an example.
- Code contracts.
Code contracts allow you to assert truths about your code as if
you are writing a unit test. But these assertions live within your production code
and are both verified by the compiler as well as the runtime. Here’s the original
research project that led to this feature on Microsoft Research.
- The WPF and Silverlight designers mostly work.
Now this shouldn’t be a point to
make me smile or get excited about, but it is. The pain and suffering around the
Visual Studio support for WPF and Silverlight designers has been so bad that a mostly-working,
and sometimes truly innovative design-time experience within Visual Studio gives
me real hope for these technologies. I’m actually excited about them now.
- Support for the MVVM pattern across both WPF and Silverlight.
Speaking of that XAML
stuff, if you write WPF or Silverlight code and don’t know MVVM, stop reading this
article and learn about it here. I’ll wait.
Ok, now you too should be excited to hear that there is improved support for MVVM
across Silverlight and WPF in a unified way. Smiles baby!
- WF (Windows Workflow Foundation) has an AsyncCodeActivity class.
While WF has traditionally supported both synchronous activities (which execute immediately) and asynchronous activities (for long-running scenarios where the workflow becomes idle and is unloaded from memory), there has been an unserved middle ground. If you wanted to use threading in your activity and allow the workflow to go idle without being unloaded from memory, you were basically out of luck. This is the problem solved by AsyncCodeActivity. WF 4 now has a
class which has a BeginExecute / EndExecute pair of methods which much more closely
models the regular .NET async design patterns.
- WF has a rehostable designer (really, they mean it this time).
There are some great
uses for giving regular users a WF designer experience with the right granularity
of activities. Now it’s much easier. Here’s an app that rehosts the designer:

- Configuration-free WCF Hosting.
Hosting WCF services is now like hosting ASMX web services
if you like the defaults. Just throw out a service + contract + address and it’s
up and running. That’ll save a bunch of
<system.serviceModel>
configuration goo. Smiles!
- No more *.svc in our RESTful urls in IIS.
With the ASP.NET routing framework
and WCF REST introduced in .NET 3.5, we can create beautiful, expressive Uri’s for
our websites. For example:
http://lookatthiswith.me/watch/intro
But this falls apart
with WCF REST when we host it in IIS. Our service Uri’s look like this:
http://lookatthiswith.me/services/lookieservice.svc/lookup/json/cf7
And now we have this ugly .svc part-way through our Uri! Ick. Well, in .NET 4 that
Uri is much more customizable and the .svc is gone. Smiles!
- ASP.NET MVC has wicked JavaScript support.
JQuery is there by default. That’s awesome.
But there is also a helper similar to the Html helper (for HTML helpers) called Ajax.
It exposes methods like Ajax.ActionLink and effectively brings the
functionality of UpdatePanel to MVC!
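To give a feel for the Barrier item in the list above, here is a minimal sketch using System.Threading.Barrier. BarrierDemo and Run are names made up for this illustration, and the "work" in each phase is left as a placeholder comment; only the rendezvous pattern itself is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; only Barrier and Task come from the framework.
public static class BarrierDemo
{
    // Runs `workers` tasks through `phases` phases. No worker starts phase n+1
    // until every worker has finished phase n. Returns the log written by the
    // barrier's post-phase action.
    public static List<string> Run(int workers, int phases)
    {
        var log = new List<string>();

        // The post-phase action runs once per phase, after all participants
        // have signaled and before any of them is released.
        using (var barrier = new Barrier(workers,
            b => log.Add("phase " + b.CurrentPhaseNumber + " complete")))
        {
            var tasks = new Task[workers];
            for (int w = 0; w < workers; w++)
            {
                tasks[w] = Task.Factory.StartNew(() =>
                {
                    for (int p = 0; p < phases; p++)
                    {
                        // ... do this worker's share of phase p here ...
                        barrier.SignalAndWait();   // rendezvous point
                    }
                }, TaskCreationOptions.LongRunning);
            }
            Task.WaitAll(tasks);
        }
        return log;
    }
}
```

The tasks are started as long-running so each participant gets its own thread; a barrier whose participants are stuck waiting for pool threads would never release.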
Well there you have it. 10 awesome things in .NET 4 that made me smile this week.
I hope you find some to be welcome additions yourself! If you want to learn more
about .NET 4.0, check out our recorded webcasts here: http://www.develop.com/dotnet4webcasts.
Also have a look at my article from last month
Six Things That’ll Surprise You About
.NET 4.0.
Finally, if you have some training funds lying around, I’d love to spend a week
talking about these ideas with you in our What’s New in .NET 4.0 class.
Michael Kennedy is an instructor for DevelopMentor where he specializes in core
.NET technologies as well as agile and TDD development methodologies. Keep up
with Michael via his Web site and blog at http://www.michaelckennedy.net or on
Twitter: @mkennedy.
Posted in Articles | DevelopMentor |
Wednesday, November 11, 2009 4:23:00 PM (Pacific Standard Time, UTC-08:00)
I recently wrote an article for DevelopMentor’s Developments entitled “Six Things That’ll Surprise You About .NET 4.0”. You can read the entire article (republished just below this introduction) or, if you’d rather see it as a quick set of 6 slides, you can see those here:
Six Things That’ll Surprise You About .NET 4.0
In this article, we will explore some of the new features of .NET 4.0 as well as Visual Studio 2010. Some of these features are well-known, but others haven't gotten the press that they deserve. I've combed through .NET 4 to pull out the cool features that maybe didn’t get all the press – but should have. Read on and be pleasantly surprised!
#1 Visual Studio 2010
You may have already known...
code-oriented features are a major focus of the improvements for VS 2010. For example, one style of development where developers sketch out a scenario in code involving a set of classes before they are completely written was painful in VS 2008 (e.g. TDD).
In this style of working, intellisense did all it could to get in your way and the IDE offered little to help move you forward. This gap was filled by 3rd party tools, most notably Resharper (http://www.jetbrains.com/resharper/).
In VS 2010, this capability will be built into the IDE -- can you say "CTRL-." anyone? See the sequence below for details.
But did you know...
that VS 2010 was rewritten in WPF and as part of that rewrite now has true multi-monitor support?
Yes, previously you could drag a build output window to a second monitor or the properties window to the side. But the part you really needed to split up, the code and designers, were solidly grounded in the one IDE window.
Not so in VS 2010: you can pull these free and put them on your second or third monitor. This aids both when working on UIs (you can see the designer and code-behind) and when doing TDD (you can see the test code and production code side-by-side).
#2 ASP.NET
You may have already known...
ASP.NET MVC is now integrated into VS 2010 and ships as part of .NET 4.0. In case you haven't heard of it, ASP.NET MVC is an alternative to the WebForms model that has been the backbone of ASP.NET for so many years. You can build well-factored, testable, and clean web applications more easily in MVC.
But did you know...
for the very first time a true open source project will become an integral and supported part of Visual Studio and .NET? JQuery will now be part of all web projects created by VS 2010 by default. That goes for ASP.NET WebForms, not just MVC projects. In fact, you can even open a support ticket with Microsoft concerning JQuery. You can read Scott Guthrie's original announcement for more details here.
#3 WF 4.0
You may have already known...
that Windows Workflow 4 has been completely rewritten for .NET 4.0. It's not even backwards compatible with .NET 3.5's version of WF. Basically WF 3 was a good try, but suffered from a couple of major problems that could not be overcome by simply refactoring the library. WF 4 has a nice GUI workflow building designer that is part of the VS 2010 tools and moreover that designer is rehostable in your own Windows Forms or WPF applications.
Why might you do this? Consider an application where there is a scriptable aspect that is for non-developer types of users. Instead of giving them a scripting language such as Python, you can build WF activities and provide them with the designer to wire them together. This would give your application essentially a visual programmability.
#4 Base-class Libraries
You may have already known...
.NET 4.0 has threading constructs like Parallel.For which are designed for leveraging parallelism and multi-core hardware in CPU-bound situations. These are a perfect complement to things like the ThreadPool class which are intended for parallelism when latency in external systems (databases, web services, file IO, etc) is the bottleneck.
But did you know...
there will be a new collection namespace called System.Collections.Concurrent?
Here you will find lock-free, thread-safe collections such as ConcurrentQueue<T>. As we move from single core systems, to multi-core systems, and then into many-core systems (say 64 cores) these types of lock-free objects will become increasingly important.
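To make that concrete, here is a minimal sketch of many threads feeding one ConcurrentQueue<T> without any explicit locks. The class and method names (ConcurrentQueueSketch, SumOfSquares) are made up for this illustration; only ConcurrentQueue<T> and Parallel.For come from the framework.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentQueueSketch
{
    // Enqueues the squares of 0..count-1 from whatever threads Parallel.For
    // chooses to use, then drains the queue on the calling thread and
    // returns the sum of everything that was enqueued.
    public static long SumOfSquares(int count)
    {
        var queue = new ConcurrentQueue<long>();

        // The loop body may run on several threads at once;
        // Enqueue is thread-safe, so no lock statement is needed.
        Parallel.For(0, count, i => queue.Enqueue((long)i * i));

        long sum = 0;
        long value;
        // TryDequeue returns false once the queue is empty.
        while (queue.TryDequeue(out value))
        {
            sum += value;
        }
        return sum;
    }
}
```

The order in which items land in the queue is not deterministic here, which is why the sketch only relies on the total and not on the sequence.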
#5 WPF
You may have already known...
that the WPF and Silverlight designers in VS 2010 are greatly improved. You now have some features available that were only available in Expression Blend previously (e.g. data-binding).
But did you know...
the text rendering stack in WPF 4 has been completely rewritten.
Now text looks as clear in WPF as it does in GDI+ with ClearType enabled. This includes a host of edge cases, such as when the text is re-rendered via a VisualBrush, is used in animations, or even 3D text. Once again Scott Guthrie comes through with a great WPF 4 writeup on his blog.
#6 CLR and Base-class Libraries
You may have already known...
that .NET 4 ships with an entirely new runtime. There have been a number of releases of .NET lately (2.0, 3.0, 3.5, 3.5 SP1) but all of these versions of .NET have run on the 2.0 runtime. For the first time since 2005, with .NET 4 we'll have a completely updated runtime with new GC modes, side-by-side in-process execution of the 2.0 and 4.0 runtimes, and the loosening of COM interop rules with the No PIA feature (no Primary Interop Assemblies required).
But did you know...
there are new numerical types including BigInteger, which supports arbitrarily big integers, and Complex, for modeling systems with advanced mathematical formulas which involve the complex number system.
You'll have to look around a bit to find these types however as they are not referenced by default. They are in the new System.Numerics library.
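As a small, hedged illustration of these types (in the released framework the complex type is System.Numerics.Complex), the sketch below computes a power of two that overflows a 64-bit integer and squares the imaginary unit. The NumericsSketch class and its method names are invented for this example, and the project needs a reference to System.Numerics.

```csharp
using System.Numerics;

public static class NumericsSketch
{
    // 2^100 does not fit in a long, but BigInteger grows as needed.
    public static BigInteger TwoToThe(int exponent)
    {
        return BigInteger.Pow(2, exponent);
    }

    // i * i, computed with the built-in complex multiplication.
    public static Complex SquareOfI()
    {
        return Complex.ImaginaryOne * Complex.ImaginaryOne;
    }
}
```

Calling NumericsSketch.TwoToThe(100).ToString() yields the full 31-digit value, and SquareOfI() has a real part of -1 and an imaginary part of 0.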
.NET 4 is going to be an exciting release with some very polished libraries and tools. This article just touched on a few of them.
- Visual Studio 2010 Multi-monitor Support
- JQuery is now part of ASP.NET
- WF 4 Has a Rehostable Designer
- BCL has New Thread-Safe Collections
- WPF has Real Text Support
- CLR and BCL have New Numerical Types
Posted in Articles | DevelopMentor | Visual Studio |
Friday, August 07, 2009 4:40:13 PM (Pacific Standard Time, UTC-08:00)
Avoiding 5 Common Pitfalls in Unit Testing
by Llewellyn Falco and Michael Kennedy
When I started out with unit tests, I was enthralled with the promise of ease and security that they would bring to my projects. In practice, however, the theory of sustainable software through unit tests started to break down. This difficulty continued to build up, until I finally threw my head back in anger and declared that "Unit Tests have become more trouble than they are worth."
So we stopped. Not all at once, but over the months our unit tests died a quiet death. When tests would stop working, we just ignored them. When new features were requested, they were developed without unit testing.
At first, it seemed great. We were able to move without the baggage of maintaining the old tests! But soon all the original problems of having a system without tests came back to us. Things kept breaking, and deadlines were increasingly pushed back. Releases came with an extraordinary amount of stress, late nights & weekends. The final straw came when we were forced to rush out an immediate update, and ended up taking down the company for 2 days straight.
Our new motto became: "Unit Testing: you're damned if you do, you're damned if you don't."
In the end, we decided that despite the hardship caused by maintaining unit tests, it just wasn't feasible to operate without them. So we started down the road to re-incorporate testing into our software development process. As the months went by, however, we discovered that the hardships we remembered had not returned. Looking back, we realized that we had made many mistakes the first time around. The second time around we were smarter.
So that you, too, can enjoy the benefits of unit tests, here are the 5 major pitfalls we encountered the first time around, and how you can avoid them.
Pitfall #1: Tests are hard to maintain.
Because tests were only there to service and support the production code, they became second class citizens. We would spend time carefully choosing method names, refactoring our code to keep our classes and methods small, and so on. But we never applied these same principles to our test code.
As a side-effect of adding back the old tests, we reviewed and cleaned them up with the same level of scrutiny we gave to our "real" code. Suddenly the tests were easier to maintain. While this should not be a surprise to anyone, it wasn't until this moment that we realized why they had been so hard to maintain in the first place: Our tests were hard to maintain because we weren't maintaining them.
Solution: Going forward, we expect the same quality of code (or higher) in the unit tests as we do for our production code. That means
- We remove duplication
- We carefully consider method names
- We create convenience functions for testing features
- We keep our methods short
- We code-review our unit tests
Pitfall #2: Tests are a lot of work to write.
We found that in order to test even simple things we would have to write lots of code to set up and execute the scenario. Even something simple like "create a new user, and receive welcome email" would turn into 40-50 lines of step by step instructions. Not only was this a pain to write the first time, it became a nightmare to maintain. Little changes would mean re-reading those functions to detect if the test was failing because we broke something, or simply because we had changed something. Once that was discovered, we would then have to update the now out-of-sync test code.
The solution we actually found may surprise you. We found that writing out our tests in English and then translating each line into 1 line of code naturally created the appropriate levels of abstraction and readability.
For example, let's consider testing the following scenario: Who are you receiving the most email from?
- Create you - the user
- Mike sends you 3 emails
- Mary sends you 4 emails
- Joan sends you 2 emails
- Verify your greatest "emailer" is Mary
This will naturally lead us to write the following test method and helper method:
[TestFixture]
public class AccountTests
{
private MockMailServer mockMailServer = new MockMailServer();
[Test]
public void WhoAreYouReceivingTheMostEmailFrom()
{
User you = User.CreateNew( "you" );
User mike = SendEmailHelper( CreateUser( "mike" ), you, 3 );
User mary = SendEmailHelper( CreateUser( "mary" ), you, 4 );
User joan = SendEmailHelper( CreateUser( "joan" ), you, 2 );
Assert.AreEqual( mary, you.GetGreatestEmailer() );
}
private User SendEmailHelper(User from, User to, int quantity)
{
for ( int i = 0; i < quantity; i++ )
{
EMail mail = new EMail()
{
To = to,
From = from,
Body = "Sample",
Subject = "test"
};
mail.SetFormat( Formats.Html );
mockMailServer.Send( mail );
}
return from;
}
}
Notice that to a programmer, the lines of code in the WhoAreYouReceivingTheMostEmailFrom test are as easy to read as the lines of English were. We were naturally motivated to create the "SendEmailHelper" function because that was required by the one-to-one correlation between the lines of English and the lines of test code. Without that helper, our test would have become an unreadable rat's nest. The helper also removes some duplication, increases maintainability, and allows for some reuse of the test convenience functions. This won't be the only test that requires us to send email; for example, we may want to test that we can find out to whom you sent the most email.
Let's compare this to how our tests would look if we had just hacked out the scenario:
[Test]
public void WhoAreYouReceivingTheMostEmailFrom()
{
User you = User.CreateNew("you");
User mike = CreateUser("mike");
for (int i = 0; i < 3; i++)
{
EMail mail = new EMail{To = you,From = mike, Body = "Sample", Subject = "test"};
mail.SetFormat(Formats.Html);
mockMailServer.Send(mail);
}
User mary = CreateUser("mary");
for (int i = 0; i < 4; i++)
{
EMail mail1 = new EMail{To = you,From = mary,Body = "Sample",Subject = "test"};
mail1.SetFormat(Formats.Html);
mockMailServer.Send(mail1);
}
User joan = CreateUser("joan");
for (int i = 0; i < 6; i++)
{
EMail mail2 = new EMail {To = you, From = joan,Body = "Sample",Subject = "test"};
mail2.SetFormat(Formats.Html);
mockMailServer.Send(mail2);
}
Assert.AreEqual(mary, you.GetGreatestEmailer());
}
Because we wrote the first version in English it's also easier to detect a mistake. You may have noticed that the second example had a typo in Joan's loop (it runs 6 times instead of 2), making Joan the biggest emailer. In general long methods have the disadvantage of obscuring intent. Unfortunately the 'follow a script' aspect of testing lends itself to writing long methods. By writing the tests in English and then doing a 1-to-1 conversion to code we can counter this vulnerability.
Write the tests in English before you code them.
Pitfall #3: Adding a new feature breaks a lot of tests that I then need to adjust.
I always dreaded adding new features because I knew it meant the existing tests were going to complain about the changes. It seemed like the tests themselves were resisting change to my system, rather than supporting it. As I made changes and the tests broke, I was always trying to figure out if those changes were "expected" because of the new feature, or unintended bugs I had introduced into my software.
Nowadays, we always prep the system for the new feature. This allows us to isolate 'expected' changes from unintended bugs. Furthermore, once we have finished prepping for the new feature, we find it extremely easy to actually add that new feature. Best of all, if the unit test breaks now, we know it's because of unexpected side effects of our changes.
This 'prepping' period falls under the title of 'refactoring' and requires the simple rule that during this stage you do not change the behavior, only the implementation. This sounds straightforward and simple, but in practice it requires a great deal of discipline.
Personally, I still find it a challenge to NOT fix a bug discovered during refactoring. I have to force myself to leave it alone and wait until I have finished refactoring before changing (fixing) this behavior, but this discipline has paid off many, many times.
During this period, the support provided by your unit test suite really shines. Those tests allow me to rework my code with confidence. Afterwards the architecture in place has been custom designed to support the addition of this particular new feature, thus making its implementation quite straightforward. In our experience the 'prepping' work tends to actually take more time than we spend adding the actual feature itself, but the total time to implement is much less.
By splitting the work into two phases, we can emphasize the fact that the unit tests are supporting the existing system, allowing its architecture to evolve so that extending it does not become increasingly difficult.
Before I would ask myself "How can I add this new feature?" Now I ask "How can I make it so this new feature will be easy to add?"
Prep the system for the new feature first.
Pitfall #4: When I change something a whole bunch of tests break even though I haven't broken the system.
There are many ways to solve the same problem. In the past, we tended to test a specific implementation of a solution instead of testing that we had a solution. Because we were focused on the specifics of implementation, changes to our solutions kept breaking our tests, even though we still had a valid solution. Moreover, because the tests were closely tied to implementation, rediscovering the intent and separating it in the tests became cumbersome. As we became more proficient at writing tests in English, we were able to create unit tests that described the expected behavior. This conveyed a higher level of intent, and made the tests much less brittle.
Let's look at an example:
[Test]
public void TestGatewayCallSuccessful()
{
var gateway = new Gateway {Mask = "ExampleCode.*"};
var environment = new Dictionary<string, string>();
environment.Add("path", "ExampleCode.HelloWorld");
string result = gateway.ExecuteRequest(environment);
Assert.IsTrue(result.Contains("Hello World"));
}
[Test]
public void TestGatewayBlocksInvalidMasks()
{
Assert.IsFalse(Gateway.IsValidForMask(
"Example.*", "ExampleCode.HelloWorld"));
Assert.IsFalse(Gateway.IsValidForMask(
"ExampleCode.*.Extras.*", "ExampleCode.HelloWorld"));
Assert.IsTrue(Gateway.IsValidForMask(
"ExampleCode.*", "ExampleCode.HelloWorld"));
}
In this example, we wrote our second test TestGatewayBlocksInvalidMasks so we could easily test a few examples to make sure our implementation was correct. In doing so we exposed a method IsValidForMask, which was an implementation detail and was only made public in order to make testing easy and intentional. We did this because actually executing something to get the failure was much more involved as evidenced by the first test ( TestGatewayCallSuccessful).
Let's take a look at the specific solution we've come up with:
public class Gateway : IRunner
{
public string Mask { get; set; }
public String ExecuteRequest(Dictionary<string, string> environment)
{
string path = environment["path"];
AssertValidClass(path);
IRunner instance =
(IRunner)Activator.CreateInstance(Type.GetType(path));
return instance.ExecuteRequest(environment);
}
private void AssertValidClass(string path)
{
if (!IsValidForMask(Mask, path))
{
throw new Exception(String.Format(
"Invalid Path '{0}' for Mask '{1}'", path, Mask));
}
}
internal static bool IsValidForMask(String mask, String path)
{
mask = mask.Replace(".", "\\.").Replace("*", ".*");
Regex regex = new Regex(mask);
return regex.IsMatch(path);
}
}
As we can see, even though we create this gateway only once, each time a call to ExecuteRequest is made we have to recreate the regular expression (the two lines in IsValidForMask that build the Regex). It would be nice to do this just once. Let's take a look at a more efficient solution:
public class Gateway : IRunner
{
private string mask;
private Regex regex;
public string Mask
{
get { return mask; }
set
{
mask = value;
regex = new Regex( mask.Replace( ".", "\\." ).Replace( "*", ".*" ) );
}
}
public String ExecuteRequest(Dictionary<string, string> environment)
{
string path = environment["path"];
AssertValidClass(path);
IRunner instance = (IRunner) Activator.CreateInstance(Type.GetType(path));
return instance.ExecuteRequest(environment);
}
private void AssertValidClass(string path)
{
if (!regex.IsMatch(path))
{
throw new Exception(String.Format("Invalid Path '{0}' for Mask '{1}'", path, Mask));
}
}
}
Unfortunately this refactoring breaks the first set of tests. Notice that the function we are calling no longer even exists. However, let's look at what happens if we write our tests for the behavior rather than the implementation:
[TestFixture]
public class GatewayBehaviorTests
{
[Test]
public void TestGatewayCallSuccessful()
{
string result = Run("ExampleCode.*", "ExampleCode.HelloWorld");
Assert.IsTrue(result.Contains("Hello World"));
}
[Test]
public void TestGatewayBlocksInvalidMasks()
{
AssertValidForMask(false, "Example.*", "ExampleCode.HelloWorld");
AssertValidForMask(false, "ExampleCode.*.Extras.*", "ExampleCode.HelloWorld");
AssertValidForMask(true, "ExampleCode.*", "ExampleCode.HelloWorld");
}
private static String Run(string mask, string path)
{
var gateway = new OptimizedGateway {Mask = mask};
var environment = new Dictionary<string, string>();
environment.Add("path", path);
string result = gateway.ExecuteRequest(environment);
return result;
}
private void AssertValidForMask(
bool expectedValid, string mask, string path)
{
Exception found = null;
try
{
Run(mask, path);
}
catch (Exception ex)
{
found = ex;
}
// The path is valid for the mask exactly when no exception was thrown.
Assert.AreEqual(expectedValid, found == null);
}
}
There are a few things to notice now:
- This test not only works for the new code, but the old code as well. This is because the behavior has not changed, just the implementation.
- We have not sacrificed readability or clarity of intent. In fact the first test TestGatewayCallSuccessful has actually gained readability.
- There is the introduction of helper methods in the unit test. We find that this is a side effect of writing tests for readability and intention.
In the end, we realized there is a particular code smell for this problem.
If a different implementation of a solution requires different tests, you are testing to the wrong level.
Pitfall #5: There are just too many possibilities to test
When we first began unit testing, we felt that we had to test as many inputs as possible because we believed the purpose of the unit tests was to ensure complete quality of our code. What we have learned is that the world is not black and white, and neither is testing. It is not the case that we either have verified code or unverified code. There are levels of protection. In fact there is a level at which you get diminishing returns from new inputs and, surprisingly, that number is often very small.
For example, imagine the following scenarios:
Scenario 1
You have a method
public int doSomething(int a, int b) {/*...*/}
Does this method work?
Will it blow up if I run it?
On a scale of 1-10 what is your confidence level?
Confidence Level 2: In our case, our confidence started out at 2. All we know is that it compiled. Any number of things could be wrong.
Scenario 2
Now, assume you have an invocation of the method
doSomething(2,3);
When you run this, it does not crash although you have no way to check its result.
What's your confidence level now?
Confidence Level 6: As soon as it's been executed, our confidence jumps up to a 6. We know that most bugs come from incorrect wiring, or null pointers, and so on. Now we know it's not blowing up, but we still don't really know that it's working.
Scenario 3
Now imagine that you have a test
assertEquals(8, doSomething(2,3));
This test passes.
What's your confidence level now?
Confidence Level 8: Just a single confirmation pulls us all the way up to a confidence level of 8. Notice that we are still just at 1 test case. A few more and we'll be in the 9's, but how many more cases would you need to say with absolute confidence that this works? (hint: 2^32 * 2^32).
Tests are like seatbelts: just because they won't guarantee your survival in all crashes, it doesn't mean you shouldn't wear them. Take the extreme case of a motorcycle helmet. You are only protecting a small part of your body, but you are significantly improving the odds of survival if something goes wrong.
A general rule of thumb for the number of cases to test is "3 is a big number".
- Test the happy path
- Test an edge case
- Test an error case, if you have one
Start with the happy path. If you still are worried, try an edge case. Wait until a problem presents itself before you test further.
Spend your time where it counts.
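As a sketch of what "3 is a big number" can look like in practice, here are exactly three checks for a small function. Everything in it is hypothetical: PercentParser stands in for whatever code you are testing, and the tests use a plain Check helper instead of a test framework so the example stays self-contained.

```csharp
using System;

public static class PercentParser
{
    // Hypothetical function under test: turns text into a whole
    // percentage from 0 to 100 and throws for anything else.
    public static int Parse(string text)
    {
        int value;
        if (!int.TryParse(text, out value) || value < 0 || value > 100)
        {
            throw new ArgumentException("Expected a whole number from 0 to 100: " + text);
        }
        return value;
    }
}

public static class PercentParserTests
{
    // 1. The happy path: an ordinary value in the middle of the range.
    public static void HappyPath()
    {
        Check(42 == PercentParser.Parse("42"));
    }

    // 2. An edge case: the upper boundary is still accepted.
    public static void EdgeCase()
    {
        Check(100 == PercentParser.Parse("100"));
    }

    // 3. An error case: just past the boundary is rejected.
    public static void ErrorCase()
    {
        bool threw = false;
        try { PercentParser.Parse("101"); }
        catch (ArgumentException) { threw = true; }
        Check(threw);
    }

    private static void Check(bool condition)
    {
        if (!condition) throw new Exception("Test failed");
    }
}
```

In a real project these three methods would simply carry the [Test] attribute of your test framework; further cases would be added only once a problem shows up.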
Posted in Articles | DevelopMentor | Unit Testing |
Friday, August 07, 2009 12:22:02 PM (Pacific Standard Time, UTC-08:00)
I recently wrote an article for DevelopMentor's Developments newsletter entitled
Building a Twitter Application in .NET.
You can read it at the DevelopMentor website:
http://www.develop.com/twitternetapps
I've republished here for my readers. Enjoy!
Building a Twitter Application in .NET
by Michael Kennedy (@mkennedy)
http://www.michaelckennedy.net
Twitter has become one of the web's hottest properties. It is a central part of
mainstream news programs such as CNN's Anderson Cooper 360, congressional debates,
and talk shows. In fact, it grew at a rate of 1400% this past year [bit.ly/jG9BG].
If your company wants to interact with your customers in a modern and engaging experience,
you need to be on Twitter. In fact, if you have customers that really like or dislike
you, they are probably talking about you on Twitter. You should be part of that
conversation.
In this article, we will explore how to build a rich interactive experience on Twitter
that goes beyond just creating a new Twitter account. We will build a .NET application
that uses the Twitter API (a free service) alongside other cool technologies such
as the WCF REST Starter Kit [http://bit.ly/v8mBb]
and LINQ to fully leverage the Twitter experience.
Whether you want to build a community around your brand or you want to build the
next real-time, social community website like .NET Developer Buzz [http://dotnet.ubbuzz.com/], this article will cover the
technologies required to get the job done.
If you want to download the sample application to follow along, you can get it here:
http://www.michaelckennedy.com/Samples/SampleStatusUpdater.zip
While you're at it, be sure to follow DevelopMentor on Twitter:
http://twitter.com/dm_the_company
Let's Start Small
I will show you how to fully leverage the Twitter API, but many tasks can be accomplished
using simpler tools and you may be better off starting there. Let's look at a few things
that we can do without the Twitter API.
Case 1. You want to display your latest Twitter messages on your website.
Your tweets[1], as they're called, can be consumed as a simple RSS feed. So you
may want to simply pull this feed into your website rather than digging into the
Twitter API and consuming custom XML or JSON formats. An example of consuming Twitter
in this fashion can be found on my website's front page [http://www.michaelckennedy.net/].
To get your RSS feed, just visit your profile page and get the RSS tweet link, e.g.
"RSS feed of mkennedy's tweets" [http://bit.ly/guhZU]
[1] Tweet - these are what the individual messages sent on Twitter are called. If the name sounds weird, I'm sure you'll get used to it. Remember that there was a time when Google was just a noun.
Case 2. You want to watch and manage multiple accounts at the same time.
Most Twitter clients only support a single user. But there are a couple of good
tools that allow you to manage multiple accounts. My current favorite is one called
bDule and you can get it for free at
http://www.sobees.com/bdule.
Beyond the Small
There are times when you want to do more than simply syndicate your Twitter stream.
Let me give you an example. At DevelopMentor, we have had many instructors on Twitter
talking about their own interests. But we didn't have a corporate Twitter presence.
We decided to create our corporate presence by pulling all our instructors' individual
tweets and rebroadcasting them from our DevelopMentor Twitter account:
@dm_the_company
We wanted to keep a sense of the original instructor who wrote the message, so we
append on an attribution. For example:
"Software Transactional Memory is released! (via @mkennedy)"
and we wanted to do this in a flexible way. In short, we needed more functionality
than Twitter provides.
There are actually three services that do this sort of thing and they looked promising.
http://www.connecttweet.com/ http://www.grouptweet.com/ http://cotweet.com/
But in the end, nothing completely matched our requirements. So we decided to write
our own application to publish everyone's tweets under the DevelopMentor banner.
There are a few simple steps involved as well as a lot of details we won't go into
yet.
- Take a list of Twitter accounts and download everyone's statuses.
- Determine which messages we haven't seen before.
- Publish these new statuses under our corporate account.
We can actually implement a simplistic version of this by continuing to use the
public RSS feeds of the individual accounts in conjunction with a very handy Twitter
API wrapper called TwitterooCore which you can find at http://rareedge.com/twitteroo/blog/.
The Twitter API
In practice, there is simply data missing from the RSS feed that we require as well
as features missing from the Twitteroo Core that move us deeper into the Twitter
API. One thing you may well miss is the ability to tie together conversations. For
example if Bob says "hello" and Jerry says in reply "@bob Back at you!" Twitter
tracks that Jerry replied to Bob and publishes this link in the stream. To get access
to these types of features and many other optimizations, you'll need to use the
REST-based Twitter API.
The Twitter API is documented at
http://apiwiki.Twitter.com/
There you can do things like get a user's tweets, if their tweets are public, by
requesting:
http://twitter.com/statuses/user_timeline.xml?screen_name=mkennedy
What you get back is dependent on the requested format. Here we're asking for XML
(user_timeline.xml) but we can also get JSON, RSS, or ATOM.
Similarly, we can update our status by making a POST request to:
http://twitter.com/statuses/update.xml
Here again we have the four possible formats: XML, JSON, RSS, ATOM. However this
time we're using a form post for the request to update the user status which is
then in-turn returned as XML.
Great, we have this cool REST API based on loosely-typed GETs and POSTs. Should
we program against it using fundamental .NET types such as WebRequest, WebClient,
and similar classes? We could. But WebRequest is so
.NET 1.0 (circa 2001). There is a much newer API on the verge of release from Microsoft:
The REST Starter Kit [http://bit.ly/1IF3Ji].
While this toolkit is generally geared towards building RESTful WCF services,
there's also a great set of classes for building REST clients. We will use these
classes to write our application. Let's take a look at how we can use the Twitter
API to write our simple application.
The REST Starter Kit
Part 1 - Getting the Users' Tweets
We need to download the users' messages as XML and convert them to .NET objects that
we can consume in our application. This is really straightforward because the REST
Starter Kit adds a cool feature to Visual Studio: "Paste XML as Types". This feature
will take an XML file and auto-generate types based on the inferred XML schema.
In our case, the XML file we will use is returned from the user timeline.
http://twitter.com/statuses/user_timeline.xml?screen_name=mkennedy
Open the link in a web browser, choose view source, and copy some XML. Then go to
Visual Studio, choose edit, Paste XML as Types. Of course you must have the REST
Starter Kit installed for this to work.[2]
[2] In my experience I ran into some errors deserializing the response from Twitter.
See The Real World Intrudes section at the end of
the article if you run into difficulties.
After generating our status related types, we can use the HttpClient class to download
the statuses.
Listing 1.
private static void GetStatuses(IEnumerable<string> userNames)
{
XmlSerializer serl = new XmlSerializer(typeof (statuses));
serl.UnknownElement += delegate { };
foreach (string name in userNames)
{
string url = string.Format(
"http://twitter.com/statuses/user_timeline.xml?screen_name={0}",
name);
HttpClient client = new HttpClient(url);
HttpResponseMessage response = client.Get();
response.EnsureStatusIsSuccessful();
string contents = response.Content.ReadAsString();
Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(contents));
statuses userStatuses = (statuses) serl.Deserialize(stream);
if ( userStatuses.status.Length > 0 )
{
Console.WriteLine( "@{0}'s latest tweet: {1}", name,userStatuses.status[0].text );
}
Console.WriteLine();
}
}
There are two main things happening in this code. We're using the HttpClient class
to download the web content associated with the given user's timeline. Then we're
using the XmlSerializer in conjunction with the auto-generated XML-serializable
types from the "Paste XML as Types" command. That's all there is to it. The REST
Starter Kit does most of the work for us.
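If you want to see the deserialization half in isolation, here is a minimal sketch that runs without the REST Starter Kit or a network call. The StatusList and StatusItem classes are hand-written stand-ins for the generated types (their names and shape are assumptions for this example; only the statuses/status/text element names are taken from the listing above), and the XML you pass in is whatever sample you have on hand.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hand-written approximation of the generated root type.
[XmlRoot("statuses")]
public class StatusList
{
    // Each <status> child element becomes one array entry.
    [XmlElement("status")]
    public StatusItem[] Items { get; set; }
}

public class StatusItem
{
    // Only <text> is mapped; other child elements are skipped.
    [XmlElement("text")]
    public string Text { get; set; }
}

public static class TimelineSketch
{
    // Deserializes a timeline document and returns the tweet texts in order.
    public static string[] ReadTexts(string xml)
    {
        var serializer = new XmlSerializer(typeof(StatusList));
        using (var reader = new StringReader(xml))
        {
            var list = (StatusList)serializer.Deserialize(reader);
            if (list.Items == null)
            {
                return new string[0]; // no <status> elements at all
            }
            return list.Items.Select(s => s.Text).ToArray();
        }
    }
}
```

Mapping only the fields you need keeps the types small, at the cost of having to extend them by hand when you want more of the response.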
Part 2 - Checking for New Messages
Now that we have all the statuses, we need to find the ones that we haven't broadcasted
from our main account and send them along to step 3. We won't go into detail on
how to track that. But you can imagine a simple database that facilitates a LINQ
to SQL query like this:
Listing 2.
public StatusUpdate[] FindRebroadcastableStatuses(StatusUpdate[] updates)
{
return
(from up in updates
let neverPosted = up.User.LastPostBroadcasted == null
let afterPostDate = up.Time > up.User.LastPostBroadcasted
where neverPosted || afterPostDate
select up).ToArray();
}
Part 3 - Publishing the Statuses
Now that we've gathered the statuses of our various users, it's time to rebroadcast
them to our community. We're going back to the Twitter API to update our status.
Again we will use the HttpClient class, following the RESTful principle of using
POST to add new items to a given URI. We will do an HTTP POST to our status to add
a new message to the account.
Listing 3.
private void UpdateStatus(string newStatus, long? replyToId)
{
    string url = "https://twitter.com/statuses/update.xml";
    HttpClient client = new HttpClient( url );
    client.TransportSettings.Credentials =
        new NetworkCredential( twitterUser, password );
    HttpUrlEncodedForm form = new HttpUrlEncodedForm();
    form.Add( "status", newStatus );
    if ( replyToId != null )
    {
        form.Add( "in_reply_to_status_id", replyToId.ToString() );
    }
    HttpContent content = form.CreateHttpContent();
    HttpResponseMessage message = client.Post( "", content );
    message.EnsureStatusIsSuccessful();
}
This time we create an HTML form using the HttpUrlEncodedForm class. We set the
status field to our new status. If this status is a response to a previous tweet,
then we add the in_reply_to_status_id so Twitter knows to add an "in reply to..."
tag to the tweet.
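To make the form step less opaque, here is a rough sketch of what a URL-encoded form body looks like on the wire. It is an illustration only, not the REST Starter Kit's implementation: FormEncoding and its methods are hypothetical names, and it escapes spaces as %20 where some form encoders emit +.

```csharp
using System;
using System.Collections.Generic;

public static class FormEncoding
{
    // Builds an application/x-www-form-urlencoded body:
    // escaped name=value pairs joined by '&'.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts.ToArray());
    }

    // Mirrors the fields used in Listing 3: status is always sent,
    // in_reply_to_status_id only when the tweet is a reply.
    public static string BuildStatusBody(string newStatus, long? replyToId)
    {
        var fields = new List<KeyValuePair<string, string>>();
        fields.Add(new KeyValuePair<string, string>("status", newStatus));
        if (replyToId != null)
        {
            fields.Add(new KeyValuePair<string, string>(
                "in_reply_to_status_id", replyToId.Value.ToString()));
        }
        return Encode(fields);
    }
}
```

For example, BuildStatusBody("hi", 42) yields the body status=hi&in_reply_to_status_id=42, and characters such as & inside the status text are escaped so they cannot be mistaken for field separators.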
And there you have it. Working with the Twitter REST API is really quite straightforward.
If you use the REST Starter Kit it's downright easy.
The Real World Intrudes
But wait a minute. This is reality and building bulletproof applications is never
that simple. There are at least five significant gotchas you have to address in
practice when working with Twitter and the REST Starter Kit.
1. There will be times when Twitter is unavailable. You have to be ready for crashes
and other types of randomness.
Twitter is one of those sites that can barely handle the traffic it is receiving.
With its 1400% growth, this isn't getting much better. So you must program defensively
and assume that many of your API requests will fail.
2. The Twitter API is a free service. By default, you are limited to a small number
of requests per hour. Many of the limits are around 150 API calls / hour. You may
need to carefully design your application to work within the limits. Some applications
simply need more data than this permits. For example, .NET Dev Buzz [http://dotnet.ubbuzz.com/] has to track thousands of users.
In that case, you will have to get your application white listed with Twitter. You
can do that here:
http://twitter.com/help/request_whitelisting
3. The date-time format used by the Twitter API is not directly parseable in .NET.
Values come back in the form "Fri Feb 01 18:18:08 +0000 2008". But if we
reorder this to "Fri Feb 01 +0000 2008 18:18:08" it is parseable. So you might need
to adjust these date-time values.
4. You will get 417 error codes when you try to talk to Twitter using the default
configuration. The fix is not immediately apparent, but it is very simple. So if
you see the following error:
System.ArgumentOutOfRangeException: ExpectationFailed (417) is not one of the following:
OK (200), Created (201), Accepted (202), NonAuthoritativeInformation (203), NoContent
(204), ResetContent (205), PartialContent (206) at Microsoft.Http.HttpMessageExtensions.EnsureStatusIs()
Just set the following property: ServicePointManager.Expect100Continue = false;
5. "Paste XML as Types" doesn't entirely work. For some reason, certain messages
from Twitter are not deserializable to the types generated with this command. My
experience was that I didn't actually care about the data causing the problem. So
I just removed that part of the generated type. You may have to subscribe to the
XmlSerializer error events to prevent exceptions.
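Two of the gotchas above (1 and 3) lend themselves to small helpers. The following is a sketch under stated assumptions, not code from the sample application: TwitterGotchas, ParseTwitterDate, and Retry are hypothetical names. For dates it takes a different route from reordering the string, passing an explicit format to DateTimeOffset.ParseExact. For outages it retries a call a fixed number of times with a fixed pause.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class TwitterGotchas
{
    // Gotcha 3: parse values such as "Fri Feb 01 18:18:08 +0000 2008"
    // by spelling out the format instead of rearranging the string.
    public static DateTimeOffset ParseTwitterDate(string value)
    {
        return DateTimeOffset.ParseExact(
            value,
            "ddd MMM dd HH:mm:ss zzz yyyy",
            CultureInfo.InvariantCulture);
    }

    // Gotcha 1: run an action up to maxAttempts times, sleeping between
    // failures. The last failure is rethrown to the caller.
    public static T Retry<T>(Func<T> action, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException("maxAttempts");
        }
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts)
                {
                    throw;
                }
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call site might wrap the download from Listing 1 as Retry(() => client.Get(), 3, TimeSpan.FromSeconds(5)). Keep the rate limit from gotcha 2 in mind, since every retry counts as another API call.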
Conclusions
I hope you now have a greater appreciation for what you can do with Twitter and how
it can help you build your brand or build engaging applications. We've used the
REST Starter Kit to make it easy to consume the Twitter API. You've even seen some
of the odd things that can go wrong and how to fix them. Don't forget to download
the sample application here:
http://www.michaelckennedy.com/Samples/SampleStatusUpdater.zip
Now get out there and build something cool.
Posted in Articles | DevelopMentor | web2.0 |
Wednesday, May 27, 2009 1:53:18 PM (Pacific Standard Time, UTC-08:00)
I'm a huge fan of ASP.NET Routing. It gained popularity as the part of ASP.NET MVC
which channels requests for a given URL to the right controller action. In a
wise move, Microsoft moved the routing infrastructure out of ASP.NET MVC and
into its own assembly with the release of .NET 3.5 SP1.
With ASP.NET Routing you can construct search engine optimized and human friendly URLs such as
these:
Here part of the URL (tag or user) selects the page and part of the URL
(everything or codinghorror) are effectively query parameters to the page.
This is well documented in the ASP.NET MVC
world running on your server - you can't get anything done without it in
MVC. But what about Windows Azure? What if you don't want
ASP.NET MVC? What if you're a traditional type of person and want all the goodness that comes with what is now
called ASP.NET WebForms (aka "normal ASP.NET")?
In this brief post, I'll cover how to use ASP.NET routing and ASP.NET WebForms in Azure.
The sample project can be downloaded if you want to follow along.
Phil Haack has
written a good post on using routing alongside ASP.NET WebForms so I won't cover too much background information.
How does this change for Azure?
The short answer is that it doesn't. If you get routing working for IIS 7 in your web app, you can effectively deploy it to Azure.
But the steps always felt convoluted to me when reading others' write-ups on this. So let's run through converting a Windows Azure Web Role (essentially a "stock" ASP.NET WebForms
app) to use routing in Azure.
First you'll need the Azure SDK and Visual Studio tools:
- Next, create a new solution in Visual Studio by choosing Cloud Service->Web and Worker Cloud Service.
- Add a new Global.asax file to your web role project.
- Add a reference to System.Web.Routing and System.Web.Abstractions in your web role project.
- Define a custom class that implements IRouteHandler which will map
URL parameters into the HttpContext for use in your pages:
internal class CustomRoute : IRouteHandler
{
    public CustomRoute(string virtualPath)
    {
        VirtualPath = virtualPath;
    }

    public string VirtualPath { get; private set; }

    public IHttpHandler GetHttpHandler(RequestContext requestContext)
    {
        foreach ( var aux in requestContext.RouteData.Values )
        {
            HttpContext.Current.Items[aux.Key] = aux.Value;
        }
        return BuildManager.CreateInstanceFromVirtualPath(
            VirtualPath, typeof( Page ) ) as IHttpHandler;
    }
}
- Register these routes in the Application_Start method of your Global.asax:
protected void Application_Start(object sender, EventArgs e)
{
    RouteTable.Routes.Add( "ShowName",
        new Route(
            "naming/show/{name}",
            new CustomRoute( "~/ShowName.aspx" )
        ) );
    RouteTable.Routes.Add( "CreateAccount",
        new Route(
            "account/begin",
            new CustomRoute( "~/Account.aspx" )
        ) );
    RouteTable.Routes.Add( "Home",
        new Route(
            "home",
            new CustomRoute( "~/Default.aspx" )
        ) );
}
Now if you run your app, you might expect the routing infrastructure to work. Inside
the ASP.NET Dev Server (aka cassini) this will likely work. But in the Azure Development Fabric you'll
see this:

The problem is you need to tell IIS 7.5 to get out of the way and let the request get to ASP.NET.
- We'll define a class to short-circuit the IIS validation:
class Iis7RoutingHandler : UrlRoutingHandler
{
    protected override void VerifyAndProcessRequest(
        IHttpHandler httpHandler, HttpContextBase httpContext)
    {
        // Intentionally left empty.
    }
}
- Modify the web.config by adding a handler and module to the system.webServer section:
...
- Finally, we need to recover the data passed to the page. For example, in the sample project we have:
route: /naming/show/{name}
example: /naming/show/michael-kennedy
How will our page access the value of name? Recall that our custom route stashes the values in
HttpContext.Current.Items. We'll just pull them back out as follows in our Page_Load method of our ASPX class:
LabelName.Text = (string)HttpContext.Current.Items["name"];
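If it is unclear where the "name" key comes from, the toy matcher below shows the idea of binding a {name} placeholder to a URL segment. It is a simplified illustration under my own assumptions, not how System.Web.Routing is implemented: RouteTemplateSketch is a hypothetical class, and it ignores defaults, constraints, and catch-all segments.

```csharp
using System;
using System.Collections.Generic;

public static class RouteTemplateSketch
{
    // Returns the captured placeholder values when the path fits the template,
    // or null when it does not.
    public static Dictionary<string, string> Match(string template, string path)
    {
        string[] templateParts = template.Trim('/').Split('/');
        string[] pathParts = path.Trim('/').Split('/');
        if (templateParts.Length != pathParts.Length)
        {
            return null;
        }

        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < templateParts.Length; i++)
        {
            string part = templateParts[i];
            bool isPlaceholder = part.Length > 2
                && part[0] == '{'
                && part[part.Length - 1] == '}';
            if (isPlaceholder)
            {
                // "{name}" captures whatever sits in the same position of the path.
                values[part.Substring(1, part.Length - 2)] = pathParts[i];
            }
            else if (!string.Equals(part, pathParts[i], StringComparison.OrdinalIgnoreCase))
            {
                return null;
            }
        }
        return values;
    }
}
```

Matching "naming/show/{name}" against "/naming/show/michael-kennedy" yields one entry, name = michael-kennedy. That is the kind of key/value pair our CustomRoute copies into HttpContext.Current.Items.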
That's it! You can see our routes working in our WebForms app running in Azure (well, technically the screenshot
is the dev fabric - but it works in the cloud as well):
Download the source and try it for yourself: AzureRoutingSample.zip (136 KB)
Posted in Articles | ASP.NET | Azure | DevelopMentor | web2.0 |
Wednesday, April 08, 2009 12:03:23 PM (Pacific Standard Time, UTC-08:00)
I recently wrote an article for DevelopMentor's Developments newsletter entitled Azure Storage. Read it at the
DevelopMentor website here:
http://www.develop.com/content/newsletters/aprilazure
I've republished it here for my readers. Enjoy!
Developments: Azure Storage
by Michael Kennedy
[Listen to this article as a podcast: Azure-Storage-Article-Kennedy.mp3]
October 27th 2008, Los Angeles CA - It's 9 AM and Microsoft is
hosting PDC (its most forward-looking developer conference).
Ray Ozzie and company are introducing Windows Azure,
a new platform that is Microsoft's first foray into the nascent
world of large-scale utility computing. This scalable and reliable
platform-as-a-service functionality is commonly referred to as
"Cloud Computing" because it runs somewhere out there on the Internet.
Computing platforms that rival the reliability of the utility grids (e.g. electric and gas) which we daily take for granted have long been the stuff of dreams.
A few companies have realized this dream - Google and Amazon come to mind as a couple of the
rare exceptions who have accomplished this goal. These companies' web properties seem to handle
unbounded amounts of traffic with zero down time. The data centers, redundancies, software
engineering and operations know-how required to make this happen are exceedingly expensive.
Some reports have Google spending over $2.4 billion (that's 2,400 million dollars) on
data centers in 2007 alone.
Prior to large-scale cloud computing efforts (circa 2005), most of us could only dream of such scalability and reliability. Today we have at least three highly reputable companies offering some kind of pay as you go cloud computing platform - Microsoft, Amazon, and Google.
Microsoft's Azure is a newcomer to the industry. But for .NET developers, it is not to be ignored. Azure allows you to use your existing skills to build essentially the same .NET applications you are familiar with and "deploy them to the cloud."
These scalable, reliable, and geographically-replicated applications
that run on Azure depend on data of course. Virtually all applications we write will be nothing without their underlying data. But if we simply use the tried and true methods of data storage such as the file system or a (single) database server our data is not all that scalable or reliable. Because we cannot have a scalable and reliable application without data, we need a new mechanism for storing and accessing data from our Azure applications.
Enter Azure Storage
Azure storage is the storage component of the Azure platform. It is actually three data services in one:
- Blob Storage - stores unstructured data essentially as a file, limited to 50 GB of data per blob.
- Table Storage - stores structured data that is somewhat like a
database. For full database capabilities there is a higher-level feature called SQL Data Services (SDS).
- Queues - provides interprocess communication functionality between various web and worker roles in your hosted services or even applications running outside of Azure. Queues can pass small XML or binary messages - less than 64 KB per message.
In this article, we will cover just the basics of the
three storage services of Windows Azure. I want to give you a sense
for what it's like to program against Azure Storage. At the base level all
access to Azure Storage uses pure REST APIs.
This means that you can access
it from any HTTP enabled platform / language. For example, to download the blob data
called "config.xml" in the container called "settings" for the Azure project "kennedy"
you would simply issue a GET to the Uri:
http://kennedy.blob.core.windows.net/settings/config.xml
To save data in a blob you do HTTP POSTs and PUTs in a similar fashion.
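As a rough sketch of that raw HTTP access from .NET, the helper below builds the blob address from its three parts and downloads it with WebClient. The class and method names are hypothetical, the download only works for a blob that is publicly readable (request signing for private data is not shown), and blob names that need URL escaping are not handled.

```csharp
using System;
using System.Net;

public static class BlobRestSketch
{
    // Composes http://{account}.blob.core.windows.net/{container}/{blob}.
    public static Uri BuildBlobUri(string account, string container, string blobName)
    {
        return new Uri(string.Format(
            "http://{0}.blob.core.windows.net/{1}/{2}",
            account, container, blobName));
    }

    // Issues a plain HTTP GET and returns the body as text.
    // Assumes the blob is publicly readable.
    public static string DownloadText(Uri blobUri)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(blobUri);
        }
    }
}
```

BuildBlobUri("kennedy", "settings", "config.xml") reproduces the Uri shown above. Passing it to DownloadText would perform the GET.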
However, real life is full of edge cases, error handling, security, and
serialization which makes the pure HTTP model error prone. Thus, a sample library
serves as the de facto .NET API to Azure Storage and ships with the Azure SDK. It is
called StorageClient and can be found in default installs here:
C:\Program Files\Windows Azure SDK\v1.0\samples\StorageClient
We will examine working with each of the
storage services from the perspective of the StorageClient library -
but keep in mind that ultimately this library is a wrapper around a basic and open RESTful API.
Setting The Stage: The Sample Application
To explore Azure Storage I have written a simple photo sharing distributed application.
This set of applications allows users to upload photos to a photo sharing site.
These photos must be reviewed and approved by moderators of the site. Once approved, the
general public can view and interact with the photos. For a concrete example, you could
imagine writing a distributed version of the wallpaper sharing site
InterfaceLift and
deploying it on Azure in this fashion.
You can download the sample application and follow along if you want to see the full
source code and try it out yourself. Just be sure to start the Development Storage utility
that comes with the Azure SDK before running the application.
Our distributed application consists of three parts.
- The Uploader: A Windows Forms application that lets contributors upload images to the site.
- The Reviewer: A Windows Forms application that lets moderators view image submissions and either approve or reject them.
- The Website: An ASP.NET website for viewing the photos - this is our public facing application.
A typical use case might be as follows (see diagram below).
- We upload a photo submission with our uploader application. The photo is uploaded to Azure blob storage and a message is sent via an Azure Message Queue to all available reviewer applications. Additional information about the submitter is associated with the photo in Azure table storage.
- The reviewer application watches the message queue for new messages. When one arrives, the photo is added to a list of pending submissions. The reviewer can either reject (delete) the submission or approve it - move it to a permanent blob storage location where it will be publicly viewable.
- Users visit our website and can view all approved photos. This list will change in real-time because it is driven by the reviewer application. The web application simply pulls all photos from the approved photo container in Azure blob storage.
Saving Data: Creating Azure Blobs
To save data to Azure Blob Storage, you must realize blob storage follows the ACE pattern (Authority,
Container, Entity) to describe a blob. Authority is simply your Azure solution name. Containers are analogous to folders. And entities are analogous to files.
The listing below is essentially the code that runs when the uploader application uploads a pending image submission to blob storage.
Listing 1:

Sending Notifications: Azure Queuing
In addition to uploading the image to the pending images container in blob storage,
we will send a message to a message queue to notify any active or future reviewers of the new submission.
Listing 2.

Saving (More) Data: Structured Storage and Azure Tables
Finally, for the upload application, we must also save some information about the contributor.
In Azure Storage we have two reasonable places to store this information.
First, we could save this information directly in blob storage as meta-data
associated with the blob itself. This is straightforward and easy. But there is a big limitation: information in this meta-data is not queryable. Suppose I want to get all images associated with a single contributor. There is no way in Azure Blob Storage to say "give me all the blobs with this filter on the meta-data." You would have to pull the properties of every blob and do the comparison client-side. That's tantamount to filling a DataSet with "SELECT * FROM PendingImages" and it's a bad idea.
Instead we will use the third type of Azure Storage: Azure Table Storage.
Table Storage allows us to store data with up to 256 properties and
query this data as if it were a database. It is exactly what we need for the contributor information.
However, you must realize this is not a database. A better mental picture is a durable collection of
Dictionary objects (as in Dictionary from System.Collections.Generic) with querying built
on top. I say this because there are no schema or relational constructs in Azure Table Storage. If you
need those, then you'll want SQL Data Services - a service on top of the core Azure platform.
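That mental picture can be written down literally. The sketch below is an in-memory analogy only and does not use the Azure Table Storage API: TableMentalModel and the property names (Name, City, Website) are hypothetical. It shows rows with different property sets living side by side, with a query layered on top.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableMentalModel
{
    // Three "rows" with no shared schema: each one is just a bag of properties.
    public static List<Dictionary<string, object>> CreateSampleRows()
    {
        return new List<Dictionary<string, object>>
        {
            new Dictionary<string, object> { { "Name", "Ann" }, { "City", "Portland" } },
            new Dictionary<string, object> { { "Name", "Bob" }, { "City", "Seattle" }, { "Website", "http://example.com" } },
            new Dictionary<string, object> { { "Name", "Cy" } }
        };
    }

    // A query over the bags: rows lacking the queried property simply do not match.
    public static List<string> NamesFromCity(
        IEnumerable<Dictionary<string, object>> rows, string city)
    {
        return (from row in rows
                where row.ContainsKey("City") && row.ContainsKey("Name")
                where object.Equals(row["City"], city)
                select row["Name"] as string).ToList();
    }
}
```

Nothing here enforces that every row has a City. That is the point of the analogy: the structure lives in your client code rather than in the store.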
The code to add an entry to Azure Table Storage does not fit into a single method as it's driven through the interaction of several classes we must define. Azure Table Storage can be accessed via ADO.NET Data Services (client-side) and this is the method we will use.
First we'll define a client-side schema for our entry by creating a class called Contributor which derives from the class TableStorageEntity (from the StorageClient library).
Listing 3.

Additionally we must define the tables and queries available to ADO.NET Data Services by creating a class derived from TableStorageDataServiceContext and we do that below. We simply have one table called Contributors.
Listing 4.

With those two items in place, we can insert a "row" into Azure Table Storage as follows:
Listing 5.

Querying Azure Table Storage is very straightforward. Because we are using ADO.NET Data Services, querying can be done via LINQ as in "from c in svc.Contributors select c.Name". Ultimately ADO.NET Data Services is also built on a RESTful API so this translates to the underlying HTTP REST calls. Alternatively, you can use that REST API directly from .NET or any other platform.
Waiting on Queues: The Reviewer Application's Code
Next, let's look at how we monitor and pull messages from Azure Queuing. Ultimately we must poll the queue using a RESTful HTTP request.
But the StorageClient resurfaces this to us as simple events.
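The general idea of turning polling into events can be sketched without any Azure types. This is not the StorageClient API: PollingQueueWatcher is a hypothetical class, and the dequeue function passed to it stands in for the HTTP request that would fetch the next message.

```csharp
using System;
using System.Collections.Generic;

public class PollingQueueWatcher
{
    // Stand-in for the remote call: returns the next message, or null when the queue is empty.
    private readonly Func<string> tryDequeue;

    // Raised once for every message found during a polling pass.
    public event Action<string> MessageReceived;

    public PollingQueueWatcher(Func<string> tryDequeue)
    {
        if (tryDequeue == null)
        {
            throw new ArgumentNullException("tryDequeue");
        }
        this.tryDequeue = tryDequeue;
    }

    // One polling pass: drains whatever is available and reports how many
    // messages were delivered. A real watcher would call this on a timer.
    public int PollOnce()
    {
        int delivered = 0;
        string message;
        while ((message = tryDequeue()) != null)
        {
            Action<string> handler = MessageReceived;
            if (handler != null)
            {
                handler(message);
            }
            delivered++;
        }
        return delivered;
    }
}
```

The reviewer application would subscribe to MessageReceived and add each new submission to its pending list, never touching the polling loop directly.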
Listing 6.

We won't cover how we move a blob from the pendingImages blob container to the approvedImages blob container which happens when a reviewer approves an image.
You can look at the sample to see how that is done.
Ultimately It's About the Website
Finally, let's look at the web application that actually displays the approved images. We don't do anything fancy such as paging or error handling that you'd see in a real application. But this will give you a good idea how to work with the blob data as a collection.
Here we'll create a BlobStorage object and access the BlobContainer approvedImageContainer as we have been in most of the listings. But then instead of saving or reading blobs, we use the ListBlobs method to simply list all the approved images in that container. In order to show the images on our webpage, we just use the BlobProperties.Uri and directly reference that in our HTML. Our ASP.NET application does not touch the data. Rather the consumers (IE, Firefox, Chrome, etc.) of the HTML pull the image data directly from blob storage as they would from any web server.
Listing 7.

Now you have a good idea of the concepts and motivation behind Azure Storage. You have seen some typical usages of each of the three storage features: blob storage, table storage, and queuing. Our samples made use of the sample storage API library called StorageClient. Underlying this library we saw that Azure Storage is entirely accessed via RESTful APIs.
Want to get started? Visit http://www.azure.com and choose "Try It Now" to register for a CTP Azure account. You'll need to download the various SDKs listed on that same page. They will install the Visual Studio projects required for working with Azure as well as the Development Storage and Development Fabric so you can develop and debug your applications before deploying them to the cloud.
If you want some intensive, expert-led training on Azure and associated .NET 4.0 topics be sure to contact DevelopMentor. Or call 800.699.1932 to find out what classes we have available today.
Posted in Articles | ASP.NET | Azure | DevelopMentor | Talks |
Wednesday, December 24, 2008 9:18:04 AM (Pacific Standard Time, UTC-08:00)

I'm pleased to announce that MSDN Magazine just published my Windows Workflow article entitled "ASP.NET WORKFLOW: Web Apps That Support Long-Running Operations". I hope you find it useful and interesting.
Posted in Articles | Books | DevelopMentor | Visual Studio |
Just a site note: I'm doing my part to rid the world of IE 6. Visit this site with IE 6 and you'll
get a shameful message telling you to "Stop Living in the Past".