Many modern systems require close to 100% uptime, and many of them also deal with large amounts of data. The common practice today is to meet both requirements by applying SOA methodologies to build a distributed system. That is, have different pieces of software handle different roles in your system, and have each of those roles handled by multiple instances. This is how we distribute the load and cope with the server and network failures that are bound to happen.
But building a distributed system can be a daunting task, especially if you are the one who is going to maintain and debug it after it goes live. Since services in such systems communicate through messaging, usually over queuing systems, it is very easy to lose track of what happened in the past or what is going on right now. And if an exception was thrown somewhere, it is often almost impossible to even know about it, let alone find it and fix it.
I'm happy to have had the opportunity to work with the great people at Particular Software, the creators of NServiceBus, on a software platform with the sole purpose of changing how we approach building a distributed system. With this platform you can create and maintain a distributed system of practically any size with ease, and monitoring and debugging have never been so much fun.
The platform is built on top of NServiceBus, which also orchestrates everything that happens under the hood. The good news is that the platform is so simple to use that you don't actually need to know NServiceBus in order to build something with it.
Here's a quick look at what it offers, and why I think it is going to be a game changer in the .NET space.
ServiceMatrix - a visual editor for distributed systems
We are used to having WYSIWYG editors for writing content, but Particular took it one step further. Using a tool called ServiceMatrix, you can now create and edit your distributed system visually, in a tool that comes as a plugin to Visual Studio.
As can be seen in the image, the canvas in the middle is where the magic happens. You can create new services, name them according to what they are supposed to do, and then deploy them to endpoints (NServiceBus Host to run them as a Windows service later, or ASP.NET MVC to run them as a website).
You can then visually define Events and Commands, and specify which service publishes each event, which service sends each command, and which events each service needs to subscribe to. The canvas gives a clear overview of what's going on at all times, and the Solution Builder pane to the left does a good job of summarizing all that nicely.
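To make the difference between sending a Command and publishing an Event concrete, here is a minimal in-memory sketch. It is not NServiceBus code and not what ServiceMatrix generates; the `InMemoryBus` class, the service names, and the message names are all made up for illustration. It only shows the routing idea: a command goes to the one service that owns it, while an event goes to every subscriber.

```python
from collections import defaultdict


class InMemoryBus:
    """Toy bus (hypothetical): one owner per command, many subscribers per event."""

    def __init__(self):
        self.command_owners = {}                     # command name -> single handler
        self.event_subscribers = defaultdict(list)   # event name -> list of handlers

    def register_command_handler(self, command_name, handler):
        # A command has exactly one logical owner.
        if command_name in self.command_owners:
            raise ValueError(f"{command_name} already has an owner")
        self.command_owners[command_name] = handler

    def subscribe(self, event_name, handler):
        self.event_subscribers[event_name].append(handler)

    def send(self, command_name, payload):
        # Commands are sent to the single service that handles them.
        self.command_owners[command_name](payload)

    def publish(self, event_name, payload):
        # Events are delivered to every subscriber, possibly none.
        for handler in self.event_subscribers[event_name]:
            handler(payload)


log = []
bus = InMemoryBus()


def sales_handles_submit_order(order):
    log.append(f"Sales: accepted order {order['id']}")
    # After handling the command, Sales announces what happened.
    bus.publish("OrderAccepted", order)


bus.register_command_handler("SubmitOrder", sales_handles_submit_order)
bus.subscribe("OrderAccepted", lambda o: log.append(f"Billing: charging order {o['id']}"))
bus.subscribe("OrderAccepted", lambda o: log.append(f"Shipping: preparing order {o['id']}"))

bus.send("SubmitOrder", {"id": 42})
```

In the real platform the handlers live in separate processes and the messages travel over queues, but the shape of the relationships you draw on the canvas is the same.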
Once you have finished drawing your system on the canvas and all services have been deployed to endpoints, you can run it by pressing F5. Your distributed system will work without you having to write a single line of code.
Now, obviously, for your system to do what you actually want it to, you'd have to write some code. You will find all the generated code ready for you to edit in the various projects and classes that have been created (Solution Explorer, right pane in the image above).
Inspecting the message streams with ServiceInsight
With ServiceInsight you can inspect the stream of messages going through the entire system and sort through them using various criteria. When viewing a message, you get a nice graph showing its life cycle: which endpoint generated it, whether it is an Event or a Command, and, if it was sent as a response to another event, the entire storyline with timing and so on. This also gives you a good idea of the context of a single message.
Whenever an endpoint has trouble processing a message and an exception is thrown, ServiceInsight will show this clearly and allow you to inspect the message and retry it. The entire exception is shown, including the full stack trace with line numbers, which makes it super easy to debug distributed systems of practically any size.
You can search messages using full-text search on the message body, and also by message properties such as processing time or the names of the endpoints involved. The search feature works on the archive of messages as well, not just on the real-time stream. By default, messages are archived for 30 days.
Real-time monitoring with ServicePulse
Monitoring a live distributed system is mainly about making sure all endpoints are functional, and getting alerts when something is not right. This is what ServicePulse is all about.
As you can see, the ServicePulse dashboard gives you a nice overview of the current state of the system. Any failed messages will be shown here (in the screenshot above we see the notification about the failed message we investigated with ServiceInsight moments ago), and any known endpoints that have failed to respond to a heartbeat check will also pop up here. You can take immediate action from there as well, by retrying the failed messages manually or opening them in context in ServiceInsight.
You get this dashboard at no additional cost in terms of complexity - ServicePulse integrates with the platform to figure out on its own when something has gone wrong, and notifies you about it with all the details you need to resolve it. If an endpoint goes down and comes back up again, ServicePulse will calm down automatically as well.
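The idea behind a heartbeat check can be sketched in a few lines. This is only an illustration of the concept, not how ServicePulse is implemented: the `HeartbeatMonitor` class, the 40-second grace period, and the endpoint names are assumptions made for the example, and timestamps are passed in explicitly to keep it deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical monitor: an endpoint is 'failed' once it has been silent too long."""

    def __init__(self, grace_period_seconds):
        self.grace_period = grace_period_seconds
        self.last_seen = {}  # endpoint name -> time of last heartbeat (seconds)

    def record_heartbeat(self, endpoint, now):
        self.last_seen[endpoint] = now

    def failed_endpoints(self, now):
        # Only known endpoints can fail: one that never reported is simply unknown.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.grace_period
        )


monitor = HeartbeatMonitor(grace_period_seconds=40)
monitor.record_heartbeat("Sales", now=0)
monitor.record_heartbeat("Billing", now=0)
monitor.record_heartbeat("Sales", now=30)

# Billing has been silent for 50 seconds, so it is flagged.
down_at_50 = monitor.failed_endpoints(now=50)

# Billing comes back; the alert clears without anyone touching it.
monitor.record_heartbeat("Billing", now=55)
down_at_60 = monitor.failed_endpoints(now=60)
```

Because the failed list is recomputed from the latest heartbeats, an endpoint that recovers drops off the list by itself, which is the "calming down" behavior described above.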
ServicePulse, being the real-time monitoring tool that it is, also supports reporting on "Custom Checks". You can configure any endpoint you create using the platform to run a custom check (for example: availability of a website, a network location or files on disk) and report a failure when the check fails. Any failure reported by any of the Custom Check modules will show up in ServicePulse for you to take immediate action.
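Here is a rough sketch of what a custom check boils down to, using the "files on disk" case as the example. It does not use the platform's actual Custom Check API; `CustomCheck`, `file_exists_probe`, `run_checks`, and the file names are all hypothetical and only show the pattern of running a probe and reporting a pass or a failure with a reason.

```python
import os
import tempfile


class CustomCheck:
    """Hypothetical check: wraps a probe that returns (ok, reason)."""

    def __init__(self, check_id, probe):
        self.check_id = check_id
        self.probe = probe

    def run(self):
        try:
            ok, reason = self.probe()
        except Exception as exc:
            # A check that blows up is reported as a failure, not swallowed.
            ok, reason = False, f"check raised {type(exc).__name__}: {exc}"
        return {"id": self.check_id, "status": "pass" if ok else "fail", "reason": reason}


def file_exists_probe(path):
    def probe():
        if os.path.exists(path):
            return True, ""
        return False, f"{path} is missing"
    return probe


def run_checks(checks):
    # Only failures are forwarded to the dashboard in this sketch.
    return [result for result in (check.run() for check in checks)
            if result["status"] == "fail"]


with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "orders.csv")
    open(present, "w").close()                      # this file exists
    missing = os.path.join(folder, "invoices.csv")  # this one does not

    failures = run_checks([
        CustomCheck("orders-file", file_exists_probe(present)),
        CustomCheck("invoices-file", file_exists_probe(missing)),
    ])
```

In a real endpoint the check would run on a schedule and the failure report would travel to the monitoring side as a message, rather than being collected into a local list.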
Summing it up
Being able to create distributed systems from scratch in minutes, and then monitor and debug them from one centralized tool, is really a game changer. No more endlessly reading through logs or drawing on whiteboards and scratching our heads, trying to figure out what happened. And that is assuming we even know something bad has happened.
I've had a great time working with the team at Particular building this platform, and I truly think this is a great set of tools that anyone should at least play with. Give it a spin by installing the platform and following the guides. The one on ServiceMatrix is a great place to start.