ScriptCs Templating Support


If you haven’t yet heard about ScriptCs, it is not too late. Go here 

I just checked in a ScriptCs Templating module that integrates Razor and StringTemplate transformations into the ScriptCs workflow. The ScriptCs Templating module can apply a Razor or StringTemplate (ST4) template on top of one or more model files (normally an XML or JSON file), for scenarios like code generation or templating. Example below.

The first bits are

Installing template module

  • Install scriptcs. You need to install the nightly build using Chocolatey - cinst scriptcs -pre -source (Refer if you don't understand this)
  • Clone the repo at or download the source code from there.
  • Open the solution and build it in Visual Studio.
  • From the command line create a package: nuget pack -version 0.1.0-alpha
  • In VS, edit your Nuget package sources (Tools->Library Package Manager->Settings) and add the folder where the package lives locally.
  • Install the template package globally: scriptcs -install ScriptCs.Engine.Templating -g -pre

Rendering a template

  • Note that whatever you pass after -- goes to the module as its arguments.
  • Run scriptcs with your template file, specifying our template module using the -modules switch: scriptcs mytemplate.cst -loglevel debug -modules template - The -modules template argument lets scriptcs load our template module.
  • Check the log output for details.
  • You can specify the output file using the -out switch - scriptcs mytemplate.cst -modules template -- -out result.txt (The parameters after -- are the template module parameters, according to ScriptCs convention)

Rendering a template Using Models

The template module automagically converts XML files/urls and JSON files/urls to dynamic models that can be used from your template. Technically, it creates a C# fluent dynamic object that wraps the XML/JSON.

Quick example: Create a new folder, and create a model.xml file inside that.

<class name="MyClass1">
  <property name="MyProperty1" type="string"/>
  <property name="MyProperty2" type="string"/>
  <class name="MyClass2">
    <property name="MyProperty1" type="string"/>
    <property name="MyProperty2" type="string"/>
  </class>
</class>

Now, create your template, and save it as template.cst - Let us use Razor syntax.

@model dynamic

@foreach(var item in Model["class"])
{
   <p>@item.name is a class name</p>
}

Now, you can run the transformation by specifying the model file, like this

scriptcs template.cst -modules template -- -xml model.xml -out result.txt

Let us rewrite the template to generate the class and properties from our XML model file

@model dynamic

@foreach(var c in Model["class"])
{
   @:class @c.name {

   foreach(var p in c["property"])
   {
       @:public @p.type @p.name {get;set;}
   }

   @:}
}

Regenerate the result file and see.
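If your template emits the name attributes from the model (for example via @c.name and @p.name - the exact output depends on how your template is written), the regenerated result.txt would look roughly like this:

```csharp
class MyClass1 {
    public string MyProperty1 {get;set;}
    public string MyProperty2 {get;set;}
}
```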

Transforming multiple model files

You may specify multiple model files

scriptcs template.cst -modules template -- -xml model1.xml -out result1.txt -xml model2.xml -out result2.txt

You may also use Json files as the model

scriptcs template.cst -modules template -- -json model.json -out result.cs

Accessing models from templates

For converting XML files/data to a dynamic model object that'll be accessed from templates, ElasticObject is used. Refer

For converting Json files/data to a dynamic model object that'll be accessed from templates, DynamicJsonConverter is used. Refer

Template module command line

Example usages:

  • Render using template mytemplate.cst and ElasticObject dynamic model from model1.xml using Razor (Razor is default)
    •  scriptcs mytemplate.cst -modules template -- -xml model1.xml -out result1.txt
  • Do the same as above, but using vb as the template language
    • scriptcs mytemplate.vbt -modules template -- -vb -xml model1.xml -out result1.txt
  • Render using template mytemplate.st4 - Using StringTemplate instead of Razor
    • scriptcs mytemplate.st4 -modules template -- -st4 -xml model1.xml -out result1.txt

Parameter meaning (as seen in the examples above):

  • -xml <file/url> - use the given XML file/url as the model
  • -json <file/url> - use the given JSON file/url as the model
  • -out <file> - write the rendered output to the given file
  • -vb - use VB.NET as the Razor template language
  • -st4 - use StringTemplate (ST4) instead of Razor


More tests need to be added, and St4 support is a bit untested. Happy Coding.

CakeRobot - A Gesture Driven Robot That Follows Your Hand Movements Using Arduino, C# and Kinect

Over the last few weekends I’ve spent some time building a simple robot that can be controlled using Kinect. You can see it in action below.

Ever since I read this Cisco paper that mentions the Internet Of Things will create a whopping $14.4 trillion at stake, I rekindled my interest in hobby electronics and started hacking with DIY boards like Arduino and Raspberry Pi. That turned out to be fun, and ended up with the robot. This post provides the general steps, and the GitHub code may help you build your own.

Even if you don’t have a Kinect for the controller, you can easily put together a controller using your phone (Windows Phone/Android/iOS), as we are using Bluetooth to communicate between the controller and the robot. Thanks to my younger kid for the background screaming audio effect in the below video - she was running behind the bot to kick it.

Now, here is a quick start guide to build your own. In this case, we have an app running on the laptop that pumps commands to the robot via Bluetooth, based on input from Kinect - you could easily build a phone UI as well.

And if you already got the idea, here is the code – You may read further to build the hardware part.

1 – Build familiarity

You need to build some familiarity with Arduino and/or Netduino – In this example I’ll be using Arduino.


Explore Arduino. The best way is to

Mainly you need to understand the pins on the Arduino board. You can write simple programs with the Arduino IDE (try the Blink sample to blink an LED, in IDE File->Samples). The pin descriptions below are from the SparkFun website.

    • GND (3): Short for ‘Ground’. There are several GND pins on the Arduino, any of which can be used to ground your circuit.
    • 5V (4) & 3.3V (5): The 5V pin supplies 5 volts of power, and the 3.3V pin supplies 3.3 volts of power. Most of the simple components used with the Arduino run happily off of 5 or 3.3 volts. If you’re not sure, take a look at Spark Fun’s datasheet tutorial then look up the datasheet for your part.
    • Analog (6): The area of pins under the ‘Analog In’ label (A0 through A5 on the UNO) are Analog In pins. These pins can read the signal from an analog sensor (like a temperature sensor) and convert it into a digital value that we can read.
    • Digital (7): Across from the analog pins are the digital pins (0 through 13 on the UNO). These pins can be used for both digital input (like telling if a button is pushed) and digital output (like powering an LED).
    • PWM (8): You may have noticed the tilde (~) next to some of the digital pins (3, 5, 6, 9, 10, and 11 on the UNO). These pins act as normal digital pins, but can also be used for something called Pulse-Width Modulation (PWM). We have a tutorial on PWM, but for now, think of these pins as being able to simulate analog output (like fading an LED in and out).
    • AREF (9): Stands for Analog Reference. Most of the time you can leave this pin alone. It is sometimes used to set an external reference voltage (between 0 and 5 Volts) as the upper limit for the analog input pins.




2 – Get The Components

Again, you could find an online store to buy these components. Also, you could try the nearby local electronics store and buy some breadboards, jumper wires (get some male to male & female to female wires) etc. as well. Here is the list of components you need to build CakeRobot.

  • A Chassis – I used the Dagu Magician Chassis – from SparkFun, Rhydolabz – it comes with two geared DC motors that can be controlled by our driver board.
  • An Arduino board with Motor Driver – I used the Dagu Mini Motor Driver – bought from Rhydolabz in India. For other countries you need to search and find. Some description of the board can be found here – It also has a special slot to plug in the Dagu Bluetooth shield. You could also use the Micro Magician
    • You also need a Micro USB cable to connect your PC to the motor driver to upload code.
  • A Bluetooth Shield – Get the Dagu Bluetooth module if you can find it. I’ve purchased a Class 2 RN-42 Bluetooth shield from Rhydolabz
  • Few mini modular bread boards
  • Jumper wires – Get a mixed pack with M/M, F/F, M/F – like this one
  • A tiny Bluetooth dongle for your PC/Laptop, like this one, to communicate with the Bluetooth shield in the robot (if you don’t have built in Bluetooth)
  • Few sensors if you want to have more fun – I had an ultrasonic distance sensor to avoid collisions in my final version. A better alternative is the Ping sensor.
  • A battery pack and battery holder that should supply around 6V to the Mini Motor Driver
  • Other components for your later creativity/exploration
    • LEDs
    • Resistors
    • More Sensors
  • Tools
    • Few star screw drivers
    • Duct tapes/rubber bands (yea, we are prototyping so no soldering as of now)

And I found these reads pretty useful


3 – Programming the components

You need to spend some hours figuring out how to program each of the components.

  • To start with, play with Arduino a bit, connecting LEDs, switches etc. Then, understand a bit about programming the Digital and Analog pins. Play with the examples
  • Try programming the ultrasonic sensor if you have one using your Arduino, using serial sockets. If you are using a Ping sensor, check out this
  • Try programming the Bluetooth module (Code I used for the distance sensor and Bluetooth module is in my examples below, but it’ll be cool if you can figure things out yourself).


4 – Put the components together

Assemble the Dagu Magician Chassis, and place/screw/mount the mini motor driver and Bluetooth module on top of the same. Connect the components using jumper wires/plugin as required. A high level schematic below.



Here is a low resolution snap of mine, from top.



5 – Coding the Arduino Mini Driver

You can explore the full code in the GitHub repo - However, here are a few pointers. According to the Dagu Arduino Mini Driver spec, the following digital pins can be used to control the motors

  • D9 is left motor speed
  • D7 is left motor direction
  • D10 is right motor speed
  • D8 is right motor direction

To make a motor move, first we need to set the direction by doing a digitalWrite of HIGH or LOW (for Forward/Reverse) to the direction pin. Next set the motor speed by doing an analogWrite of 0~255 to the speed pin. 0 is stopped and 255 is full throttle.

In the Arduino code, we are initiating communication via Bluetooth, to accept commands as strings. For example, speedl 100 will set the left motor speed to 100, and speedr 100 will set the right motor speed to 100. Relevant code below.

        //Setting up the communication with the Bluetooth shield over serial
        Serial.begin(115200);  // RN-42 BT


        //Read the input in getSerialLine (shortened for brevity)
        while (serialIn != '\n') {
            if (!(Serial.available() > 0))
                continue;

            serialIn = Serial.read();
            if (serialIn != '\n') {
                char a = char(serialIn);
                strReceived += a;
            }
        }


        //Process the command (shortened for brevity)
        else if (command == "speedl")
            val = getValue(input, ' ', 1).toInt();
        else if (command == "speedr")
            val = getValue(input, ' ', 1).toInt();



Have a look at the full code of the quick Arduino client here. Then, compile and upload the code to your Mini Driver board.
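On the PC side, the same protocol is just newline terminated strings. Here is a minimal sketch of how the commands could be sent (this is not the actual BluetoothConnector class from the repo - the class and port name are assumptions); a paired RN-42 module typically shows up as a virtual serial port, so plain System.IO.Ports works:

```csharp
using System.IO.Ports;

class MotorCommander
{
    private readonly SerialPort _port;

    public MotorCommander(string portName)
    {
        //The Arduino sketch reads characters up to '\n', so terminate commands with it
        _port = new SerialPort(portName, 115200) { NewLine = "\n" };
        _port.Open();
    }

    //speedl/speedr are the commands our Arduino code parses
    public void SetLeftSpeed(int speed)
    {
        _port.WriteLine("speedl " + speed);
    }

    public void SetRightSpeed(int speed)
    {
        _port.WriteLine("speedr " + speed);
    }
}
```

Here, portName would be whatever COM port Windows assigned during Bluetooth pairing (for example "COM5").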

6 – Coding the Controller & Kinect

Essentially, what we are doing is just tracking the skeletal frame, and calculating the distance of your hand from your hip to provide the direction and speed for the motors. Skeletal tracking details here

We are leveraging the 32Feet library for identifying the Bluetooth shield to send the commands. Please ensure your Bluetooth shield is paired with your PC/Laptop/Phone – you can normally do that by clicking the Bluetooth icon in the system tray in Windows, and clicking Add Device.

       //For each 600 ms, send a new command
       //_btCon is our instance variable for a Bluetooth connection, built over the cool 32Feet library
       internal void ProcessCommand(Skeleton skeleton)
       {
            var now = DateTime.Now;
            if (now.Subtract(_prevTime).TotalMilliseconds < 600)
                return;

            _prevTime = DateTime.Now;

            Joint handRight = skeleton.Joints[JointType.HandRight];
            Joint handLeft = skeleton.Joints[JointType.HandLeft];
            Joint hipLeft = skeleton.Joints[JointType.HipLeft];
            Joint hipRight = skeleton.Joints[JointType.HipRight];

            if (handRight.Position.Y < hipRight.Position.Y)
                _btCon.SetSpeed(Motor.Left, 0);

            if (handLeft.Position.Y < hipLeft.Position.Y)
                _btCon.SetSpeed(Motor.Right, 0);

            if (handRight.Position.Y > hipRight.Position.Y)
            {
                var speed = (handRight.Position.Y - hipRight.Position.Y) * 200;
                if (speed > 230) speed = 230;
                _btCon.SetSpeed(Motor.Left, (int)speed);
            }

            if (handLeft.Position.Y > hipLeft.Position.Y)
            {
                var speed = (handLeft.Position.Y - hipLeft.Position.Y) * 200;
                if (speed > 230) speed = 230;
                _btCon.SetSpeed(Motor.Right, (int)speed);
            }
       }


And so, it sets the speed based on your hand movements. Explore the ConnectionHelper and BluetoothConnector classes I wrote.


The code is here in Github. Fork it and play with it, and expand it.

Reactive Extensions Or Rx (More On IEnumerable, IQueryable, IObservable and IQbservable) - Awesome Libraries For C# Developers #2

In my last post – we had a look at Interactive Extensions. In this post, we’ll do a recap of Reactive Extensions and LINQ to Event streams.

Reactive Extensions have been out in the wild for some time, and I had a series about Reactive Extensions a few years back. However, after my last post on Interactive Extensions, I thought we should discuss Reactive Extensions in a bit more detail. Also, in this post we’ll touch IQbservables – the most mysteriously named thing/interface in the world, maybe after the Higgs Boson. Push and pull sequences are everywhere – and now with devices on one end and the cloud at the other end, most data transactions happen via push/pull sequences. Hence, it is essential to grab the basic concepts regarding the programming models around them.

First Things First

Let us take a step back and discuss IEnumerable and IQueryable first, before discussing further about Reactive IObservable and IQbservable (Qbservables = Queryable Observables – Oh yea, funny name).


As you may be aware, the IEnumerable model can be viewed as a pull operation. You get an enumerator, and then you iterate the collection by moving forward using MoveNext over a set of items till you reach the final item. Pull models are useful when the environment is requesting data from an external source. To cover some basics - IEnumerable has a GetEnumerator method which returns an enumerator with a MoveNext() method and a Current property. Offline tip - A C# foreach statement can iterate on any dumb thing that can return a GetEnumerator.  Anyway, here is what the non generic version of IEnumerable looks like.

public interface IEnumerable
{
    IEnumerator GetEnumerator();
}

public interface IEnumerator
{
    Object Current { get; }
    bool MoveNext();
    void Reset();
}
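To make the pull model concrete, here is roughly what a foreach over an array expands to - we keep pulling with MoveNext till it returns false:

```csharp
using System;
using System.Collections.Generic;

class PullDemo
{
    static void Main()
    {
        IEnumerable<int> numbers = new[] { 1, 2, 3 };

        //What foreach (var n in numbers) roughly expands to
        using (var enumerator = numbers.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Console.WriteLine(enumerator.Current); //pull the current item
            }
        }
    }
}
```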

Now, LINQ defines a set of operators as extension methods on top of the generic version of IEnumerable – i.e., IEnumerable<T>. So by leveraging the type inference support for generic methods, you can invoke these methods on any IEnumerable without specifying the type, i.e., you could say someStringArray.Count() instead of someStringArray.Count<String>(). You can explore the Enumerable class to find these static extensions.

The actual query operators in this case (like Where, Count etc.) with related expressions are compiled to IL, and they operate in-process, much like any IL code executed by the CLR. From an implementation point of view, the parameter of a LINQ clause like Where is a lambda expression (as you may already know, the from..select syntax is just sugar that gets expanded to extension methods of IEnumerable<T>), and in most cases a delegate like Func<T,..> can represent an expression from an in memory perspective. But what if you want to apply query operators on items sitting somewhere else? For example, how do you apply LINQ operators on top of a set of data rows stored in a table in a database that may be in the cloud, instead of an in memory collection that is an IEnumerable<T>? That is exactly what IQueryable<T> is for.


IQueryable<T> is an IEnumerable<T> (it inherits from IEnumerable<T>) and it points to a query expression that can be executed in a remote world. The LINQ operators for querying objects of type IQueryable<T> are defined in the Queryable class, and the lambdas you pass to them are captured as Expression<Func<T,..>> objects – i.e., as System.Linq.Expressions.Expression trees (you can read about expression trees here) instead of compiled delegates. These will be translated to the remote world (say a SQL system) via a query provider. So, essentially, a concrete IQueryable implementation points to a query expression and a query provider – it is the job of the query provider to translate the query expression to the query language of the remote world where it gets executed. Expression trees in .NET provide a way to represent code as data, a kind of abstract syntax tree. Later, the query provider will walk through this to construct an equivalent query in the remote world.

    public interface IQueryable : IEnumerable
    {
        Type ElementType { get; }
        Expression Expression { get; }
        IQueryProvider Provider { get; }
    }

    public interface IQueryable<T> : IEnumerable<T>, IQueryable, IEnumerable
    {
    }

For example, in LINQ to Entity Framework or LINQ to SQL, the query provider will convert the expressions to SQL and hand it over to the database server. You can even view the translation to the target query language (SQL). In short, the LINQ query operators you apply on an IQueryable will be used to build an expression tree, and this will be translated by the query provider to build and execute a query in a remote world. Read this article if you are not clear about how expression trees are built using Expression<T> from lambdas.
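A quick way to see the difference is to capture the same lambda both ways - IEnumerable operators take a compiled delegate, while IQueryable operators take an expression tree that a provider can inspect and translate:

```csharp
using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        //Compiled to IL - you can only invoke it
        Func<int, bool> asDelegate = n => n > 5;

        //Kept as data - a provider can walk the AST and translate it
        Expression<Func<int, bool>> asExpression = n => n > 5;

        Console.WriteLine(asDelegate(10));    //True
        Console.WriteLine(asExpression.Body); //prints the AST body, something like (n > 5)
    }
}
```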

Reactive Extensions

So, now let us get into the anatomy and philosophy of observables.

IObservable<T>

As we discussed, objects of type IEnumerable<T>  are pull sequences. But then, in real world, at times we push things as well – not just pull. (Health Alert – when you do both together, make sure you do it safe). In  a lot of scenarios, push pattern makes a lot of sense – for example, instead of you waiting in a queue infinitely day and night with your neighbors in front of the local post office to collect snail mails, the post office agent will just push you the mails to your home when they arrive.

Now, one of the cool things about push and pull sequences is that they are duals. This also means IObservable<T> is a dual of IEnumerable<T> – see the code below. So, to keep the story short, the dual interface of IEnumerable, derived using categorical duality, is IObservable. The story goes that some members in Erik’s team (he was with Microsoft then) had a well deserved temporal megalomaniac hyperactive spike when they discovered this duality. Here is a beautiful paper from Erik on that if you are more interested – a brief summary of Erik’s paper is below.

//Generic version of IEnumerable, ignoring the non generic IEnumerable base

interface IEnumerable<out T>
{
    IEnumerator<T> GetEnumerator();
}

interface IEnumerator<out T> : IDisposable
{
    bool MoveNext(); // throws Exception
    T Current { get; }
}

//Its dual IObservable

interface IObservable<out T>
{
    IDisposable Subscribe(IObserver<T> observer);
}

interface IObserver<in T>
{
    void OnCompleted(bool done);
    void OnError(Exception exception);
    T OnNext { set; }
}

Surprisingly, the IObservable implementation looks like the Observer pattern.

Now, LINQ operators are cool. They are very expressive, and provide an abstraction to query things. So the crazy guys in the Reactive Team thought they should take LINQ to work against event streams. Event streams are in fact push sequences, instead of pull sequences. So, they built IObservable. The IObservable fabric lets you write LINQ operators on top of push sequences like event streams, much like the way you query IEnumerable<T>. The LINQ operators for an object of type IObservable<T> are defined in the Observable class. So, how will you implement a LINQ operator, like Where, on an observable to do some filtering? Here is a simple example of the filter operator Where for an IEnumerable and an IObservable (simplified for comparison). In the case of IEnumerable, you dispose the enumerator when you are done with traversing.

 //Where for IEnumerable

        static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
        {
            // foreach(var element in source)
            //   if (predicate(element))
            //        yield return element;
            using (var enumerator = source.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    var value = enumerator.Current;
                    if (predicate(value))
                        yield return value;
                }
            }
        }

//Where for IObservable

        static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> predicate)
        {
            return Observable.Create<T>(observer =>
            {
                return source.Subscribe(Observer.Create<T>(value =>
                {
                    try
                    {
                        if (predicate(value)) observer.OnNext(value);
                    }
                    catch (Exception e)
                    {
                        observer.OnError(e);
                    }
                }));
            });
        }

Now, look at the IObservable’s Where implementation. In this case, we return an IDisposable handle to the observable so that we can dispose it to stop the subscription. For filtering, we are simply creating an inner observer that applies our filtering logic, subscribing it to the source - and then creating another top level observable that wraps this subscription. Now, you can have any concrete implementation of IObservable<T> that wraps an event source, and then you can query that using Where!! Cool. The Observable class in Reactive Extensions has a few helper methods to create observables from events, like FromEvent. Let us create an observable, and query the events now. Fortunately, the Rx team already has the entire implementation of observables and related query operators, so we don’t end up writing custom query operators like this.

You can install Rx from NuGet using install-package Rx-Main, and try out this example that shows event filtering.

            //Let us print all ticks between 5 seconds and 20 seconds
            //Interval in milliseconds
            var timer = new Timer() { Interval = 1000 };
            timer.Start();

            //Create our event stream which is an Observable
            var eventStream = Observable.FromEventPattern<ElapsedEventArgs>(timer, "Elapsed");
            var nowTime = DateTime.Now;

            //Same as eventStream.Where(item => ...);
            var filteredEvents = from e in eventStream
                                 let time = e.EventArgs.SignalTime
                                 where time > nowTime.AddSeconds(5) &&
                                       time < nowTime.AddSeconds(20)
                                 select e;

            //Subscribe to our observable
            filteredEvents.Subscribe(t => Console.WriteLine(DateTime.Now));

            Console.WriteLine("Let us wait..");
            Console.ReadLine();
            //Dispose the subscription explicitly if you want

Obviously, in the above example, we could’ve used Observable.Timer – but I just wanted to show how to wrap an external event source with observables. Similarly, you can wrap your mouse events or WPF events. You can explore more about Rx and observables, and a few applications, here. Let us move on now to IQbservables.


Now, let us focus on IQbservable<T>. IQbservable<T> is the counterpart of IObservable<T> that represents a query on push sequences/event sources as an expression, much like IQueryable<T> is the counterpart of IEnumerable<T>. So, what exactly does this mean? If you inspect IQbservable, you can see that

    public interface IQbservable<out T> : IQbservable, IObservable<T>
    {
    }

    public interface IQbservable
    {
        Type ElementType { get; }
        Expression Expression { get; }
        IQbservableProvider Provider { get; }
    }

You can see that it has an Expression property to represent the LINQ to Observable query much like how IQueryable had an Expression to represent the AST of a LINQ query. The IQbservableProvider is responsible for translating the expression to the language of a remote event source (may be a stream server in the cloud).


This post is a very high level summary of Rx Extensions, and here is an awesome talk from Bart De Smet that you cannot miss.

And let me take the liberty of embedding the drawing created by Charles, which is a concrete representation of the abstract drawing Bart did on the whiteboard. This is the summary of this post.

representation of the three dimensional graph of Rx's computational fabric

We’ll discuss more practical scenarios where Rx and Ix come in handy in the future – mainly for device to cloud interaction scenarios, complex event processing, task distribution using IScheduler etc. - along with some brilliant add on libraries others are creating on top of Rx. But this one was for a quick introduction. Happy Coding!!

Interactive Extensions - Awesome Libraries For C# Developers #1

Recently while I was giving a C# talk,  I realized that a lot of developers are still not familiar with the advantages of some of the evolving, but very useful .NET libraries. Hence, I thought about writing a high level post introducing some of them as part of my Back To Basics series, generally around .NET and Javascript. In this post we’ll explore Interactive Extensions, which is a set of extensions initially developed for Reactive Extensions by the Microsoft Rx team.


Interactive Extensions, at its core, has a number of new extension methods for IEnumerable<T> – i.e., it adds a number of utility LINQ to Objects query operators. You may have hand coded some of these utility extension methods somewhere in your helper or utility classes, but now a lot of them are aggregated together by the Rx team. Also, this post assumes you are familiar with the cold IEnumerable model and iterators in C#. Basically, what the C# compiler does is take a method with yield return statements and generate a class out of it for each iterator. So, in one way, each C# iterator internally holds a state machine. You can examine this using Reflector or something, on a method yield returning an IEnumerator<T>. Or better, there is a cool post from my friend Abhishek Sur here or this post about the implementation of iterators in C#
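A quick way to see that compiler generated state machine in action - nothing inside the iterator body runs until the sequence is actually enumerated:

```csharp
using System;
using System.Collections.Generic;

class IteratorDemo
{
    static IEnumerable<int> Numbers()
    {
        //Runs on the first MoveNext, not when Numbers() is called
        Console.WriteLine("Iterator started");
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var sequence = Numbers();       //Nothing printed yet
        Console.WriteLine("Before enumeration");

        foreach (var n in sequence)     //"Iterator started" prints here, then 1 and 2
            Console.WriteLine(n);
    }
}
```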

More About Interactive Extensions

Fire up a C# console application, and install the Interactive Extensions package using install-package Ix-Main. You can explore the EnumerableEx class in the System.Linq namespace in System.Interactive.dll - Now, let us explore some useful extension methods that got added to IEnumerable.


Examining Few Utility Methods In Interactive Extensions

Let us quickly examine few useful Utility methods.


What the simplest version of 'Do' does is pretty interesting. It'll lazily invoke an action on each element in the sequence, when we do the enumeration leveraging the iterator.

 //Let us create a set of numbers
 var numbers = new int[] { 30, 40, 20, 40 };
 var result = numbers.Do(n => Console.WriteLine(n));

 Console.WriteLine("Before Enumeration");

 foreach (var item in result)
 {
     //The action will be invoked when we actually enumerate
 }

 Console.WriteLine("After Enumeration");


And the result below. Note that the action (in this case, our Console.WriteLine to print the values) is applied only when we enumerate.


Now, have a quick peek at the Interactive Extensions source code here in CodePlex, and you can see how our Do method is actually implemented. Here is a shortened version.

public static class StolenLinqExtensions
{
        public static IEnumerable<TSource> StolenDo<TSource>(this IEnumerable<TSource> source, Action<TSource> onNext)
        {
            //Get the enumerator
            using (var e = source.GetEnumerator())
            {
                while (true)
                {
                    //Move next
                    if (!e.MoveNext())
                        break;

                    var current = e.Current;

                    //Call our action on top of the current item
                    onNext(current);

                    //Yield return
                    yield return current;
                }
            }
        }
}


Cool, huh.


DoWhile in Ix is pretty interesting. It generates an enumerable sequence by repeating the source sequence as long as the given condition is true.

IEnumerable<TResult> DoWhile<TResult>(IEnumerable<TResult> source, Func<bool> condition)

Consider the following code.

  var numbers = new int[] { 30, 40, 20, 40 };

  var then = DateTime.Now.Add(new TimeSpan(0, 0, 10));
  var results = numbers.DoWhile(() => DateTime.Now < then);

  foreach (var r in results)
      Console.WriteLine(r);
As expected, you’ll see the foreach loop enumerating results repeatedly, till the DateTime.Now < then condition turns false – i.e., for about 10 seconds.


Scan takes a sequence and applies an accumulator function to generate a sequence of accumulated values. As an example, let us create a simple sum accumulator that takes a set of numbers and accumulates the sum of each number with the running total so far

 var numbers = new int[] { 10, 20, 30, 40 };
 //0 is just the starting seed value
 var results = numbers.Scan(0, (sum, num) => sum + num);

 //Print results. Results will contain 10, 30, 60, 100
 foreach (var r in results)
     Console.WriteLine(r);

 //0 + 10 = 10
 //10 + 20 = 30
 //30 + 30 = 60
 //60 + 40 = 100
And you may have a look at the actual Scan implementation, from the Rx repository in CodePlex. Here is an abbreviated version.

 static IEnumerable<TAccumulate> StolenScan<TSource, TAccumulate>
    (this IEnumerable<TSource> source, TAccumulate seed,
     Func<TAccumulate, TSource, TAccumulate> accumulator)
 {
            var acc = seed;

            foreach (var item in source)
            {
                acc = accumulator(acc, item);
                yield return acc;
            }
 }


We just touched the tip of the iceberg, as the objective of this post was to introduce you to Ix. We may discuss this in a bit more depth after covering a few other libraries, including Rx. There is a pretty exciting talk from Bart De Smet here that you should not miss. Ix is specifically very interesting because of its functional roots. Have a look at the Reactive Extensions repository in CodePlex for more inspiration - that should give you a lot more ideas about a few functional patterns. You may also play with the Ix Providers and Ix Async packages.

As usual, happy coding!!

Building A Recommendation Engine - Machine Learning Using Windows Azure HDInsight, Hadoop And Mahout

Feel like helping some one today?

Let us help the Stack Exchange guys suggest questions to a user that he can answer, based on his answering history, much like the way Amazon suggests you products based on your previous purchase history. If you don’t know what Stack Exchange does – they run a number of Q&A sites including the massively popular Stack Overflow.

Our objective here is to see how we can analyze the past answers of a user, to predict questions that he may answer in the future. Stack Exchange’s current recommendation logic may work better than ours, but that won’t prevent us from helping them for our own learning purposes.

We’ll be doing the following tasks.

  • Extracting the required information from Stack Exchange data set
  • Using the required information to build a Recommender

But let us start with the basics. If you are totally new to Apache Hadoop and Hadoop On Azure, I recommend you read these introductory articles before you begin, where I explain HDInsight and the Map Reduce model in a bit more detail.

Behind the Scenes

Here we go, let us get into some “data science” voodoo first. Cool!! Distributed machine learning is mainly used for

  • Recommendations  - Remember the Amazon Recommendations? Normally used to predict preferences based on history.
  • Clustering  - For tasks like grouping together related documents from a set of documents, or finding like-minded people from a community
  • Classification  - For identifying which category a new item belongs to. This normally includes training the system first, and then asking the system to classify an item.

“Big Data” jargon is often used when you need to perform operations on a very large data set. In this article, we’ll be dealing with extracting some data from a large data set, and building a Recommender using our extracted data.

What is a Recommender?

Broadly speaking, we can build a recommender either by

  • Finding questions that a user may be interested in answering, based on the questions answered by other users like him
  • Finding other questions that are similar to the questions he answered already.

The first technique is known as user based recommendation, and the second is known as item based recommendation.

In the first case, taste can be determined by how many questions you answered in common with that user (the questions both of you answered). For example, think about User1, User2, User3 and User4 – answering a few questions Q1, Q2, Q3 and Q4. This diagram shows the questions answered by the users.

Based on the above diagram, User1 and User2 answered Q1, Q2 and Q3. Now, User3 answered Q3 and Q2, but not Q1. To some extent, we can safely assume that User3 will be interested in answering Q1 – because two users who answered Q2 and Q3 with him already answered Q1. There is some taste matching here, isn’t there? So, if you have an array of {UserId, QuestionId} pairs – it seems that data is enough for us to build a recommender.

The Logic Side

Now, how exactly are we going to build a question recommender? In fact, it is quite simple.

First, we need to find the number of times a pair of questions co-occur across the available users. Note that this matrix has no relation to the users. For example, if Q1 and Q2 appear together 2 times (as in the above diagram), the co-occurrence value at {Q1,Q2} will be 2. Here is the co-occurrence matrix (hope I got this right).

  • Q1 and Q2 co-occurs 2 times (User1 and User2 answered Q1 ,Q2)
  • Q1 and Q3 co-occurs 2 times (User1 and User2 answered both Q1, Q3)
  • Q2 and Q3 co-occurs 3 times (User1, User2 and User3 answered Q2, Q3)
  • Likewise…


The above matrix just captures how many times a pair of questions co-occurred (were answered together) as discussed above. There is no mapping with users yet. Now, how will we relate this to find a user’s preference? To find out how closely a question ‘matches’ a user, we just need to

  • Find out how often that question co-occurs with other questions answered by that user
  • Eliminate questions already answered by the user.

For the first step, we need to multiply the above matrix with the user’s preference matrix.

For example, let us take User3. For User3, the preference mapping with questions [Q1,Q2,Q3,Q4] is [0,1,1,0], because he already answered Q2 and Q3, but not Q1 and Q4. So, let us multiply this with the above co-occurrence matrix. Remember that this is a matrix multiplication / dot product. The result indicates how often a question co-occurs with the other questions answered by a user (weightage).


We can omit Q2 and Q3 from the results, as we know User3 already answered them. Now, from the remaining Q1 and Q4 – Q1 has the higher value (4) and hence the higher taste matching with User3. Intuitively, this indicates Q1 co-occurred with the questions already answered by User3 (Q2 and Q3) more than Q4 did – so User3 will be more interested in answering Q1 than Q4. In an actual implementation, note that the user’s taste matrix will be a sparse matrix (mostly zeros), as the user will have answered only a very limited subset of questions in the past. The advantage of the above logic is that we can use a distributed map reduce model for the computation, with multiple map-reduce tasks – constructing the co-occurrence matrix, finding the dot product for each user, etc.
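The multiplication above can be sketched in a few lines of C#. Note that the matrix values below just follow the example diagram; the Q4 row and the diagonal entries are assumptions for illustration, not taken from the article.

```csharp
using System;

class RecommendationSketch
{
    static void Main()
    {
        // Co-occurrence counts for [Q1,Q2,Q3,Q4]; values follow the example
        // diagram (the Q4 row and the diagonal are illustrative assumptions).
        int[,] cooccurrence =
        {
            // Q1 Q2 Q3 Q4
            {  2,  2,  2,  0 }, // Q1
            {  2,  3,  3,  1 }, // Q2
            {  2,  3,  3,  1 }, // Q3
            {  0,  1,  1,  1 }, // Q4
        };

        // User3 answered Q2 and Q3, but not Q1 and Q4.
        int[] user3Preference = { 0, 1, 1, 0 };

        // Matrix * vector: how often each question co-occurs with the
        // questions User3 already answered.
        var scores = new int[4];
        for (int q = 0; q < 4; q++)
            for (int j = 0; j < 4; j++)
                scores[q] += cooccurrence[q, j] * user3Preference[j];

        for (int q = 0; q < 4; q++)
            Console.WriteLine("Q{0}: {1}", q + 1, scores[q]);
        // Prints Q1: 4, Q2: 6, Q3: 6, Q4: 2 – after dropping the already
        // answered Q2 and Q3, Q1 (4) ranks above Q4 (2), as discussed.
    }
}
```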

Now, let us start thinking about the implementation.


From the implementation point of view,

  1. We need to provision a Hadoop Cluster
  2. We need to download and extract the data to analyze (Stack Overflow data)
  3. Job 1 – Extract the Data - From each line, extract {UserId, QuestionId} for all questions answered by the user.
  4. Job 2 – Build the Recommender - Use the output from above Map Reduce to build the recommendation model where possible items are listed against each user.

Let us roll!!

Step 1 - Provisioning Your Cluster

Now remember, the Stack Exchange data is huge. So, we need to have a distributed environment to process the same. Let us head over to Windows Azure. If you don’t have an account, sign up for the free trial. Now, head over to the preview page, and request the HDInsight (Hadoop on Azure) preview.

Once you have the HD Insight available, you can create a Hadoop cluster easily. I’m creating a cluster named stackanalyzer.



Once you have the cluster ready, you’ll see the Connect and Manage buttons in your dashboard (Not shown here). Connect to the head node of your cluster by clicking the ‘Connect’ button, which should open a Remote Desktop Connection to the head node. You may also click the ‘Manage’ button to open your web based management dashboard. (If you want, you can read more about HD Insight here)

Step 2 - Getting Your Data To Analyze

Once you have connected to your cluster’s head node using RDP, you may download the Stack Exchange data. You can download the Stack Exchange sites data from Clear Bits, under Creative Commons. I installed the µTorrent client on the head node, and then downloaded and extracted the data for – The extracted files look like this – a bunch of XML files.


What we are interested in is the Posts XML file. Each line represents either a question or an answer. If it is a question, PostTypeId=1, and if it is an answer, PostTypeId=2. The ParentId represents the question’s Id for an answer, and OwnerUserId represents the guy who wrote the answer for this question.

<row Id="16" PostTypeId="2" ParentId="2" CreationDate="2010-07-09T19:13:37.540" Score="3"
     Body="&lt;p&gt;...shortenedforbrevity...  &lt;/p&gt;&#xA;"
     OwnerUserId="34" LastActivityDate="2010-07-09T19:13:37.540" />

So, for us, we need to extract the {OwnerUserId, ParentId} for all posts where PostTypeId=2 (answers) – which is a representation of {User, Question}. The Mahout Recommender job we’ll be using later will take this data, and will build a recommendation result.

Now, extracting this data is itself a huge task when you consider that the Posts file is huge. For the Cooking site it is not so big – but if you are analyzing the entire Stack Overflow, the Posts file may come in GBs. So, for the extraction itself, let us leverage Hadoop and write a custom Map Reduce job.

Step 3 - Extracting The Data We Need From the Dump (User, Question)

To extract the data, we’ll leverage Hadoop to distribute the work. Let us write a simple mapper. As mentioned earlier, we need to figure out {OwnerUserId, ParentId} for all posts with PostTypeId=2. This is because the input for the Recommender job we’ll run later is {user, item}. For this, first load Posts.xml to HDFS. You may use the hadoop fs command to copy the local file to the specified input path.
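For instance, the copy could look like this (the local path below is hypothetical; the input path matches the job configuration shown further down):

```shell
# Create the input folder in HDFS and copy the Posts file into it
hadoop fs -mkdir /input/Cooking
hadoop fs -copyFromLocal C:\data\Posts.xml /input/Cooking/Posts.xml
```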


Now, time to write a custom mapper to extract the data for us. We’ll be using the Hadoop On Azure .NET SDK to write our Map Reduce job. Note that we are specifying the input folder and output folder in the configuration section. Fire up Visual Studio, and create a C# console application. If you remember from my previous articles, hadoop fs <yourcommand> is used to access the HDFS file system, and it’ll help if you know some basic *nix commands like ls, cat etc.

Note: See my earlier posts regarding the first bits of HDInsight to understand more about Map Reduce Model and Hadoop on Azure

You need to install the Hadoop Map Reduce package from Hadoop SDK for .NET via Nuget package manager.

install-package Microsoft.Hadoop.MapReduce 

Now, here is some code where we

  • Create A Mapper
  • Create a Job
  • Submit the Job to the cluster

Here we go.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using Microsoft.Hadoop.MapReduce;

namespace StackExtractor
{
    //Our Mapper that takes a line of XML input and spits out the {OwnerUserId,ParentId}
    //i.e, {User,Question}
    public class UserQuestionsMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            try
            {
                var obj = XElement.Parse(inputLine);
                var postType = obj.Attribute("PostTypeId");
                if (postType != null && postType.Value == "2")
                {
                    var owner = obj.Attribute("OwnerUserId");
                    var parent = obj.Attribute("ParentId");

                    // Write output data. Ignore records with null values, if any
                    if (owner != null && parent != null)
                    {
                        context.EmitLine(string.Format("{0},{1}", owner.Value, parent.Value));
                    }
                }
            }
            catch
            {
                //Ignore this line if we can't parse it
            }
        }
    }

    //Our Extraction Job using our Mapper
    public class UserQuestionsExtractionJob : HadoopJob<UserQuestionsMapper>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration();
            config.DeleteOutputFolder = true;
            config.InputPath = "/input/Cooking";
            config.OutputFolder = "/output/Cooking";
            return config;
        }
    }

    //Driver that submits this to the cluster in the cloud
    //And will wait for the result. This will push your executables to the Azure storage
    //and will execute the command line in the head node (HDFS for Hadoop on Azure uses Azure storage)
    public class Driver
    {
        public static void Main()
        {
            try
            {
                var azureCluster = new Uri("https://{yoururl}");
                const string clusterUserName = "admin";
                const string clusterPassword = "{yourpassword}";

                // This is the name of the account under which Hadoop will execute jobs.
                // Normally this is just "Hadoop".
                const string hadoopUserName = "Hadoop";

                // Azure Storage Information.
                const string azureStorageAccount = "{yourstorage}";
                const string azureStorageKey = "{yourstoragekey}";
                const string azureStorageContainer = "{yourcontainer}";
                const bool createContainerIfNotExist = true;
                Console.WriteLine("Connecting : {0} ", DateTime.Now);

                var hadoop = Hadoop.Connect(azureCluster, clusterUserName, hadoopUserName,
                    clusterPassword, azureStorageAccount, azureStorageKey,
                    azureStorageContainer, createContainerIfNotExist);

                Console.WriteLine("Starting: {0} ", DateTime.Now);
                var result = hadoop.MapReduceJob.ExecuteJob<UserQuestionsExtractionJob>();
                var info = result.Info;

                Console.WriteLine("Done: {0} ", DateTime.Now);
                Console.WriteLine("\nInfo From Server\n----------------------");
                Console.WriteLine("StandardError: " + info.StandardError);
                Console.WriteLine("StandardOut: " + info.StandardOut);
                Console.WriteLine("ExitCode: " + info.ExitCode);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: {0} ", ex.StackTrace.ToString(CultureInfo.InvariantCulture));
            }
            Console.WriteLine("Press Any Key To Exit..");
            Console.ReadLine();
        }
    }
}

Now, compile and run the above program. ExecuteJob will upload the required binaries to your cluster, and will initiate a Hadoop Streaming job that’ll run our mappers on the cluster, with input from the Posts file we stored earlier in the input folder. Our console application will submit the job to the cloud, and will wait for the result. The Hadoop SDK will upload the map reduce binaries to the blob, and will build the required command line to execute the job (see my previous posts to understand how to do this manually). You can inspect the job by clicking the Hadoop Map Reduce status tracker from the desktop shortcut in the head node.

If everything goes well, you’ll see the results like this.


As you see above, you can find the output in /output/Cooking folder. If you RDP to your cluster’s head node, and check the output folder now, you should see the files created by our Map Reduce Job.


And as expected, the files contain the extracted data, which represents the {UserId, QuestionId} pairs – for all questions answered by a user. If you want, you can load the data from HDFS to Hive, and then view the same with Microsoft Excel using the ODBC driver for Hive. See my previous articles.

Step 4 – Build the recommender And generate recommendations

As a next step, we need to build the co-occurrence matrix and run a recommender job, to convert our {UserId,QuestionId} data to recommendations. Fortunately, we don’t need to write a Map Reduce job for this. We can leverage the Mahout library along with Hadoop. Read about Mahout here.

RDP to the head node of our cluster, as we need to install Mahout. Download the latest version of Mahout (0.7 as of this writing), and copy the same to the c:\Apps\dist folder in the head node of your cluster.


Mahout’s Recommender Job has support for multiple algorithms to build recommendations – in this case, we’ll be using SIMILARITY_COOCCURRENCE. The Algorithms page of the Mahout website has a lot more information about Recommendation, Clustering and Classification algorithms. We’ll be using the files we have in the /output/Cooking folder to build our recommendation.

Time to run the Recommender job. Create a users.txt file and place the IDs of the users for whom you need recommendations in that file, and copy the same to HDFS.
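For example, the copy could look like this (the HDFS path is illustrative; use whatever path you pass to the recommender’s usersFile option):

```shell
# users.txt has one user id per line, e.g. 1393
hadoop fs -mkdir /data
hadoop fs -copyFromLocal users.txt /data/users.txt
```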


Now, the following command should start the Recommendation job. Remember, we’ll use the output files from our above Map Reduce job as input to the Recommender. Let us kick start the Recommendation job. This will generate output in the /recommend/ folder, for all users specified in the users.txt file. You can use the --numRecommendations switch to specify the number of recommendations you need against each user. If there is a preference relation between a user and an item (like the number of times a user played a song), you could keep the input dataset for a recommender as {user,item,preferencevalue} – in this case, we are omitting the preference weightage.

Note: If the below command fails on a re-run complaining that the output directory already exists, just try removing the temp folder and the output folder using hadoop fs -rmr temp and hadoop fs -rmr /recommend/

hadoop jar c:\Apps\dist\mahout-0.7\mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input=/output/Cooking --output=/recommend --usersFile=/data/users.txt

After the job is finished, examine the /recommend/ folder, and try printing the content of the generated file. You should see the top recommendations, against the user Ids you had in users.txt.


So, the recommendation engine thinks User 1393 may answer the questions 6419, 16897 etc. if we suggest the same to him. You could experiment with other similarity classes like SIMILARITY_LOGLIKELIHOOD, SIMILARITY_PEARSON_CORRELATION etc. to find the best results. Iterate and optimize till you are happy.

For a thought experiment, here is another exercise – examine the Stack Exchange data set, and find out how you may build a recommender to show ‘You may also like’ questions, based on the questions a user has favorited.


In this example, we did a lot of manual work to upload the required input files to HDFS, and triggered the Recommender job manually. In fact, you could automate this entire workflow leveraging the Hadoop For Azure SDK. But that is for another post, stay tuned. Real life analysis has much more to do, including writing map/reducers for extracting and dumping data to HDFS, automating creation of hive tables, performing operations using HiveQL or PIG, etc. However, we just examined the steps involved in doing something meaningful with Azure, Hadoop and Mahout.

You may also access this data in your Mobile App or ASP.NET Web application, either by using Sqoop to export this to SQL Server, or by loading it to a Hive table as I explained earlier. Happy Coding and Machine Learning!! Also, if you are interested in scenarios where you could tie your existing applications with HD Insight to build end to end workflows, get in touch with me.

I suggest you read further.

May be this is the best time for Microsoft to Open Source the .NET Framework

Just read a thoughtful post from OdeToCode / Scott Allen – Where is .NET headed

Scott wrote,

If your business or company still relies solely on components delivered to developers through an MSDN subscription, then it is past time to start looking beyond what Microsoft offers for .NET development so you won’t be left behind in 5 years. Embrace and support open source.

At least, that’s how I see things.

As I commented there, the best move Microsoft could make at this point with respect to .NET is to open source the .NET Framework – and then launch a program like the Apache Incubator around it, to promote ‘real’ OSS initiatives in the .NET ecosystem.

I believe the real problem with the .NET ecosystem is the unavailability of serious frameworks for solving new age problems – .NET developers are cramped by the non-availability of proven libraries for Machine Learning, Distributed Processing, Text Processing etc. Talk about Hadoop, Solr, Lucene, Mahout, Storm, OpenNLP etc. – all written in Java and continuously maturing, despite the fact that Sun started screwing the Java community. The .NET ports of some of these libraries bit the dust a long time back, or have very little adoption compared to their prominence in the Java ecosystem.

Microsoft is now trying to bridge this gap by bringing in platforms like Hadoop to Azure, and building bridges around the same for .NET interoperability.

CLR and C# are awesome – but MS just can't push forward the development of mature libraries around the same fast enough to keep up with the pace of OSS initiatives. The true future of .NET lies in the hands of the OSS community at large. Hopefully, if Azure turns out to be a big success story, Microsoft won't mind open sourcing the entire .NET stack ;). As Scott pointed out in his article, Azure and related hosted offerings (like Office365) + Windows 8 could become the focus points for Microsoft’s revenue.

It is heartening to see that a number of frameworks like ASP.NET MVC, MEF, EF etc. have already been open sourced. But Microsoft can do better in the OSS world to ensure the .NET ecosystem grows further. I believe there is a lack of buy-in from the OSS community at large – and Microsoft could reduce the friction a little bit by open sourcing .NET.

Hack Raspberry Pi – How To Build Apps In C#, WinForms and ASP.NET Using Mono In Pi

Recently I was doing a bit of R&D related to finding a viable, low cost platform for client nodes. Obviously, I came across Raspberry Pi, and found the same extremely interesting. Now, the missing piece of the puzzle was how to get going using C# and .NET in the Pi. C# is a great language, and there are a lot of C# developers out there in the wild who are interested in the Pi.

In this article, I’ll just document my findings so far, and will explain how to develop using C# leveraging Mono on a Raspberry Pi. Also, we’ll see how to write a few minimal Windows Forms & ASP.NET applications on the Pi as well.

Step 1: What is Raspberry Pi?

Raspberry Pi is an ARM/Linux box for just ~$30. It was introduced with a vision to teach basic computer science in schools. However, it got a lot of attention from hackers all around the world, as it is an awesome low cost platform to hack and experiment with cool ideas – the Pi is almost a full fledged computer. More about the R-Pi from Wikipedia:

The Raspberry Pi is a credit-card-sized single-board computer developed in the UK by the Raspberry Pi Foundation with the intention of promoting the teaching of basic computer science in schools. The Raspberry Pi has a Broadcom BCM2835 system on a chip (SoC),[3] which includes an ARM1176JZF-S 700 MHz processor (the firmware includes a number of "Turbo" modes so that the user can attempt overclocking, up to 1 GHz, without affecting the warranty),[4] a VideoCore IV GPU,[12] and originally shipped with 256 megabytes of RAM, later upgraded to 512 MB.[13] It does not include a built-in hard disk or solid-state drive, but uses an SD card for booting and long-term storage.[14] The Foundation's goal is to offer two versions, priced at US$25 and US$35.

Another good introduction from Life Hacker is here

Step 2: Setting up your Development Environment

Let us have a quick look at setting up the development environment.

If you have a physical Raspberry Pi

Raspberry Pi boots from an SD Card. Basically, this means you can transfer an operating system image to an SD Card, and boot your Pi from there. The Raspberry Pi download page lists multiple OS images you can use to load your SD Card. You will need to unzip it and write it to a suitable SD card using the UNIX tool dd. Windows users should use Win32DiskImager. There is a simple guide about how to setup your SD card if you are wondering how to do this. Then, you can connect your Pi device to the TV using an HDMI cable, and can hook up an old USB keyboard to start hacking.

Though multiple images are available on the download page mentioned above, please note that at this point only the soft-float Debian “wheezy” image can be used to install Mono without issues. This is because Mono doesn’t support hard-float for the Raspberry Pi (simplified version), and some runtime methods like DateTime won’t work if you are using a hard-float version. So, you need to download the soft-float Debian image and transfer the image file to the SD card. If you feel a bit adventurous, you can try this hard-float port of Mono for the Pi, but it is not supported.

If you don’t have a physical Raspberry Pi

You can still download one of the images from the download page and boot from an emulator like Qemu, which supports ARM architecture.

If you are on Windows and you are not familiar with Qemu, you can download this pre-packaged version of Qemu + Raspberry Pi from here – however, there is a problem. The image file that comes with this download is the hard-float version. As we need Mono to work properly, download the Soft-Float Debian Wheezy image – copy that to the Qemu folder, and modify run.bat in the qemu folder to the below command line to point it to the correct image. (If you downloaded Qemu and the Soft-Float Debian Wheezy image separately, you can create a handy run.bat.) Note that as of this writing, the soft-float image is 2012-08-08-wheezy-armel.img – you need to modify this accordingly. The -cpu switch specifies the CPU architecture, -hda attaches the image as the boot device and -m specifies the memory. I’m allocating 256 MB to my virtual machine. (Note: Updated the command line to use 256 MB memory based on the comment from Peter below)

qemu-system-arm.exe -M versatilepb -cpu arm1176 -hda 2012-08-08-wheezy-armel.img -kernel kernel-qemu -m 256 -append "root=/dev/sda2"

Step 3: Booting the Pi And Installing Mono

Once you boot the Pi, enter the username and password based on the image/distribution you are using (user/password is normally pi/raspberry). Now, to install Mono, we’ll use the apt-get package manager. Let us update and upgrade all the packages before we proceed with the install. Run apt-get with admin permissions, using sudo in the console/terminal.

Each of the below commands will take a bit of time – so make sure you've got a stress ball around. And may the force be with you for a successful installation #lol.
sudo apt-get update --fix-missing

sudo apt-get upgrade --fix-missing

sudo apt-get install mono-complete --fix-missing

Step 4: Writing C#

Assuming you’ve got the installation right – time to write a little bit of C#. Mono has a C# command line (REPL); you can bring it up by typing csharp. So, just play around. If you are setting this up to teach C# to kids in schools (the original Raspberry Pi vision), the force is stronger in you than you think (sorry, HBO is now re-telecasting all the old Star Wars stuff). Whatever. Here is my csharp console. Note that if your DateTime.Now is behaving madly, you never read my above paragraph and you’ve got the wrong OS image.



Step 5: Write And Compile Your Code – A Windows Forms app

Quit the csharp console (quit) and start the window manager by typing ‘startx’ – never mind if you are already there. This should bring up the Raspberry Pi desktop. Now, let us create a C# file using Leafpad, a lightweight editor in the Raspberry Pi. Either in Run or in a terminal window, type

sudo leafpad /opt/windowapp.cs
And type in some code. Let us create a quick Windows Forms application to ensure this is possible with Mono on the Pi.
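The original screenshot with the code is not available here, so below is a minimal sketch of what windowapp.cs could contain (the class name and window title are assumptions):

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

class WindowApp
{
    static void Main()
    {
        // A tiny 100x100 window, just to prove WinForms works under Mono
        var form = new Form();
        form.Text = "Hello from the Pi";
        form.Size = new Size(100, 100);
        Application.Run(form);
    }
}
```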



Now, let us compile the code using the mono compiler (gmcs). Ensure that you are providing the reference libraries we are importing using the /r switch. This should produce the windowapp.exe executable.
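As an illustration, the compile and run steps could look like this (the file name follows the earlier Leafpad step):

```shell
cd /opt
# Compile with Mono's C# compiler, referencing the WinForms assemblies
sudo gmcs windowapp.cs /r:System.Windows.Forms.dll /r:System.Drawing.dll
# Run the produced executable under Mono
mono windowapp.exe
```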



Cool. Now time to run it via Mono.  This should display a tiny window with width and height 100 – You can drag the corners to see the little Window we created. From here onwards, it is your normal .NET style. Add controls, build apps and have a blast.


Step 6: Installing The XSP Web Server For Some ASP.NET Love

Let us now install the Mono XSP server so that we can run some ASP.NET code on the Pi, to use it as a tiny web server. Let us get the XSP bits using apt-get. Fire up your terminal, and type

sudo apt-get install mono-xsp2 mono-xsp2-base

Which should install XSP.


Cool, you got XSP installed. Now, let us try some ASP.NET stuff on the Pi. Create a new aspx file in the /opt folder using Leafpad, as we did earlier. Here is my aspx file – a simple one I copied from Startdeveloper, as I was lazy.
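Since the screenshot isn’t available here, below is a minimal page along the same lines (the label text and control names are arbitrary, not the original file):

```aspx
<%@ Page Language="C#" %>
<script runat="server">
    protected void Page_Load(object sender, EventArgs e)
    {
        Greeting.Text = "Hello from ASP.NET on the Pi - " + DateTime.Now;
    }
</script>
<html>
<body>
  <form id="form1" runat="server">
    <asp:Label ID="Greeting" runat="server" />
  </form>
</body>
</html>
```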


Save the file to your /opt folder, and start XSP2 in the /opt folder. Run XSP2, and open Internet->Browser and navigate to http://localhost:8080 – You see, it works.




In this post, we explored how to set up Mono to build and run .NET WinForms and web applications on the Raspberry Pi. Build a computer to learn .NET and C# for under $40, huh. Follow me on twitter @ – Happy Coding.

BIG DATA for .NET Devs: HDInsight, Writing Hadoop Map Reduce Jobs In C# And Querying Results Back Using LINQ

Azure HD Insight Services is a 100% Apache Hadoop implementation on top of Microsoft Azure cloud ecosystem. 

In this post, we’ll explore

  1. HDInsight/Hadoop on Azure in general and steps for starting with the same
  2. Writing Map Reduce Jobs for Hadoop using C# in particular to store results in HDFS.
  3. Transferring the result data from HDFS to Hive
  4. Reading the data back from the hive using C# and LINQ


If you are new to Hadoop and Big Data concepts, I suggest you quickly check out

There are a couple of ways you can start with HDInsight.

Step 1: Setting up your instance locally in your Windows

For development, I highly recommend installing the HDInsight developer version locally – you can find it straight inside the Web Platform Installer.

Once you install the HDInsight locally, ensure you are running all the Hadoop services.


Also, you may use the following links once your cluster is up and running.

Here is the HDInsight dashboard running locally.


And now you are set.

Step 2: Install the Map Reduce package via Nuget

Let us explore how to write a few Map Reduce jobs in C#. We’ll write a quick job to count namespaces from C# source files. Earlier, in a couple of posts related to Hadoop on Azure – Analyzing some ‘Big Data’ using C# and Extracting Top 500 MSDN Links from Stack Overflow – I showed how to use C# Map Reduce jobs with Hadoop Streaming to do some meaningful analytics. In this post, we’ll re-write the mapper and reducer leveraging the new .NET SDK available, and will apply the same on a few code files (you can apply that on any dataset).

The new .NET SDK for Hadoop makes it very easy to work with Hadoop from .NET – with more types for supporting Map Reduce jobs, for creating LINQ to Hive queries, etc. Also, the SDK provides an easier way to create and submit your own Map Reduce jobs directly in C#, either to the local developer instance or to an Azure Hadoop cluster.

To start with, create a console project and install the Microsoft.Hadoop.Mapreduce package via Nuget.

Install-Package Microsoft.Hadoop.Mapreduce

This will add the required dependencies.

Step 3: Writing your Mapper and Reducer

The mapper will read its input from the HDFS file system, and the reducer will emit its outputs to HDFS. HDFS is Hadoop’s distributed file system, which guarantees high availability. Check out the Apache HDFS architecture guide for details.

With the Hadoop SDK, you can now inherit your Mapper from the MapperBase class, and your Reducer from the ReducerCombinerBase class. This is equivalent to the independent Mapper and Reducer exes I demonstrated earlier using Hadoop streaming, just that we’ve got a better way of doing the same. In the Map method, we are just extracting the namespace declarations using a regex and emitting the same (see the hadoop streaming details in my previous article).

    public class NamespaceMapper : MapperBase
    {
        //Override the map method.
        public override void Map(string inputLine, MapperContext context)
        {
            //Extract the namespace declarations in the C# files
            var reg = new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");
            var matches = reg.Matches(inputLine);

            foreach (Match match in matches)
            {
                //Just emit the namespaces.
                context.EmitKeyValue(match.Value, "1");
            }
        }
    }

    public class NamespaceReducer : ReducerCombinerBase
    {
        //Accepts each key and counts the occurrences
        public override void Reduce(string key, IEnumerable<string> values,
            ReducerCombinerContext context)
        {
            //Write back the namespace along with the number of occurrences
            context.EmitKeyValue(key, values.Count().ToString());
        }
    }
Next, let us write a Map Reduce Job and configure the same.

Step 4: Writing your Namespace Counter Job

You can simply specify your Mapper and Reducer types and inherit from HadoopJob to create a job class. Here we go.

    //Our Namespace counter job
    public class NamespaceCounterJob : HadoopJob<NamespaceMapper, NamespaceReducer>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration();
            config.InputPath = "input/CodeFiles";
            config.OutputFolder = "output/CodeFiles";
            return config;
        }
    }

Note that we are overriding the Configure method to specify the configuration parameters. In this case, we are specifying the input and output folders for our mapper/reducer – The lines in the files in input folder will be passed to our mapper instances, and the combined output from the reducer instances will be placed in the output folder.

Step 5: Submitting the job

Finally, we need to connect to the cluster and submit the job, using the ExecuteJob method. Here we go with the main driver.

class Program
{
    static void Main(string[] args)
    {
        var hadoop = Hadoop.Connect();
        var result = hadoop.MapReduceJob.ExecuteJob<NamespaceCounterJob>();
    }
}

We are invoking the ExecuteJob method using the NamespaceCounterJob type we just created. In this case, we are submitting the job locally; if you want to submit the job to an Azure HDInsight cluster for an actual execution scenario, you should pass in the Azure connection parameters. Details here
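For reference, a hedged sketch of what the Azure submission might look like – the cluster name, credentials, and storage values below are all placeholders, and you should verify the exact Connect overload against the version of the SDK you have installed:

```csharp
// Sketch only: connect to an Azure HDInsight cluster instead of the
// local one. All values here are placeholders; check the Connect
// overloads in your installed Microsoft.Hadoop.MapReduce package.
var hadoop = Hadoop.Connect(
    new Uri("https://yourcluster.azurehdinsight.net"), // cluster URI
    "admin",                                // cluster user name
    "hadoopuser",                           // hadoop user
    "password",                             // cluster password
    "yourstorage.blob.core.windows.net",    // Azure storage account
    "yourstoragekey",                       // storage account key
    "yourcontainer",                        // blob container for job data
    false);                                 // create container if missing

var result = hadoop.MapReduceJob.ExecuteJob<NamespaceCounterJob>();
```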

Step 6: Executing the job

Before executing the job, you should prepare your input – in this case, copy the source code files to the input folder we provided as part of the configuration while creating our job (see NamespaceCounterJob). To do this, fire up the Hadoop command-line console from the desktop. If your cluster is on Azure, you can remote login to the cluster head node by choosing Remote Login from the HDInsight dashboard.

  • Create a folder using the hadoop fs -mkdir input/CodeFiles command
  • Copy a few C# files to your folder using hadoop fs -copyFromLocal your\path\*.cs input/CodeFiles

Here, I’m copying all my C# files under the BasicsRevisited folder to input/CodeFiles.


Now, build your project in Visual Studio, open the bin folder, and execute your exe file (the name of my executable is simply MapReduce.exe). This will internally kick off MRRunner.exe, and your map reduce job will get executed. You can see that the detected file dependencies are automatically submitted.


Once the Map Reduce job is completed, you’ll find the combined output placed in the output/CodeFiles folder. You can issue the -ls and -cat commands to list the files and view the content of the part-00000 file where the output is placed (yes, a little Linux knowledge will help at times). The part-00000 file contains the combined output of our task – see the namespaces along with their counts from the files I submitted.
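Spelled out in full (the paths match the job configuration above), the inspection commands look like this:

```shell
# List the reducer output files produced by the job
hadoop fs -ls output/CodeFiles

# Print the combined output; each line is a namespace declaration
# and its count, separated by a tab
hadoop fs -cat output/CodeFiles/part-00000
```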


Step 7: Loading data from HDFS to Hive

As a next step, let us load the data from HDFS into Hadoop Hive so that we can query it. We'll create a table using the CREATE TABLE Hive syntax and load the data into it. You can run the ‘hive’ command from the Hadoop command line to execute the following statements.

CREATE TABLE nstable (
  namespace STRING,
  count INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA INPATH 'output/CodeFiles/part-00000' INTO TABLE nstable;

And here is what you might see.


Now, you can read the data back from Hive.
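For example, a quick query from the hive prompt to see the most-used namespaces – the query itself is illustrative; the backticks keep `count` from being parsed as the built-in aggregate function:

```sql
-- Top 10 namespaces by usage count
SELECT namespace, `count`
FROM nstable
ORDER BY `count` DESC
LIMIT 10;
```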

And there you go. Now you know how to write your own Hadoop Map Reduce jobs in C#, load the data into Hive, and query it back in C# to visualize your data. Happy coding.

© 2012. All Rights Reserved.