Kibloc – Real-time, distance-based object tracking and counting using Kinect


ANOOP MADHUSUDANAN


This weekend hack is a small Kinect application - Kibloc, a physical object counter/tracker built with the Kinect sensor.

Update as of Feb 1, 2012: This article covers the Version 1.0 Beta 2 APIs. However, the official Kinect for Windows SDK 1.0 is now available, and there are changes from the Beta 2 API.

The Kinect for Windows SDK (Download) is pretty intuitive. I'm using Version 1.0 Beta 2 for this, so if you want to try the code, use the Beta 2 APIs (still available as a separate download) and NOT the Version 1.0 APIs. You can use the SDK to develop pretty cool applications with Microsoft Kinect. In this post, we'll focus on implementing a quick real-time blob counter using Kinect depth data, for counting and tracking objects in front of the sensor. This is a basic demo, but as you can imagine, it has a couple of pretty hot real-life use cases. As a heads-up, the source code is at http://kibloc.codeplex.com/; keep it handy as you read along, and make sure you restore the NuGet packages listed in packages.config.

Here is a video demonstrating real-time, distance-based blob tracking.

In the video, drawing the blob highlights is a bit slow because I'm drawing them on top of the color image, as discussed below. For actual games/calculations, you don't need to draw on top of the color image/bitmap.

So, let us develop this. The steps involved are pretty simple.

  • Initializing Kinect
  • Get the depth image, and slice the depth frame so that only pixels within the selected range are drawn, forming a grayscale image
  • Pass this image to the blob detection algorithm
  • Find the convex hull/edges/quadrilateral 
  • Draw the same on the color image from the video frame

Introduction to Kinect API

The Kinect API lets you work with the Kinect camera's image/video stream, much like a webcam. Apart from the RGB video camera, Kinect also has a depth sensor, which gives you information about the distance of objects in front of the device, and you can combine the video data and depth data to detect objects. Kinect also exposes built-in support for skeletal tracking. The classes for working with video, depth and skeletal tracking reside in the namespace Microsoft.Research.Kinect.Nui. You can access the Kinect devices connected to the machine via the Runtime.Kinects collection, and you can access the depth, video and skeletal tracking data by handling the DepthFrameReady, VideoFrameReady and SkeletonFrameReady events of a connected Kinect Runtime object. In the initialization code later in this post you can see that we are accessing the depth data from Kinect.
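Kibloc itself only uses the color and depth streams, but to round out the picture of the event model, here is a hedged sketch of what wiring up skeletal tracking could look like with the Beta 2 API (treat the member names as my recollection of the Beta 2 surface, not as tested Kibloc code):

// Hedged sketch only: skeletal tracking with the Beta 2 API, not used by Kibloc.
// Assumes a single Kinect is connected and Microsoft.Research.Kinect.Nui is referenced.
var nui = Runtime.Kinects[0];
nui.Initialize(RuntimeOptions.UseSkeletalTracking);

nui.SkeletonFrameReady += (sender, args) =>
{
    foreach (SkeletonData skeleton in args.SkeletonFrame.Skeletons)
    {
        if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
        {
            // For example, read the head joint position (camera space, in metres)
            var head = skeleton.Joints[JointID.Head].Position;
            Console.WriteLine("Head at X={0:F2} Y={1:F2} Z={2:F2}", head.X, head.Y, head.Z);
        }
    }
};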

Additionally, you can use the audio features of Kinect via the classes available under the namespace Microsoft.Research.Kinect.Audio.
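Kibloc doesn't touch audio at all, but just as a pointer, a minimal capture with the Beta 2 audio API looks roughly like the sketch below (the buffer size and the lack of any further processing are purely illustrative assumptions):

// Hedged sketch only: reading raw audio (16 kHz, 16-bit mono PCM) from the Kinect
// microphone array with the Beta 2 API. Requires Microsoft.Research.Kinect.Audio and System.IO.
using (var audioSource = new KinectAudioSource())
using (Stream audioStream = audioSource.Start())
{
    var buffer = new byte[4096];                                 // illustrative buffer size
    int bytesRead = audioStream.Read(buffer, 0, buffer.Length);  // do something useful with the samples here
}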

That is it for the introduction. If you want to learn more about working with the SDK in detail, I suggest you go through the SDK Quick Starts and samples here at Coding 4 Fun. The API is intuitive and super easy, but those videos will help you kick-start.

 

Initializing Kinect

The first step is to initialize the Kinect; here is the (mostly self-explanatory) code.

           
// If you don't make sure the Kinect is plugged in and working before trying to use it, the app will crash
if (Runtime.Kinects.Count > 0)
{
    runtime = Runtime.Kinects[0];
    runtime.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepth);
    runtime.DepthFrameReady += (s, e) =>
        {
           // Use depth data
        };

    runtime.VideoFrameReady += (s, e) =>
        {
            //Use video data
        };

    // Open the depth stream at 320x240 and the color video stream at 640x480
    runtime.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.Depth);
    runtime.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);
}
else
    MessageBox.Show("Oops, please check if your Kinect is connected?");

Basically, we initialize the Kinect runtime by passing the RuntimeOptions.UseColor and RuntimeOptions.UseDepth flags so that we can consume the DepthStream and VideoStream later, and we attach handlers to the DepthFrameReady and VideoFrameReady events as shown above.

Get the depth image for a range

Kinect depth data is just an array of bytes, where each pixel is represented by two bytes (16 bits). Each pixel in the depth data gives the distance of that point from the sensor, instead of color information as in a normal bitmap pixel. When you open the DepthStream (see the code above), you can ask Kinect for either the raw depth data (ImageType.Depth) or depth data along with a player index (ImageType.DepthAndPlayerIndex). You can read this iProgrammer article if you are interested in learning more about the depth data in detail.
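The CalculateDistanceFromDepth helper used in the slicing method below isn't reproduced in this excerpt; the sketch below shows the usual way the two bytes are unpacked for the Beta 2 depth formats (a hedged sketch of the idea, not necessarily Kibloc's literal code):

// Hedged sketch: unpack two depth bytes into a distance in millimetres.
// For ImageType.Depth the two bytes form a little-endian 16-bit distance value.
private static int CalculateDistanceFromDepth(byte firstByte, byte secondByte)
{
    return firstByte | (secondByte << 8);

    // With ImageType.DepthAndPlayerIndex the low 3 bits of the first byte hold the
    // player index instead, so the distance would be (firstByte >> 3) | (secondByte << 5).
}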

Basically, what we do in the method below is walk through the PlanarImage (the raw byte array) of depth data and create a bitmap with 4 bytes per pixel (blue, green, red and alpha), where the blue, green and red channels are given the same value to form a grayscale image. That value is a number between 0 and 255, based on the distance of the pixel from the sensor.
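CalculateIntensityFromDistance isn't shown in the excerpt either; one plausible mapping is to normalise the distance into an assumed working range and invert it so nearer objects appear brighter. The limits below are illustrative assumptions, not Kibloc's exact numbers:

// Hedged sketch: map a distance (mm) to a 0-255 grey value, nearer = brighter.
private static byte CalculateIntensityFromDistance(int distance)
{
    const int MinDistance = 800;    // assumed near limit of the sensor, in mm
    const int MaxDistance = 4000;   // assumed far limit of the sensor, in mm

    int clamped = Math.Max(MinDistance, Math.Min(MaxDistance, distance));
    double normalized = (clamped - MinDistance) / (double)(MaxDistance - MinDistance);

    return (byte)(255 - (int)(normalized * 255));
}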


    /// <summary>
    /// Credits due:
    /// Portions of the code below are from
    /// (1) http://www.codeproject.com/Articles/317974/KinectDepthSmoothing though I'm not really smoothing here :)
    /// (2) http://stackoverflow.com/questions/94456/load-a-wpf-bitmapimage-from-a-system-drawing-bitmap
    ///
    /// -- Anoop
    /// </summary>
    public static BitmapSource SliceDepthImage(this ImageFrame image, int min = 20, int max = 1000)
    {
        int width = image.Image.Width;
        int height = image.Image.Height;

        var depthFrame = image.Image.Bits;

        // We multiply width * height by 4 because each pixel in the final image
        // takes four bytes, one per color channel (blue, green, red, alpha).
        var colorFrame = new byte[height * width * 4];

        // Process each row in parallel
        Parallel.For(0, height, depthRowIndex =>
        {
            // Within each row, we step two bytes at a time, since every pair of
            // bytes combines into a single depth value
            for (int depthColumnIndex = 0; depthColumnIndex < width * 2; depthColumnIndex += 2)
            {
                var depthIndex = depthColumnIndex + (depthRowIndex * width * 2);

                // Because the colorFrame we are creating has twice as many bytes per
                // pixel as the depth frame, the color index is twice the depth index.
                var index = depthIndex * 2;

                // Calculate the distance represented by the two depth bytes
                var distance = CalculateDistanceFromDepth(depthFrame[depthIndex], depthFrame[depthIndex + 1]);

                // Map the distance to an intensity that can be represented in RGB
                var intensity = CalculateIntensityFromDistance(distance);

                if (distance > min && distance < max)
                {
                    // Apply the intensity to the color channels
                    colorFrame[index + 0] = intensity; //blue
                    colorFrame[index + 1] = intensity; //green
                    colorFrame[index + 2] = intensity; //red
                    colorFrame[index + 3] = 255;       //alpha
                }
            }
        });

        // Wrap the BGRA byte array in a WPF BitmapSource (Bgra32, stride = 4 bytes per pixel)
        return BitmapSource.Create(width, height, 96, 96, PixelFormats.Bgra32, null, colorFrame, width * 4);
    }
Once the depth image is created, we pass it to the blob detector to detect and highlight the blobs.

Blob Detection and Highlighting

For blob detection, I'm using the excellent AForge.NET library. The BlobCounter class takes the above image and calculates the edges/rectangles of the blobs in it. Below is an expansion of our earlier DepthFrameReady and VideoFrameReady handlers: we create a sliced depth image based on the slider values, build a depth frame bitmap and a color frame bitmap, and pass both of them to the blob detector. We also tell the detector what type of highlighting to use, based on the combo box selection. (If you haven't used AForge before, the short standalone sketch just below shows BlobCounter in isolation; the expanded handlers follow it.)
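Here is that tiny standalone BlobCounter example. The input file name and the size filters are hypothetical; this is just the library in isolation, not Kibloc's setup:

// Standalone AForge.NET BlobCounter sketch; "sliced-depth.png" is a hypothetical input file.
using System;
using System.Drawing;
using AForge.Imaging;

static class BlobCounterDemo
{
    static void Main()
    {
        var counter = new BlobCounter
        {
            FilterBlobs = true, // ignore tiny specks
            MinWidth = 20,
            MinHeight = 20
        };

        using (var grayscaleDepth = new Bitmap("sliced-depth.png"))
        {
            counter.ProcessImage(grayscaleDepth);
            Blob[] blobs = counter.GetObjectsInformation();
            Console.WriteLine("{0} blobs detected", blobs.Length);
        }
    }
}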

 runtime.DepthFrameReady += (s, e) =>
        {
            if (colorFrame == null) return;

            if (sliderMin.Value > sliderMax.Value)
                sliderMin.Value = sliderMax.Value;

            detector.Highlighting = (HighlightType) cmbHighlight.SelectedIndex;
                        
            txtInfo.Text = detector.BlobCount + " items detected..";
            txtDistance.Text = "Detecting objects in the range " + sliderMin.Value + " and " + sliderMax.Value + " mm";

            // Depth frame bitmap, sliced to the selected distance range
            var depthFrame = e.ImageFrame.SliceDepthImage((int)sliderMin.Value, (int)sliderMax.Value);
            var depthBmp = depthFrame.ToBitmap();

            // Color frame bitmap
            var colorBmp = colorFrame.ToBitmapSource().ToBitmap();

            // Detect blobs using depthBmp, draw highlights onto colorBmp
            var outBmp = detector.ProcessImage(depthBmp, colorBmp);

            // Draw the highlighted output color image
            this.ImageColor.Source = outBmp.ToBitmapSource();

            depthBmp.Dispose();
            colorBmp.Dispose();
            outBmp.Dispose();
                        
        };

    runtime.VideoFrameReady += (s, e) =>
        {
            //colorFrame is a global variable 
            colorFrame = e.ImageFrame;
        };

The following ProcessImage method in our BlobDetector class uses the AForge library to do the detection based on the depth image.

[Image: detected blobs highlighted on the color frame]

After detection, we draw the highlights back onto the color image, which results in the image above (the DrawHighLights call near the end of the method; a rough sketch of such a routine follows the listing). This code is mostly based on the AForge blob detection sample.

        // Detect blobs in the depth image and draw highlights onto the color image
        public Bitmap ProcessImage(Bitmap depthImage, Bitmap colorImage)
        {
            leftEdges.Clear();
            rightEdges.Clear();
            topEdges.Clear();
            bottomEdges.Clear();
            hulls.Clear();
            quadrilaterals.Clear();

            this.image = AForge.Imaging.Image.Clone(depthImage, PixelFormat.Format24bppRgb);
            imageWidth = this.image.Width;
            imageHeight = this.image.Height;

            blobCounter.ProcessImage(this.image);
            blobs = blobCounter.GetObjectsInformation();

            // Resize the color image to the size of the depth image
            ResizeNearestNeighbor filter = new ResizeNearestNeighbor(depthImage.Width, depthImage.Height);
            var outImage = filter.Apply(colorImage);

            // Flip the color image horizontally to match the depth image
            outImage.RotateFlip(RotateFlipType.RotateNoneFlipX);

            BlobCount = blobs.Count();

            GrahamConvexHull grahamScan = new GrahamConvexHull();

            foreach (Blob blob in blobs)
            {
                List<IntPoint> leftEdge = new List<IntPoint>();
                List<IntPoint> rightEdge = new List<IntPoint>();
                List<IntPoint> topEdge = new List<IntPoint>();
                List<IntPoint> bottomEdge = new List<IntPoint>();

                // collect edge points
                blobCounter.GetBlobsLeftAndRightEdges(blob, out leftEdge, out rightEdge);
                blobCounter.GetBlobsTopAndBottomEdges(blob, out topEdge, out bottomEdge);

                leftEdges.Add(blob.ID, leftEdge);
                rightEdges.Add(blob.ID, rightEdge);
                topEdges.Add(blob.ID, topEdge);
                bottomEdges.Add(blob.ID, bottomEdge);

                // find convex hull
                List<IntPoint> edgePoints = new List<IntPoint>();
                edgePoints.AddRange(leftEdge);
                edgePoints.AddRange(rightEdge);

                List<IntPoint> hull = grahamScan.FindHull(edgePoints);
                hulls.Add(blob.ID, hull);

                List<IntPoint> quadrilateral = null;

                // find quadrilateral
                if (hull.Count < 4)
                {
                    quadrilateral = new List<IntPoint>(hull);
                }
                else
                {
                    quadrilateral = PointsCloud.FindQuadrilateralCorners(hull);
                }
                quadrilaterals.Add(blob.ID, quadrilateral);

                // shift all points for visualization
                IntPoint shift = new IntPoint(1, 1);

                PointsCloud.Shift(leftEdge, shift);
                PointsCloud.Shift(rightEdge, shift);
                PointsCloud.Shift(topEdge, shift);
                PointsCloud.Shift(bottomEdge, shift);
                PointsCloud.Shift(hull, shift);
                PointsCloud.Shift(quadrilateral, shift);
            }

            // Draw the highlights onto the output image (see DrawHighLights in the full source)
            DrawHighLights(outImage);

            return outImage;
        }
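The DrawHighLights method itself lives in the full Kibloc source. As a rough idea of what such a routine does, here is a hedged sketch that simply outlines each convex hull with GDI+; the real method also honours the Highlighting / HighlightType setting, which is omitted here:

// Hedged sketch, not the actual Kibloc implementation: outline each blob's convex
// hull on the output color image. Assumes 'hulls' is the Dictionary<int, List<IntPoint>>
// populated in ProcessImage above.
private void DrawHighLights(Bitmap outImage)
{
    using (Graphics g = Graphics.FromImage(outImage))
    using (Pen pen = new Pen(Color.Red, 2))
    {
        foreach (List<IntPoint> hull in hulls.Values)
        {
            if (hull.Count < 3)
                continue; // need at least three points to draw a polygon

            // Convert AForge points to System.Drawing points and draw the closed hull
            var points = new System.Drawing.Point[hull.Count];
            for (int i = 0; i < hull.Count; i++)
                points[i] = new System.Drawing.Point(hull[i].X, hull[i].Y);

            g.DrawPolygon(pen, points);
        }
    }
}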

Conclusion

In this post, we explored how to create a blob detector and object counter using Kinect. As of now, Kinect's built-in tracking is limited to skeletal tracking, but you can use libraries like AForge.NET and OpenCV to take Kinect beyond the out-of-the-box scenarios. Thanks to my kiddo for lending me her dolls for the demo. Also, follow me on Twitter.

© 2012. All Rights Reserved. Amazedsaint.com