Sunday, October 2, 2011

Setting up for Kinect Development
This past week, I have been focusing on becoming familiar with development using the Kinect SDK. I decided to develop the recognition engine in C# rather than C++ for two main reasons:

1.) C# makes it easier to work with threads, which will come in useful when integrating both audio and motion input.

2.) C# has some basic UI elements built in, which lets me test and prototype faster: I can more easily code simple demos to exercise the recognition engine before I render the final interface in Unity.

After looking through some demo code and documentation, I found that the Kinect SDK allows for two methods of motion tracking. The first is a polling-based approach, where the Kinect sensors return information at fixed intervals. The other is an event-based approach, where the Kinect sensors return data only when there is actually an event (motion, a change in depth, etc.). The event-based approach seems more efficient for my purposes, so I decided to build on top of it.

The first step in linking up any code to the Kinect is to turn on the sensors that you need.
Here we activate the depth sensor, skeletal tracking, and the raw color camera (the last for testing purposes, so we can see how our real-world actions correlate to on-screen actions).
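A minimal sketch of what that initialization looks like, assuming the beta SDK's Runtime API (the stream resolutions and option flags here are taken from the SDK samples rather than my exact project code):

using Microsoft.Research.Kinect.Nui;

// Turn on depth (with player index), skeletal tracking, and the raw color camera.
Runtime nui = new Runtime();
nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex |
               RuntimeOptions.UseSkeletalTracking |
               RuntimeOptions.UseColor);

// Open the color and depth streams so we can visually compare
// real-world motion against what the sensor is reporting.
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480, ImageType.Color);
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);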


Next, we assign event handlers to the sensor events.
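The wiring itself is just a few lines in the same initialization code; the depth and color handler names below are placeholders (only nui_SkeletonFrameReady is referenced later in this post):

// Hook up handlers so the SDK calls back whenever a new frame is available.
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.DepthFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_DepthFrameReady);
nui.VideoFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_ColorFrameReady);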

Finally, we create an entirely separate thread for our audio input and speech recognition.
Most of my code right now lives in nui_SkeletonFrameReady. This handler is called every time the Kinect senses a change in any of the skeletal joint positions, and my interface reacts accordingly.
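Stripped of the gesture logic, the skeleton of that handler looks roughly like this (the right-hand joint is just an example; the real handler feeds every joint of interest into the recognition code):

void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    foreach (SkeletonData skeleton in e.SkeletonFrame.Skeletons)
    {
        // Only react to skeletons that are actively being tracked.
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            continue;

        // Joint positions arrive as vectors in skeleton space.
        Vector rightHand = skeleton.Joints[JointID.HandRight].Position;
        // ... feed joint data into the gesture recognition / mode logic ...
    }
}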

Joint Data 
The Kinect skeleton only returns the positions of the joints as a Vector3, so I created a wrapper class over these joint positions that also calculates velocity and acceleration. The velocity and acceleration are updated every time the SkeletonFrame event handler is called. I am looking at how to better smooth and interpolate this data so that the gesture motion data does not end up "spikey".
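Conceptually the wrapper is just a pair of finite differences computed each frame; a simplified sketch (the type and member names here are illustrative, not my exact code):

// Lightweight value type standing in for a joint position.
struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public static Vec3 operator -(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);
    }

    public static Vec3 operator *(Vec3 v, float s)
    {
        return new Vec3(v.X * s, v.Y * s, v.Z * s);
    }
}

// Wraps a single joint and derives velocity and acceleration from
// successive positions reported by the SkeletonFrameReady handler.
class TrackedJoint
{
    public Vec3 Position;
    public Vec3 Velocity;     // first finite difference
    public Vec3 Acceleration; // second finite difference

    public void Update(Vec3 newPosition, float deltaSeconds)
    {
        Vec3 newVelocity = (newPosition - Position) * (1f / deltaSeconds);
        Acceleration = (newVelocity - Velocity) * (1f / deltaSeconds);
        Velocity = newVelocity;
        Position = newPosition;
    }
}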

Voice Data
Through the Kinect, we can leverage Microsoft's Speech API, which has voice recognition features. We open the Kinect's microphone input as an audio source stream, and this stream is passed to Microsoft's speech recognizer.
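Roughly, the setup running on the audio thread looks like the following; this assumes the beta SDK's KinectAudioSource and the Microsoft.Speech recognizer, with the audio-source settings and stream format taken from the SDK speech sample rather than my exact code:

using System.IO;
using Microsoft.Research.Kinect.Audio;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

// Runs on its own thread so audio capture does not block skeleton processing.
void StartSpeechRecognition()
{
    var source = new KinectAudioSource();
    source.FeatureMode = true;
    source.AutomaticGainControl = false;
    source.SystemMode = SystemMode.OptibeamArrayOnly;

    var sre = new SpeechRecognitionEngine();
    sre.LoadGrammar(BuildGrammar());              // grammar construction shown below
    sre.SpeechRecognized += sre_SpeechRecognized; // the only event I act on for now

    Stream audioStream = source.Start();
    sre.SetInputToAudioStream(audioStream,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    sre.RecognizeAsync(RecognizeMode.Multiple);
}

This is the method that gets kicked off on the new thread mentioned above.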

Before anything else, it is necessary to generate a grammar, a list of words or phrases that the speech recognizer should be looking for.
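A grammar can be assembled from a plain list of command words with Choices and GrammarBuilder; something along these lines (the command words here are placeholders, not my final vocabulary):

using Microsoft.Speech.Recognition;

// Builds a small command grammar for the recognizer to listen for.
Grammar BuildGrammar()
{
    var commands = new Choices();
    commands.Add("zoom");  // placeholder command words
    commands.Add("pan");
    commands.Add("reset");

    var builder = new GrammarBuilder();
    builder.Culture = new System.Globalization.CultureInfo("en-US");
    builder.Append(commands);

    return new Grammar(builder);
}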

Much like the skeleton tracking, the speech recognition system uses an event-handler model. The three events it raises are SpeechRecognized, SpeechHypothesized, and SpeechRecognitionRejected. I still need to do more reading on the exact details of these three events, but I am only responding to SpeechRecognized events for now.
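For now, the only handler with real logic in it is the one for recognized speech; the skeleton below shows its shape (the confidence threshold is an assumption of mine, not a value from the SDK):

void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Ignore low-confidence results to cut down on false triggers.
    if (e.Result.Confidence < 0.8)
        return;

    string command = e.Result.Text;
    // ... switch the interface mode based on the recognized command ...
}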

Testing Environment
I made a simple testing interface to use as I continue to implement gesture recognition features and modes.
All my work right now involves basic manipulation (zoom, pan) of a circle in two dimensions.
I also output some key information, such as the current mode my interface is in and the sensitivity modifier (discussed in the previous post).

I will go more in-depth about this test environment in my next post and explain some of the new gesture recognition models that I have come up with. Videos too! Look for it tomorrow once I get Camtasia (need Joe's admin password) installed on my dev computer.

Test interface:
