Friday, September 23, 2011

Spatial State


There is actually a wealth of motion-capture-driven interface tech demos out on the net. One of the most famous examples comes from a TED talk by John Underkoffler (the designer behind the fictional Minority Report interface).

In Underkoffler’s interface, all control is performed by gestures and movements of the hands. As Underkoffler navigates through a series of pictures, his hand becomes an extension into the 3D spatial representation of all the photo files. The movement of his hand along the x, y, and z axes maps to movement along the same axes inside the virtual space.

Despite the multi-dimensionality of Underkoffler’s user interaction, there is one aspect that is strikingly static: the actual root position of the user. As Underkoffler swings his hands up, down, left, right, forward, and backward, his feet and the majority of his body remain stationary. If you look carefully at the TED video, Underkoffler has marked (with tape) the exact location where he must stand during all interaction. I believe that the disregard of this extra modality is a wasted opportunity in producing a natural user interface.

Thus I propose to use the spatial position of the user's root (body) as a way to represent state, with the ability to modify all ongoing interactions. This is quite a bit of information and might be hard to grasp conceptually, so I will give a short example of one way I plan to use the spatial location of the root:
The spatial interaction environment

Imagine that you are manipulating a 3D unit cube using the Kinect. As you move your hands up, down, left, and right, your view of the cube pans accordingly. However, what if you want to do very small and detailed pans? This becomes difficult due to the accuracy limits of the Kinect motion capture system. Thus I will implement a system where the closer you are to the Kinect (smaller z-axis value), the smaller the mapping factor between hand movement and camera panning (large movements in real life lead to smaller movements in the virtual world). The opposite holds when you are further away from the Kinect device: the mapping factor is greater, and thus small movements of the hands can lead to large movements in the virtual world. Your spatial position in the interaction environment becomes a modifier to the sensitivity of hand gestures.
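To make the idea concrete, here is a minimal sketch of how that distance-dependent mapping factor could work. The function names, distance range, and sensitivity constants are placeholders I chose for illustration; they are not values from the Kinect SDK.

# Minimal sketch of the distance-dependent sensitivity idea. All names and
# constants below (z_near, z_far, the scale range) are illustrative assumptions.

def pan_scale(root_z, z_near=1.0, z_far=3.5, scale_near=0.3, scale_far=2.0):
    """Map the user's distance from the sensor (meters) to a pan sensitivity
    factor: close to the sensor -> small factor (fine, detailed panning),
    far from the sensor -> large factor (coarse, sweeping panning)."""
    z = max(z_near, min(z_far, root_z))    # clamp to the expected play area
    t = (z - z_near) / (z_far - z_near)    # 0.0 at z_near, 1.0 at z_far
    return scale_near + t * (scale_far - scale_near)

def pan_camera(hand_dx, hand_dy, root_z):
    """Scale a raw hand displacement by the user's spatial state before
    applying it as a camera pan."""
    s = pan_scale(root_z)
    return hand_dx * s, hand_dy * s

# The same 10 cm hand movement pans far less when standing 1.2 m away
# than when standing 3.2 m away.
print(pan_camera(0.10, 0.05, 1.2))
print(pan_camera(0.10, 0.05, 3.2))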


This is also effective because it is a natural extension of the real world. As you work on more detailed elements of a drawing, you will move in closer; however, if you are doing overall "big picture" work, you will take a step back so that you keep the whole piece in perspective.

I think that using this concept of spatial state can lead to even richer interactions, and I plan to incorporate it into other interactions as well.

Wednesday, September 21, 2011

Hello all!

Now that I have finished a more complete design document, I have begun jumping into the Microsoft Kinect SDK and taking a look at the capabilities of the Kinect.

Linking the Kinect to a Windows 7 PC is actually a fairly simple process. Once the SDK has been installed, you simply plug the Kinect into the PC through the USB port and the fun begins. The first thing that I looked at was the type of data the Kinect SDK would give. The Kinect SDK is able to track up to two skeletal models, giving joint locations as vectors in 3D space. The values are normalized from -1 to 1 on the x, y, and z axes. The Kinect also returns a depth map that is stored in a byte array, but I have not decided yet if that data is necessary. The raw video streams are also fairly easy to extract using the SDK, and this will be greatly useful for debugging, as we can overlay the skeletal joints on top and see what the Kinect is recognizing at the moment.
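As a note on how I plan to consume that data, here is a rough sketch of the joint representation and of mapping a normalized joint onto the raw video frame for the debug overlay. This is not Kinect SDK code; the class, the frame size, and the mapping are simplified assumptions of mine, and the real SDK has its own coordinate-mapping helpers that account for the camera offsets.

# Illustrative sketch only -- not actual Kinect SDK calls. It just models the
# data shape described above: joints as 3D vectors normalized to [-1, 1].

from dataclasses import dataclass

@dataclass
class Joint:
    x: float  # normalized horizontal position, -1 (left) to 1 (right)
    y: float  # normalized vertical position, -1 (bottom) to 1 (top)
    z: float  # normalized depth (distance from the sensor)

def to_pixel(joint, width=640, height=480):
    """Map a normalized joint position onto the raw video frame so the
    skeleton can be drawn over the camera image for debugging. Ignores
    the color/depth camera offset for simplicity."""
    px = int((joint.x + 1.0) / 2.0 * width)
    py = int((1.0 - (joint.y + 1.0) / 2.0) * height)  # image origin is top-left
    return px, py

right_hand = Joint(x=0.25, y=-0.1, z=0.6)
print(to_pixel(right_hand))   # -> (400, 264)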

I have also designed the general pipeline of our interface. The code will be divided into a very traditional model-view-controller pattern. Our controller consists of the Kinect device and its accompanying SDK; here, raw motion and voice input are captured and filtered into usable data. The model consists of our recognition engine: the current state of the interface is stored here, and the engine is responsible for changing that state in response to input when necessary. Finally, I will use the Unity engine solely for rendering (the view), and it will connect to my recognition engine through the use of DLLs.
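Here is a rough sketch of how I picture that split fitting together. Every class name and the fake input event are placeholders of my own; the real controller would read frames from the Kinect SDK, and the real view would be Unity sitting behind a native DLL boundary.

# Rough sketch of the model-view-controller split. Everything here is a
# stand-in for the real pieces (Kinect SDK capture code, the recognition
# engine, and the Unity rendering bridge).

class KinectController:
    """Controller: wraps the Kinect device, capturing raw motion/voice
    input and filtering it into usable events."""
    def poll(self):
        # The real code would pull a skeleton frame from the SDK; here we
        # just hand back a fake "hand moved" event.
        return {"type": "hand_move", "dx": 0.10, "dy": 0.05}

class RecognitionEngine:
    """Model: stores the current interface state and decides how each
    filtered input event should change it."""
    def __init__(self):
        self.camera_pan = [0.0, 0.0]

    def handle(self, event):
        if event["type"] == "hand_move":
            # The spatial-state sensitivity factor from the previous post
            # would be applied here before updating the stored state.
            self.camera_pan[0] += event["dx"]
            self.camera_pan[1] += event["dy"]

class UnityBridge:
    """View: stands in for the Unity renderer, which only draws the
    current state handed to it by the recognition engine."""
    def render(self, engine):
        print("camera pan:", engine.camera_pan)

controller, engine, view = KinectController(), RecognitionEngine(), UnityBridge()
engine.handle(controller.poll())
view.render(engine)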



I will be coding a few simple Kinect demos in the next few days and I am beginning to design the interaction and user experience. Look for all this in another blog post shortly.

Wednesday, September 14, 2011

My Abstract

First Post!
Here is my abstract:

Human computer interaction has traditionally been limited to the mouse and keyboard; however, with the advent of touch screens and motion capture hardware, there has been a rise in the concept of the “natural user interface” or NUI. The natural user interface is heavily driven by gesture rather than precise movements and clicks of the mouse. This project will explore interactions with visualizations driven by multimodal input (in the form of motion and voice).