Friday, September 23, 2011

Spatial State


There is actually a wealth of motion-capture-driven interface tech demos out on the net. One of the most famous examples comes from a TED talk by John Underkoffler (the designer behind the fictional Minority Report interface).

In Underkoffler’s interface, all control is performed by gestures and movements of the hands. As Underkoffler navigates through a series of pictures, his hand becomes an extension into the 3D spatial representation of all the photo files. The movement of his hand along the x, y, and z axes maps to movement along the same axes inside the virtual space.

Despite the multi-dimensionality of Underkoffler’s user interaction, there is one aspect that is strikingly static: the actual root position of the user. As Underkoffler swings his hands up, down, left, right, forward, and backward, his feet and the majority of his body remain stationary. If you look carefully at the TED video, Underkoffler has marked (with tape) the exact location where he must stand during all interaction. I believe that disregarding this extra modality is a wasted opportunity in producing a natural user interface.

Thus I propose to use the spatial position of the user's root (body) as a way to represent state, with the ability to modify all ongoing interactions. This is quite a bit of information and might be hard to grasp conceptually, so I will give a short example of one way I plan to use the spatial location of the root:
The spatial interaction environment

Imagine that you are manipulating a 3D unit cube using the Kinect. As you move your hands up, down, left, and right, your view of the cube pans accordingly. However, what if you want to make very small, detailed pans? This becomes difficult due to the limited accuracy of the Kinect motion capture system. Thus I will implement a system where the closer you are to the Kinect (smaller z-axis value), the smaller the mapping factor between hand movement and camera panning (large movements in real life lead to smaller movements in the virtual world). The opposite holds when you are further from the Kinect device: the mapping factor is greater, and thus small movements of the hands can lead to large movements in the virtual world. Your spatial position in the interaction environment becomes a modifier to the sensitivity of hand gestures. A rough sketch of this mapping is given below.
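To make the idea concrete, here is a minimal sketch of the distance-based sensitivity mapping in Python. The function and constant names (get_root_z, camera.pan, the distance and gain ranges) are hypothetical placeholders for whatever the actual skeleton-tracking framework exposes, not part of any real Kinect API; the point is only to show how the user's z position could scale the hand-to-camera mapping factor.

```python
# Hypothetical sketch: scale camera panning by how far the user stands
# from the Kinect. All names and constants here are assumptions.

MIN_Z, MAX_Z = 0.8, 3.5          # assumed usable distance range from the sensor (meters)
MIN_GAIN, MAX_GAIN = 0.2, 2.0    # hand-movement-to-camera-pan multiplier range

def sensitivity(root_z):
    """Map the user's distance from the Kinect to a pan gain.

    Standing close (small z) yields a small gain for fine, detailed panning;
    standing far (large z) yields a large gain for broad, coarse panning.
    """
    z = max(MIN_Z, min(MAX_Z, root_z))       # clamp to the usable range
    t = (z - MIN_Z) / (MAX_Z - MIN_Z)        # 0.0 (close) .. 1.0 (far)
    return MIN_GAIN + t * (MAX_GAIN - MIN_GAIN)

def on_frame(prev_hand, hand, root_z, camera):
    """Per-frame update: scale the hand delta by the current gain."""
    gain = sensitivity(root_z)
    dx = (hand[0] - prev_hand[0]) * gain
    dy = (hand[1] - prev_hand[1]) * gain
    camera.pan(dx, dy)                        # hypothetical camera API
```

The linear interpolation between two gain values is just one possible choice; an exponential curve, or per-user calibration of the distance range, would fit the same spatial-state idea.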


This is also effective because it is a natural extension of the real world. As you work on the more detailed elements of a drawing, you will move in closer; however, if you are doing overall "big picture" work, you will take a step back so that you maintain the entire perspective.

I think that using this concept of spatial state can lead to even richer interactions, and I intend to incorporate it into other interactions as well.

2 comments:

  1. In the cube example, you map z-axis translation to the "sensitivity" of user input. I like the analogy with how that maps to zooming in and out from a real-world perspective. However, do you feel that the "sensitivity" adjustment would be extremely user-specific?

    For example, A (with poor vision) might choose to stand closer by default compared to B with normal vision. Do you feel that this example would then require a "setup" phase where users would have to tune the settings to map to their own needs?

    Also: do you feel that this example exploits the flexibility of root movement to the fullest? The cube example is a case of object manipulation. What about another example of data visualization (e.g. the human anatomy) where the objective is to explore a high-dimensional visual data set?

  2. Even before reading this comment, I agreed with the last paragraph of MK's comment, though my data set was different. If the data set is very large or fine-grained, it might be difficult to tune exactly where you need to stand.

    Also, a bit of administration: feel free to post the simple Kinect tests on the blog as you start to play with your base framework.
