Monday, December 19, 2011

I GOT MY PROJECT TO WORK ON DOGS TOO!!!


WOO!!!!!

Just kidding. Anyways, I've been at work the last few days, adding new interaction features and tightening my existing gesture commands. There are still some kinks that I have to iron out as you can see from my video...but check it out (turn on sound too, I'm narrating!).




[EDIT]
This video just came up as a recommended video next to my YouTube upload.
Best Kinect hack ever?.....Are puppies cute?.......YES





Monday, December 12, 2011

GUI in a NUI

Today, I was working on refining the "GUI" interface of my project. That means working with menu navigation, animations, text and selection.

So far I have two menus:
(1) A mode menu that allows you to select an operation on the spot.
(2) A global menu that allows you to add objects, exit the app, or make any large-scale changes.
The mode menu is accessed through a swipe of the right hand; the global menu is activated by taking a step back.
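
To give a rough sense of how these triggers could be wired up, here is a minimal sketch in Unity C#. The GestureType values, MenuController class, and OnGestureCompleted callback are illustrative names I'm using for this post, not my actual code.

```csharp
using UnityEngine;

// Hypothetical gesture types produced by the motion-segmentation framework.
public enum GestureType { RightHandSwipe, StepBack, None }

public class MenuController : MonoBehaviour
{
    public GameObject modeMenu;    // per-object operations
    public GameObject globalMenu;  // add objects, exit the app, large-scale changes

    // Assumed to be called by the gesture framework when a gesture completes.
    public void OnGestureCompleted(GestureType gesture)
    {
        switch (gesture)
        {
            case GestureType.RightHandSwipe:
                modeMenu.SetActive(!modeMenu.activeSelf);   // toggle the mode menu
                break;
            case GestureType.StepBack:
                globalMenu.SetActive(true);                 // bring up the global menu
                break;
        }
    }
}
```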

Even though it may seem like eye candy, the right animations and fonts can make a big difference in usability. Part of the idea of the NUI is that you have a very "physical" interface: objects move on screen like they would in real life, since you are actually using your hands. The slide animations I've included add to this feel. Choosing a font for such an interface is also an interesting problem. I assume that most users will be standing at least 2 feet away (due to the limitations of the Kinect), so most text is kept at a large size. I've used a sans-serif font called Segoe, which gives the interface a cleaner look.
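
For what it's worth, the slide animations mostly boil down to easing a panel toward its resting position each frame. A simplified sketch (not the exact implementation) would look something like this:

```csharp
using UnityEngine;

// Slides a menu panel between an off-screen position and its resting position.
public class MenuSlide : MonoBehaviour
{
    public Vector3 hiddenPosition;   // off-screen start
    public Vector3 shownPosition;    // final resting spot
    public float slideSpeed = 8f;

    private Vector3 target;

    void Start()
    {
        transform.localPosition = hiddenPosition;
        target = hiddenPosition;
    }

    public void Show() { target = shownPosition; }
    public void Hide() { target = hiddenPosition; }

    void Update()
    {
        // Exponential-style ease toward the target gives a smooth, "physical" feel.
        transform.localPosition = Vector3.Lerp(transform.localPosition, target,
                                               slideSpeed * Time.deltaTime);
    }
}
```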


Here is a demo video (still working on improving the constants for optimal gesture recognition).



Also sorry for the jittery video capture. The animations are smooth...trust me :).





Sunday, December 11, 2011

Interaction Feedback and Selection


This last week, I have been playing around with interaction feedback and object selection for my Kinect NUI.
I started from my original concept for gesture feedback. After testing, I realized that having a fourth state (in motion) was excessive and generated too much visual noise. Thus, at the top of the screen, there are three possible colors that can show up:

Blue = initial gesture conditions met, gesture started
Green  = gesture recognized and completed
Red = gesture completed, but it does not match any interaction in the list
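
A minimal sketch of how the indicator might be driven; the FeedbackState enum and SetState method are illustrative names, and the indicator here is just a tinted quad rather than whatever the actual overlay ends up being.

```csharp
using UnityEngine;

public enum FeedbackState { Started, Recognized, Rejected }

// Tints an on-screen indicator according to the current gesture state.
public class GestureFeedbackIndicator : MonoBehaviour
{
    public Renderer indicator;  // renderer of a quad parked at the top of the screen

    public void SetState(FeedbackState state)
    {
        switch (state)
        {
            case FeedbackState.Started:     // initial conditions met, gesture started
                indicator.material.color = Color.blue;
                break;
            case FeedbackState.Recognized:  // gesture recognized and completed
                indicator.material.color = Color.green;
                break;
            case FeedbackState.Rejected:    // completed, but matches no known interaction
                indicator.material.color = Color.red;
                break;
        }
    }
}
```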

For the challenge of multiple objects and object selection, I decided to map the position of the hands on screen and draw a trailing line to represent the path of each hand. To select an object, place either hand near the object and wiggle it in a small circle. To deselect an object, wiggle your hand at a location away from the selected object. The selected object is signified by the blue glow around it.
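
One simple way to detect the "wiggle" (a sketch of the general idea, with made-up thresholds and names rather than my exact implementation): keep a short history of hand positions; if the hand stays within a small radius but travels a relatively long total path, treat it as a wiggle. Once a wiggle fires, the nearest object within reach of the hand gets selected, and the same check away from the selected object deselects it.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Heuristic wiggle detector: small bounding radius but long travelled path.
public class WiggleDetector
{
    private readonly Queue<Vector3> history = new Queue<Vector3>();
    private const int WindowSize = 30;          // ~1 second at 30 fps (placeholder)
    private const float MaxRadius = 0.08f;      // hand stays within 8 cm of its centroid
    private const float MinPathLength = 0.5f;   // but travels at least 50 cm in total

    // Feed one hand position per frame; returns true when a wiggle is detected.
    public bool AddSample(Vector3 handPosition)
    {
        history.Enqueue(handPosition);
        if (history.Count > WindowSize) history.Dequeue();
        if (history.Count < WindowSize) return false;

        // Centroid of the window.
        Vector3 centroid = Vector3.zero;
        foreach (Vector3 p in history) centroid += p;
        centroid /= history.Count;

        // Max distance from the centroid and total path length.
        float maxDist = 0f, pathLength = 0f;
        Vector3 prev = Vector3.zero;
        bool first = true;
        foreach (Vector3 p in history)
        {
            maxDist = Mathf.Max(maxDist, Vector3.Distance(p, centroid));
            if (!first) pathLength += Vector3.Distance(p, prev);
            prev = p;
            first = false;
        }

        return maxDist < MaxRadius && pathLength > MinPathLength;
    }
}
```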

I also show the menu system, which is activated by a horizontal swipe of the right hand.

Demo Video:


Sunday, December 4, 2011

Self-Evaluation

It has been an interesting journey looking at my proposal and where I am now. First, I am going to talk about the parts of the project I felt I did a good job on and the parts I should have done a better job on.

Here are some things I liked about what I did:

-I liked that I wrote out the definition of interaction (mode + context + gesture) early on. I think it shaped the way I would approach the entire problem (using motion segmentation and using 3D space as a modifier) in a way that was different from the existing crop of NUI demos.

-I liked that I quickly implemented a rough prototype using just C#. This allowed me to become familiar with the Kinect API and understand the abilities of the Kinect at a very early stage.

-Building a motion-segmentation engine early on: this became the basis of all my advanced motion recognition, and it helped a lot that I developed a structured way of reading my motions, so that once I had a workflow for reading one specific type of motion, I could take the skeleton of that code and port it to the other motion types.

-Iterating through different approaches to motion recognition. I first implemented a live one-to-one mapping. Then I parsed actions after they were completed. Finally, I used a combination of the two approaches. By going through many approaches I was able to see the strengths and weaknesses of each one, and it was a very interesting exercise.

Here are some things I could have done a better job on:

-Focus more on the interaction designs throughout the entire process. Sometimes I became too focused on the implementation and debugging of my code, and would then have to play catch-up designing the interactions. My interactions would be more fully fleshed out at this stage if I had kept a continual focus on them.

-Have a clearer strategy for the actual implementation. Honestly, I should have taken Mubbasir's advice and just used our existing Kinect-to-Unity plugin. Instead, I developed my own, ran into many problems, and spent more time on it than on the actual interface. I ultimately did get all the wrappers to sync up with Unity, but my solution was very buggy and led to memory leaks. In the end I used the existing plugin because I just didn't want to waste any more time debugging memory leaks. If I had paid closer attention to the feasibility of implementation at an early stage, this could have been avoided.

-Bring in users to try out the interface and get feedback throughout my work. This is honestly something that I should have done. I might have been afraid to show my work at such an early stage, but with feedback from users I could have had many new leads for my interaction designs. I realized this during the Beta Review, when I got a lot of great ideas in just a short 20-minute chat with Badler and Joe.

Conclusion
All in all, I felt that this has been a very stimulating project. It has the right mixture of human-computer interaction design and coding to suit my tastes. It has been fun to just take a stab at making a NUI. I now realize the shortcomings of a motion-based NUI, but also some of its strengths. From my work and observations, the NUI is not at all ready to supplant the traditional GUI, but it brings whole new functionality that has never existed before. I am really glad that I focused on making my motion framework, because I now have a strong base for building other applications on top of the Kinect, and I plan to play around with this in the future.

Beta Review Recap and Plans for next few weeks

Last Friday I showed my current working demo to Joe, Norm and Mubbasir. 
I showed my current interactions of menu/mode selection, zooming, panning and rotation of a 3D object in my Unity-based interface. From the feedback, I realized that I had a good basis of interaction and motion interpretation, but I really needed to tighten the interactions so that they were intuitive and also easy to perform in 3D space. I had started playing around with the use of space as a "value modifier" for my various interactions, and I really need to flesh out its use in my current interactions. I got some great possible ideas from Badler and Joe, and I plan to implement them in the following weeks.
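
To make the "value modifier" idea concrete with a hypothetical example (a sketch of the concept only, not a decided design): the same pan motion could move an object more strongly the further the hand is extended from the shoulder.

```csharp
using UnityEngine;

// Hypothetical example of space as a value modifier: hand extension
// (distance from the shoulder) scales how strongly a pan moves the object.
public static class SpaceModifier
{
    public static Vector3 ScaledPanDelta(Vector3 handDelta, Vector3 hand, Vector3 shoulder)
    {
        float extension = Vector3.Distance(hand, shoulder);      // meters
        float scale = Mathf.Clamp(extension / 0.5f, 0.5f, 3.0f); // placeholder mapping
        return handDelta * scale;
    }
}
```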

User feedback, again, was something I had a good basis for but still needed more iterations on. Currently, I have implemented passive feedback using a particle cloud that shows the user's hand velocities, along with basic feedback on whether the user is in the interaction space or not. My next step for user feedback will be to implement my original idea of displaying the states of actions and whether they ended up being accepted as gestures.

The good thing is that the base motion framework is completely done, so the rest of the changes will be focused on the actual design of the interaction gestures and the flow of interaction.

Here is my timeline for the next few weeks.

12/4 - 12/7: Use Badler's and Joe's input to design a more intuitive rotation gesture. Tighten up constants for the zoom and pan gestures.

12/7 - 12/9: Implement user feedback of current action states.

12/9 - 12/16: Develop a menu system that will allow users to add different objects to the scene and create new scenes. Allow for a selection system that lets users select different objects for editing.

12/17 onward: Work on presenting the work in the video.

Monday, November 28, 2011

Update on motion recognition

So last week, I was looking at my two approaches to motion recognition (live updating vs. waiting for a motion to fully complete). I decided that it was best to combine the two approaches and get the best of both worlds (faster reaction time and accuracy).

This is the basic workflow of how I recognize motions:

1. A motion has begun and is in progress
I start tracking this motion. There is a set "learning period" during which the framework records all data for this motion. Once the "learning period" has passed, the framework makes its best guess about what kind of gesture is being performed.

2. We have assigned a gesture to this motion
For every frame, we read in the current motion changes and update the interface. Here we are basically implementing the live-update approach.

3. The motion has been completed
Now we look back at the complete motion data and use the old method of analyzing the entire motion sequence. We make any necessary adjustments to our gesture and correct the interface changes if the entire motion does not match our initial guess.
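
Put together, the hybrid loop looks roughly like the sketch below. The class, the delegates, and the learning-period length are all illustrative stand-ins rather than my actual code.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

public enum GestureKind { None, Zoom, Pan, Rotate }

public class HybridRecognizer
{
    private readonly List<Vector3> frames = new List<Vector3>(); // recorded hand positions
    private GestureKind guess = GestureKind.None;
    private const int LearningFrames = 10;  // assumed "learning period" length

    // Supplied by the caller: classification over recorded data, live update, and correction.
    private readonly Func<List<Vector3>, GestureKind> classify;
    private readonly Action<GestureKind, Vector3> applyLive;
    private readonly Action<GestureKind, List<Vector3>> applyFinal;

    public HybridRecognizer(Func<List<Vector3>, GestureKind> classify,
                            Action<GestureKind, Vector3> applyLive,
                            Action<GestureKind, List<Vector3>> applyFinal)
    {
        this.classify = classify;
        this.applyLive = applyLive;
        this.applyFinal = applyFinal;
    }

    // Called once per tracked frame while the motion is in progress.
    public void OnFrame(Vector3 handPosition)
    {
        frames.Add(handPosition);

        // 1. Learning period: record data until we have enough, then make a best guess.
        if (frames.Count == LearningFrames)
            guess = classify(frames);
        // 2. Live update: push each incremental change straight to the interface.
        else if (frames.Count > LearningFrames)
            applyLive(guess, handPosition);
    }

    // Called when the motion-segmentation engine reports the motion has ended.
    public void OnMotionComplete()
    {
        // 3. Re-check against the full motion and correct the interface if needed.
        GestureKind finalGesture = classify(frames);
        if (finalGesture != guess)
            applyFinal(finalGesture, frames);

        frames.Clear();
        guess = GestureKind.None;
    }
}
```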

So far, this has worked relatively well. Basically, during the live-update phase we have a constant update of the interface; once the motion is complete, the interface makes the final corrections and quickly snaps to the final/correct position.

This is very comparable to momentum scrolling on current touchscreen smartphones and tablets: while your finger is on the screen, the scrolling maps one-to-one to your finger movement; once you lift it, the screen scrolls based on the final acceleration/velocity of your finger movement. Similarly, in our approach we basically have a one-to-one mapping during the actual motion, and once the motion is completed we correct the change based on the entire motion.

Sunday, November 20, 2011

Challenges with Motion Recognition / Motion-Interaction Mapping

After creating the motion-recognition framework for a 3D object-editor-like interaction in Unity (zoom, panning and rotation), I have come upon a few interesting choices in approach.

For some brief background, I currently break down all user motions into discrete actions. Each action contains information such as velocity, acceleration, start/end position and duration. I have currently programmed two approaches to interpreting this data.
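
For concreteness, the per-action record could look something like this (the field names are illustrative, not my actual class):

```csharp
using UnityEngine;

// One segmented hand motion, as described above: velocity, acceleration,
// start/end position, and duration.
public class HandAction
{
    public Vector3 StartPosition;
    public Vector3 EndPosition;
    public Vector3 AverageVelocity;
    public Vector3 AverageAcceleration;
    public float StartTime;       // seconds since app start
    public float Duration;        // seconds
    public bool InProgress;       // still being tracked, or completed

    public Vector3 Displacement { get { return EndPosition - StartPosition; } }
}
```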

The first is a live approach. The live approach does not wait for an action to end. Rather, it checks the current action being tracked (if any). If the current action satisfies certain rules for an interaction (e.g. a specific start position or duration, or the requirement that it occur in unison with another action), then we start changing the state of the interface on each new frame update. Here is a quick example: for a correct rotation gesture, both hands must be above the waist, and the actions of the two hands must start at roughly the same time. If both initial conditions are true, then I start changing the orientation of my object based on the updated positions of the two hands until the action is complete.
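
A sketch of that rotation rule, reusing the illustrative HandAction from the sketch above; the waist height, tolerance, and the way the rotation is mapped are placeholders, not the tuned constants.

```csharp
using UnityEngine;

// Checks the initial conditions for the live rotation gesture:
// both hand actions start above the waist and at roughly the same time.
public static class RotationGestureRules
{
    private const float StartTimeTolerance = 0.2f; // seconds; placeholder value

    public static bool InitialConditionsMet(HandAction left, HandAction right, float waistY)
    {
        bool bothAboveWaist = left.StartPosition.y > waistY && right.StartPosition.y > waistY;
        bool startedTogether = Mathf.Abs(left.StartTime - right.StartTime) < StartTimeTolerance;
        return bothAboveWaist && startedTogether;
    }

    // While the gesture is live, rotate the object based on how the vector
    // between the two hands has turned since the last frame.
    public static void ApplyLiveRotation(Transform target, Vector3 prevLeft, Vector3 prevRight,
                                         Vector3 currLeft, Vector3 currRight)
    {
        Vector3 prevAxis = prevRight - prevLeft;
        Vector3 currAxis = currRight - currLeft;
        Quaternion delta = Quaternion.FromToRotation(prevAxis, currAxis);
        target.rotation = delta * target.rotation;
    }
}
```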

In the second approach, rather than looking at the initial conditions of the current action, we look through the list of completed actions. We then analyze the saved data of those actions to see if the sequence of actions satisfies any possible interaction.
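
A rough outline of that second approach (illustrative names only, again reusing the HandAction sketch above):

```csharp
using System.Collections.Generic;

// Second approach: scan the recently completed actions and see whether
// any registered interaction rule matches the sequence.
public interface IInteractionRule
{
    // Returns true and applies the interface change if the action sequence matches.
    bool TryMatch(List<HandAction> completedActions);
}

public class CompletedActionMatcher
{
    private readonly List<IInteractionRule> rules = new List<IInteractionRule>();
    private readonly List<HandAction> completed = new List<HandAction>();

    public void AddRule(IInteractionRule rule) { rules.Add(rule); }

    // Called whenever the segmentation engine finishes an action.
    public void OnActionCompleted(HandAction action)
    {
        completed.Add(action);
        foreach (IInteractionRule rule in rules)
        {
            if (rule.TryMatch(completed))
            {
                completed.Clear();  // consume the matched sequence
                break;
            }
        }
    }
}
```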

There are pluses and minuses to both approaches. With the live approach, the interaction is very responsive and there is barely any downtime between a gesture and a change in the interface; however, we also sacrifice accuracy. This approach depends on our read of the initial state being correct: we must assume that if the user begins a specific motion, he will also end it correctly.

The second approach is more accurate. We can look at the entire sequence of actions to ensure that we match the correct on-screen changes. However, there is lag. What if the user attempts one long motion? We will not be able to process it until it is complete, and the user will be unable to see any changes in the interface for a relatively long time.

Any thoughts?