Monday, November 28, 2011
So last week, I was looking at my two approaches to motion recognition (live updating vs. waiting for a motion to fully complete). I decided that it was best to combine these two approaches and get the best of both worlds (faster reaction time and accuracy).
This is the basic workflow of how I recognize motions:
1. A motion has begun and is in progress
I start tracking this motion. There is a set "learning period" where the framework records all data for this motion. Once the "learning period" has passed, the framework makes its best guess at what kind of gesture is being performed.
2. We have assigned a gesture to this motion
For every frame, we read in the current motion changes and update the interface. Here we are basically implementing the live-update approach.
3. The motion has been completed
Now we look back at the complete motion data and use the old method of analyzing the entire motion sequence. We make any necessary adjustments to our gesture and correct the interface changes if the entire motion does not match our initial guess.
So far, this has worked relatively well. Basically, during the live-update phase we have a constant update in our interface. Once the motion is complete, the interface makes the final changes and quickly snaps to the final/correct position.
This is very comparable to momentum scrolling on current touchscreen smartphones and tablets: while your finger is on the screen, the scrolling maps one-to-one to your finger movement. Once you lift it, the screen scrolls based on the final acceleration/velocity of your finger movement. Similarly, in our approach we basically have a one-to-one mapping during the actual motion. Once the motion is completed, we correct the change based on the entire motion.
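To make the three phases a bit more concrete, here is a minimal sketch of how a hybrid recognizer along these lines could be structured (the type names, the 15-frame learning period, and the stubbed methods are placeholders for illustration, not the actual framework code):

```csharp
// A minimal sketch of the hybrid (live + post-hoc) flow described above.
// GuessGesture, ClassifyWholeMotion, etc. are hypothetical placeholders.
using System.Collections.Generic;
using UnityEngine;

public enum MotionPhase { Learning, LiveUpdate, Completed }

public class HybridRecognizerSketch
{
    const int LearningFrames = 15;                 // the "learning period", in frames
    readonly List<Vector3> samples = new List<Vector3>();
    MotionPhase phase = MotionPhase.Learning;
    string currentGuess;                           // e.g. "rotate", "pan", "zoom"

    // Called every frame while a motion is in progress.
    public void OnFrame(Vector3 handPosition)
    {
        samples.Add(handPosition);

        if (phase == MotionPhase.Learning && samples.Count >= LearningFrames)
        {
            currentGuess = GuessGesture(samples);  // best guess after the learning period
            phase = MotionPhase.LiveUpdate;
        }
        else if (phase == MotionPhase.LiveUpdate)
        {
            ApplyLiveUpdate(currentGuess, handPosition);   // one-to-one interface update
        }
    }

    // Called once the motion has fully completed.
    public void OnMotionComplete()
    {
        phase = MotionPhase.Completed;
        string final = ClassifyWholeMotion(samples);        // the old whole-sequence method
        if (final != currentGuess)
            CorrectInterface(currentGuess, final, samples); // snap to the corrected result
    }

    // Stubs standing in for the real recognition / interface code.
    string GuessGesture(List<Vector3> s) { return "rotate"; }
    string ClassifyWholeMotion(List<Vector3> s) { return "rotate"; }
    void ApplyLiveUpdate(string gesture, Vector3 p) { }
    void CorrectInterface(string guess, string final, List<Vector3> s) { }
}
```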
Sunday, November 20, 2011
Challenges with Motion Recognition / Motion-Interaction Mapping
After creating the motion recognition framework for a 3D object editor-like interaction in Unity (zoom, panning, and rotation), I have come upon a few interesting choices in approach.
For some brief background, I currently break down all user motions into discrete actions. Each action contains information such as velocity, acceleration, start/end position, and duration. I have currently programmed two approaches to interpreting this data.
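As a rough illustration, an action record along these lines might look like the following (the field names are placeholders inferred from the description above, plus a couple of bookkeeping fields used in the later sketches):

```csharp
// Sketch of a discrete action record carrying the data mentioned above.
// Field names are placeholders; the real framework's layout may differ.
using UnityEngine;

public struct AtomicAction
{
    public Vector3 StartPosition;   // where the tracked joint began
    public Vector3 EndPosition;     // where it ended (or currently is, if still in progress)
    public Vector3 Velocity;
    public Vector3 Acceleration;
    public float StartTime;         // seconds since the app started
    public float Duration;          // how long the action has lasted
    public bool Completed;          // false while the action is still being tracked
}
```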
The first is a live approach. The live approach does not wait for an action to end. Rather, it checks the current action that is being tracked (if any). If the current action satisfies certain rules for an interaction (e.g. a specific start position or duration, or that it must occur in unison with another action), then we start changing the state of the interface on each new frame update. Here is a quick example: for a correct rotation gesture, both hands must be above the waist, and the actions of the two hands must start at roughly the same time. If both initial conditions are true, then I start changing the orientation of my object based on the updated positions of the two hands until the action is complete.
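A stripped-down sketch of that live check might look something like this, reusing the hypothetical AtomicAction struct from above (the thresholds and the rotation mapping are simplified placeholders):

```csharp
// Sketch of the live approach's start conditions for the rotation gesture
// described above. Thresholds and the rotation mapping are simplified.
using UnityEngine;

public class LiveRotationSketch
{
    const float MaxStartOffset = 0.25f;   // "roughly the same time", in seconds
    readonly Transform target;            // the object whose orientation we drive
    bool rotating;

    public LiveRotationSketch(Transform targetObject) { target = targetObject; }

    // Called every frame with the two in-progress hand actions (null when idle).
    public void Update(AtomicAction? left, AtomicAction? right, float waistHeight)
    {
        if (!rotating)
        {
            // Initial conditions: both hands above the waist, actions started together.
            rotating = left.HasValue && right.HasValue
                && left.Value.StartPosition.y > waistHeight
                && right.Value.StartPosition.y > waistHeight
                && Mathf.Abs(left.Value.StartTime - right.Value.StartTime) < MaxStartOffset;
            return;
        }

        if (!left.HasValue || !right.HasValue
            || left.Value.Completed || right.Value.Completed)
        {
            rotating = false;             // one of the actions ended; stop updating
            return;
        }

        // Crude live mapping: point the object's forward axis along the vector
        // between the two hands, updated every frame until the actions complete.
        Vector3 betweenHands = right.Value.EndPosition - left.Value.EndPosition;
        if (betweenHands.sqrMagnitude > 1e-6f)
            target.rotation = Quaternion.LookRotation(betweenHands.normalized);
    }
}
```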
In the second approach, rather than looking at the initial conditions of the current action, we look through the list of completed actions. Then we analyze the saved data of those actions to see if the sequence of actions satisfies any possible interactions.
There are pluses and minuses to both approaches. In the live approach, the interaction is very responsive and there is barely any downtime between the gesture and a change in the interface; however, we also sacrifice accuracy. This approach depends on our read of the initial state being correct: we must assume that if the user begins a specific motion, he will also end it correctly.
The second approach is more accurate. We can look at the entire sequence of actions to ensure that we make the correct on-screen changes. However, there is lag time. What if the user attempts one long motion? We will not be able to process this motion until it is complete, and thus the user will be unable to see any changes in the interface for a relatively long time.
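Here is a minimal sketch of the second approach: only after the motion ends do we run the saved action sequence through a set of rules (the specific rules shown, "zoom" and "pan", are invented placeholders, not the framework's real conditions):

```csharp
// Sketch of matching a completed action sequence against candidate interactions,
// reusing the hypothetical AtomicAction struct from the earlier sketch.
using System;
using System.Collections.Generic;

public class CompletedSequenceMatcherSketch
{
    // Each candidate interaction is a predicate over the full completed sequence.
    readonly Dictionary<string, Func<IList<AtomicAction>, bool>> rules =
        new Dictionary<string, Func<IList<AtomicAction>, bool>>
        {
            { "zoom", actions => actions.Count == 2 && MovedApart(actions[0], actions[1]) },
            { "pan",  actions => actions.Count == 1 && actions[0].Duration > 0.3f },
        };

    // Returns the first interaction whose rule the whole sequence satisfies,
    // or null if nothing matches and the sequence should be discarded.
    public string Match(IList<AtomicAction> completedActions)
    {
        foreach (var rule in rules)
            if (rule.Value(completedActions))
                return rule.Key;
        return null;
    }

    static bool MovedApart(AtomicAction a, AtomicAction b)
    {
        float startGap = (a.StartPosition - b.StartPosition).magnitude;
        float endGap = (a.EndPosition - b.EndPosition).magnitude;
        return endGap > startGap;   // hands ended farther apart than they started
    }
}
```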
Any thoughts?
Sunday, November 13, 2011
Motion Recognition Work
Now that I have my base C# framework ported over to Unity, I have begun work on building more advanced motion recognition to power my beta-review object editor/navigator application.
There are multiple motion recognition engines that "look for" specific motions. They all take in the same atomic action data from my motion segmenter. The motion recognizers are also turned on/off based on the current state/context and on input from other motion recognizers. For example, I implemented a motion recognizer that checks to see if you have moved your hands into the "up" position (up from your side and pointed in the general direction of the screen). Once this state is reached, my rotation recognizer is activated and will check to see if you rotate your two hands in synchronization, like you would if you were rotating a real-life object.
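A rough sketch of this gating between recognizers is below; the interface and class names are illustrative, the "up" check is heavily simplified, and the AtomicAction struct is the hypothetical record sketched earlier:

```csharp
// Sketch of recognizers gating each other: the hands-up recognizer runs first,
// and only when it reports the "up" state does the rotation recognizer get fed.
public interface IMotionRecognizerSketch
{
    bool Enabled { get; set; }
    void Feed(AtomicAction action);      // every recognizer sees the same atomic actions
}

public class HandsUpRecognizerSketch : IMotionRecognizerSketch
{
    const float WaistHeight = 1.0f;      // placeholder; would come from the skeleton data
    public bool Enabled { get; set; }
    public bool HandsUp { get; private set; }

    public void Feed(AtomicAction action)
    {
        if (!Enabled) return;
        // Crude stand-in for the real "up" check described above.
        HandsUp = action.EndPosition.y > WaistHeight;
    }
}

public class RecognizerManagerSketch
{
    readonly HandsUpRecognizerSketch handsUp = new HandsUpRecognizerSketch { Enabled = true };
    readonly IMotionRecognizerSketch rotation;

    public RecognizerManagerSketch(IMotionRecognizerSketch rotationRecognizer)
    {
        rotation = rotationRecognizer;
    }

    public void Feed(AtomicAction action)
    {
        handsUp.Feed(action);
        // Context gating: the rotation recognizer is only active in the "up" state.
        rotation.Enabled = handsUp.HandsUp;
        rotation.Feed(action);
    }
}
```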
Here is a snapshot of my work in progress. I am rotating a 3D cube in three dimensions (x, y, z axes) based on the motion controls. If you look at the bottom, you will see the two Unity icons and a black bar. The Unity icon (it's a placeholder image for now) shows up if your right/left hand is in the "up" position. The black bar represents the distance between your two hands, which will affect the sensitivity of your rotation (more about that in the next post). The basic idea is to give on-screen feedback at every step of the interaction.
Look for a video demo once I get the rotation kinks worked out.
Sunday, November 6, 2011
Post-Alpha Review Next Steps
The alpha-review was great in terms of getting feedback on my current progress. There are two main issues/areas of focus that have been repeatedly emphasized by my reviewers:
1. Gesture recognition is not trivial and can be a challenge.
2. You should define an application/use-case early on and adjust your motion recognition accordingly.
These are two really good points and I definitely agree with them.
I'm currently bringing my existing C# test framework into Unity, which will house my final production code. This has taken some time, but I have finally resolved many issues regarding wrappers, DLLs, and Unity's inability to use the Microsoft .NET 4.0 framework (the framework that the Kinect SDK utilizes).
I found this trick online (http://www.codeproject.com/KB/dotnet/DllExport.aspx) and basically had to play around with lots of compile settings, both in C# and C++, for this to work.
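I won't go through the full build setup here, but the Unity-side half of this kind of bridge boils down to a P/Invoke call into the wrapper DLL, something along these lines ("KinectWrapper" and "GetHandPosition" are made-up names for illustration; the real wrapper and its exports differ):

```csharp
// Unity-side sketch of calling into a native/exported wrapper through P/Invoke.
using System.Runtime.InteropServices;
using UnityEngine;

public class KinectBridgeSketch : MonoBehaviour
{
    // The wrapper DLL exposes a C-style export that reports the latest hand position.
    [DllImport("KinectWrapper")]
    static extern int GetHandPosition(out float x, out float y, out float z);

    void Update()
    {
        float x, y, z;
        if (GetHandPosition(out x, out y, out z) == 0)   // 0 = success, by this sketch's convention
        {
            Vector3 hand = new Vector3(x, y, z);
            Debug.Log("Hand at " + hand);   // placeholder: feed this into the motion segmenter
        }
    }
}
```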
Now, getting back to addressing my two main challenges: I have decided that my initial use case for the beta-review will be a spatial interaction environment for 3D objects. In simple terms, the user should be able to move and rotate an object in 3D space while also being able to change his/her viewpoint. Think of the CIS 277 object editor, but in Unity and with voice and gesture controls.
Probably one of the most important aspects of gesture recognition doesn't actually pertain directly to the recognition of motion; it has to do with user feedback. How does a user know if his gesture has been registered? How does the user even know if the interface is listening for input in the first place?
This user feedback has been implemented religiously in most traditional GUI interfaces (good ones, that is). Hover your mouse over a button: that button will change color/opacity/shape, letting the user know that it is listening. Move your mouse out and the button will return to its previous state, letting the user know that it is no longer listening for your mouse click. Finally, if you click the button, it will change its visual state once more to signify that your action has been successfully recorded.
In order for gesture recognition to succeed, this same interaction flow must be replicated in our NUI. If users do not know that their actions are being recorded, they will madly wave back and forth, which will lead to further misinterpretation by our recognition engine. Furthermore, the user must be told whether his gesture/motion has been correctly recognized or whether it has been ignored because our engine is unable to parse it.
Since it does not make sense to have button feedback (the whole point of a NUI is to remove the mouse-pointer paradigm) and pop-up dialog boxes are intrusive, I've decided to utilize a border highlight that displays the feedback. The response is coded in the color of the border, and the color fades away after the feedback is shown. The border communicates four states:
1. Initial "recording" feedback: the user has stepped into the interaction space.
2. The user's action has been recorded and saved. However, it might be part of a longer gesture sequence, so the entire gesture is not complete yet and we are waiting for more motions.
3. The user's action or sequence of actions has been recognized, and we have updated the state of the interface to correspond with it.
4. The current action or sequence of actions cannot be recognized as a specific command. The sequence/action has been deleted from the queue; start the action again from the beginning.
My rationale for this:
1. A color border is non-obtrusive, yet has enough global visual scale to catch the attention of the user.
2. The differentiation between recorded gestures and actually completed gestures allows the use of gestures that are formed from the build-up of many atomic gestures. Thus, as we build up to these complex gestures, it is still good to know that our sequence of actions is still being recorded.
3. The initial "on-phase" provides feedback to the user that he is in the interaction space and all his motions are currently being watched.
This mode of feedback is inspired by Alan Cooper's concept of modeless feedback, from his book on interaction design, About Face.
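In Unity terms, a minimal sketch of this kind of fading border highlight could look something like the following (the thickness, fade time, and the choice of colors are placeholders, not my actual implementation):

```csharp
// Sketch of a border highlight: flash a colored frame around the screen
// and fade it out. Flash() is called with whatever color encodes the state.
using UnityEngine;

public class BorderFeedbackSketch : MonoBehaviour
{
    const float FadeSeconds = 1.5f;
    const int Thickness = 12;           // border thickness in pixels

    Texture2D tex;
    Color current = Color.clear;

    void Awake()
    {
        tex = new Texture2D(1, 1);      // 1x1 white texture, tinted via GUI.color
        tex.SetPixel(0, 0, Color.white);
        tex.Apply();
    }

    // Called by the recognition code with a color encoding the feedback state.
    public void Flash(Color color) { current = color; }

    void Update()
    {
        // Fade the border's alpha back to zero over FadeSeconds.
        current.a = Mathf.MoveTowards(current.a, 0f, Time.deltaTime / FadeSeconds);
    }

    void OnGUI()
    {
        if (current.a <= 0f) return;
        GUI.color = current;
        int w = Screen.width, h = Screen.height;
        GUI.DrawTexture(new Rect(0, 0, w, Thickness), tex);                  // top
        GUI.DrawTexture(new Rect(0, h - Thickness, w, Thickness), tex);      // bottom
        GUI.DrawTexture(new Rect(0, 0, Thickness, h), tex);                  // left
        GUI.DrawTexture(new Rect(w - Thickness, 0, Thickness, h), tex);      // right
    }
}
```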
Initial "recording feedback". The user has stepped into the interaction space.
User action has been, recorded and saved. However, it might be of a longer gesture sequence so the entire gesture is not complete yet and we are waiting for more motions.
User action or sequence of actions has recognized and we have updated the state of the interface to correspond with this.
Current action or current sequence of actions cannot be recognized as a specific command. The sequence/action has been deleted from queue. Start the action again from the beginning.