Archive for July 2014
[This post was originally written in April 2014, but is only now getting posted in July 2014. Hobby projects are like that!]
Hacking stuff, that is! Holofunk stuff, to be precise! Working code, to be sure!
I’ve drifted into posting more on Facebook and on one particular forum of my long acquaintance, so now I get to be a bit lazy and sum up those scraps of news a little more centrally and with more detail.
In January I got the lead out and ported Holofunk to the new Kinect. This required first porting it to x64. The main casualty there was that VST support — sound effect plugin support — stopped working, for no reason I could sort out after a couple of nights. That’s the maximum length of time I can spend on any blocking issue before I start looking for a workaround. So now I’m just using the sound effects built into the BASS audio library, which are quite sufficient for the time being.
I did get Holofunk working again, and modulo a couple of performance issues that I’m discussing with the Kinect team, it’s pretty stunning. I was able to get green-screened color video working in almost no time based on their sample code, for multiple players; this was impossible with the first Kinect.
Hand Pose, At Last
Much more importantly, though, I started getting hand pose data. And the hand pose data is fast and reliable. There are only three hand poses supported — open (all fingers spread), pointing (one or two fingers pointing, the rest in a fist), and closed (a plain fist). It takes a little getting used to, as far as making rapid and clear transitions from one pose to another; but with just a little practice it gets very fluid.
Three hand poses is kind of like having a mouse that has a three-position switch on it… it’s not a whole lot to work with, but it’s enough. I started brainstorming gestural interfaces, and my wife Michelle helped me take some notes:
[picture of HF brainstorming notes]
The basic idea is this:
- You open your hand to get the app’s attention. (This is “armed” state, internally.)
- You can then close your hand to start recording yourself; as long as your hand is closed, the recording continues. When you open your hand, you “drop” the recording at that spot on the screen.
That’s the most basic interaction: make a fist to record a loop, then let go to play it.
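That basic record/drop interaction boils down to a tiny per-hand state machine. Here’s an illustrative sketch in Python (Holofunk’s actual code is different; all the names here are hypothetical):

```python
from enum import Enum, auto

class HandPose(Enum):
    """The three poses the new Kinect reports reliably."""
    OPEN = auto()      # all fingers spread
    POINTING = auto()  # one or two fingers out, rest in a fist
    CLOSED = auto()    # plain fist

class HandState(Enum):
    IDLE = auto()       # hand not engaged with the app
    ARMED = auto()      # open hand: the app's attention is caught
    RECORDING = auto()  # fist: a loop is being recorded

def next_state(state: HandState, pose: HandPose) -> HandState:
    """Advance one hand's state machine on a new pose sample."""
    if state == HandState.IDLE and pose == HandPose.OPEN:
        return HandState.ARMED          # open hand gets the app's attention
    if state == HandState.ARMED and pose == HandPose.CLOSED:
        return HandState.RECORDING      # fist starts recording a loop
    if state == HandState.RECORDING and pose == HandPose.OPEN:
        return HandState.ARMED          # opening drops the loop where the hand is
    return state                        # any other pose leaves the state alone
```

Each Kinect frame feeds the current pose into `next_state`; the recording lasts exactly as long as the `RECORDING` state does.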
- You can also point, to enter “pointing mode.” Basically, each hand has its own “pointing mode” that determines what will happen when that hand points.
- The default “pointing mode” is “mute/unmute.” In this mode, you point at a sound or group of sounds, and you make a fist to mute them, or open your hand to unmute them. If you mute some muted sounds, they get deleted altogether. This gives you the ability to bring loops in and out.
- Another “pointing mode” is “sound effects.” In this mode, you point at a sound or group of sounds, and then you move your hand up/down/left/right to apply one of four sound effects (one per direction). I prototyped this interface with Holofunk 1.0 and it works OK, so I’m bringing that forward.
- There can be multiple “sound effects” modes with different combinations of sound effects.
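The mute/unmute mode’s rules (fist mutes, a second fist on an already-muted sound deletes it, open hand unmutes) fit in a few lines. A hypothetical Python sketch, not Holofunk’s real code:

```python
class Loop:
    """A minimal stand-in for a recorded loop on screen."""
    def __init__(self):
        self.muted = False
        self.deleted = False

def mute_unmute(targets, pose):
    """The default pointing mode, applied to whatever loops the hand
    is pointing at: "closed" mutes (and deletes if already muted),
    "open" unmutes."""
    for loop in targets:
        if pose == "closed":
            if loop.muted:
                loop.deleted = True   # muting a muted sound deletes it
            else:
                loop.muted = True
        elif pose == "open":
            loop.muted = False
```

Because the mode is a plain function over the targeted loops, each hand can carry a different mode, and swapping modes is just swapping which function gets called when that hand points.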
I’ve implemented the “mute/unmute” behavior and it’s pretty incredible — the hand recognition is fast enough that you really feel like you’re grabbing a bunch of sounds and then shushing them by squeezing them, then opening your hand again to bring them back to life.
So how do you change modes? My main insight here was that I wanted some kind of “chord” gesture — in a conventional interface you’d have shift-click, or control-click, or something. So what could be a modifier for the pointing gesture? I had already implemented radial popup menus, I just needed a way to invoke them.
What I came up with was to use body pose as a modifier. Specifically, if you put your hand on your hip (akimbo, in other words), then when you point with the other hand, you get a popup menu that lets you pick the pointing mode for that hand. So you just put your hand on your hip, point your other hand at “effect mode”, grab that menu item, and now that other hand is in effect mode. It’s natural and feels quite good. Putting your hand behind your back (rather than elbow-out akimbo) means you’ll get the system popup menu, with commands like “Delete all” and “Change tempo”.
Body pose is modal. This is your NUI koan for the day.
This combination of hand pose (for pointing and picking), body pose (for modifying that pointing/picking), and per-hand interaction mode means that the interface is truly ambidextrous: both hands can perform independent gestures simultaneously. You could have one hand applying reverb/flange/chorus/delay and the other hand applying volume/pan, or one hand muting and unmuting while the other hand tweaks sound effects, or whatever you like.
Right now I have the popup menus coming up, but not interacting properly — some minor issue, I think. Will be fixing that very soon.
My current code has a nice little hierarchical state machine for the per-hand interaction, so I have two independent state machines, one per hand. Previously, an event — such as the user pointing — would always cause a transition to a fixed new state. But in the new interface, pointing while the other hand is on the hip should bring up a popup menu; pointing while in mute/unmute mode should enter mute/unmute state; and so forth.
All I needed to implement this, it turned out, was a “computed transition” that would run some code to determine a target state, rather than using a fixed target state. This was a very simple thing to add, and wound up perfectly expressing both pointing modes and body-pose modes.
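The computed-transition idea can be sketched like this (illustrative Python; the context fields and state names are hypothetical, not Holofunk’s actual API):

```python
class Transition:
    """A fixed transition: the event always leads to one target state."""
    def __init__(self, event, target_state):
        self.event = event
        self._target = target_state

    def target(self, context):
        return self._target

class ComputedTransition(Transition):
    """A computed transition: run code against the current context
    (the other hand's body pose, this hand's pointing mode, etc.)
    to pick the target state at event time."""
    def __init__(self, event, compute):
        super().__init__(event, None)
        self._compute = compute

    def target(self, context):
        return self._compute(context)

# Pointing while the other hand is on the hip brings up a popup menu;
# otherwise the hand enters whatever its current pointing mode is.
point = ComputedTransition(
    "point",
    lambda ctx: "popup_menu" if ctx["other_hand_on_hip"]
                else ctx["pointing_mode"])
```

One subclass with one overridden method is enough to express both pointing modes and body-pose modifiers, which is why the change was so small.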
Now, having an ambidextrous, bilateral interface is all very well, but not all interactions involve only one hand. Some interactions want two. For example, using two hands to drag out a selection rectangle for sound grouping. Or, dragging out a time distortion envelope for time mapping.
I have two independent state machines, one per hand. Fine if the hands are independent, but what if they’re not? Do both hands need to be in the same state, somehow? How do you coherently use two state machines for one interaction? It all just felt wrong and ugly and hacky, a sure sign that I needed to sleep on it some more. When this project gets stalled, it’s either because I don’t have a working brain cell left, or because I haven’t got a clear simple picture of how it should work. And I don’t have the spare time to write vague code and then have to debug it!
I plan to build a state machine hierarchy in which a “body” state machine can look at both hands, and if it does not want to consume the state of both hands in a two-handed way, it can delegate the state of each hand to a lower-level, per-hand state machine. State machine delegation, in other words. I think it will work well… once I get there.
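The delegation plan might look something like this sketch (Python, with an invented two-handed gesture as the example; none of this is working Holofunk code yet):

```python
class HandMachine:
    """Stand-in for the existing per-hand state machine."""
    def __init__(self, name):
        self.name = name
        self.poses_seen = []

    def update(self, pose):
        self.poses_seen.append(pose)

class BodyMachine:
    """Top-level machine: sees both hands every frame. If it recognizes
    a two-handed gesture it consumes both poses itself; otherwise it
    delegates each pose to that hand's own machine."""
    def __init__(self):
        self.left = HandMachine("left")
        self.right = HandMachine("right")
        self.selection_drag = False

    def update(self, left_pose, right_pose):
        # Hypothetical two-handed gesture: both fists drag out a
        # selection rectangle between the hands.
        if left_pose == "closed" and right_pose == "closed":
            self.selection_drag = True
            return  # consumed: the per-hand machines never see this frame
        self.selection_drag = False
        self.left.update(left_pose)
        self.right.update(right_pose)
```

The key property is that the body machine gets first refusal on every frame, so two-handed gestures and independent per-hand gestures never fight over the same pose data.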