Mixed-Reality Development

Considerations in Building a Gesture-based 3D User Interface

Gautam Bhatnagar | March 30 | 15 min read


One of the toughest problems to tackle in virtual and augmented reality is how to let the user, in the real world, navigate and control the virtual world comfortably and with minimal fuss. The solutions on the hardware side range from simple remotes and game controllers all the way to body suits capable of tracking a user's entire body.

The Oculus Rift debuted with a straightforward and elegant solution: an Xbox 360 game controller. Considering that its primary use case is gaming, this works quite effectively. A user can navigate menus and control in-game characters and objects in a familiar manner, just as they would in a typical, non-VR video game.

However, while it might be easy to use, the Xbox controller doesn't do a particularly good job of giving the user presence in the virtual world. There is no approximation of the user's hands and fingers. They are stuck holding a controller, and one that doesn't even exist in the virtual world. It breaks immersion and is rather limited in utility.

Oculus addressed these issues by recently releasing the Oculus Touch controllers. Instead of a single gamepad, there are now two controllers, one for each hand. Most importantly, the controllers are fully tracked and have avatars in the virtual world; if you place a controller on the ground in the real world, its virtual representation comes to rest on the virtual ground as well. Oculus also approximated grabbing objects via grip triggers on the controllers. Their rival, HTC, includes similar controllers with their flagship product, the Vive. All of this gives the user a much stronger interaction with the virtual world, and even the feeling that their hands are actually present in it.

The Advent of Body Tracking

The Leapmotion controller takes this concept a step further: it fully tracks both of the user's arms and hands. As a result, one can see their hands in the virtual world, which is incredibly immersive. Rather than push a joystick around or squeeze a pair of triggers, one can actually make a fist to grab something in the virtual world in a remarkably natural manner. Unfortunately, haptic feedback is entirely missing, as is the feeling of weight; one can effortlessly lift a building in VR with their pinky, which, while entertaining, is unrealistic.

Nevertheless, at IMMERS3D, we believe that gesture tracking systems like Leapmotion, Microsoft HoloLens, Intel RealSense, and Google Tango are the future. Perhaps surprisingly, the majority of first-time VR users demoing our software responded better to navigating the virtual world with Leapmotion than with the controllers from Oculus and HTC. There are no buttons to learn, and virtual objects respond to the user's hands exactly as expected, much as a phone or tablet touchscreen feels more direct than a mouse and keyboard.

It is worth noting, however, that Leapmotion is far from perfect, and has drawn criticism from the VR community for its usability. While it can be blissfully intuitive when it works, its positional accuracy is sometimes lacking, as is its ability to properly distinguish fingers, and even the user's hands, in certain cases. When it comes to immersion in virtual reality, accuracy is critical, and doubly so when the user's hands are also an input device. A virtual hand that bounces around or disappears due to poor tracking is jarring, and extremely frustrating when the user is trying to grab an object or select a menu button.

Tracking Limitations

We're now going to examine why hand tracking can break, and discuss some software and UI techniques to prevent it from doing so.


Let's begin by looking at the constraints of the Leapmotion and of the VR headset, in this example the Oculus Rift DK2 (most current headsets have very similar parameters). The Oculus Rift has a field of view (FOV) of roughly 100 degrees, while the Leapmotion tracking system has an FOV of 135 degrees. [1] Fortunately, this means your hands will always be visible in the virtual space as long as you are looking at them, since they can be tracked even beyond the edges of your effective vision. There is one caveat, however: they must also be within range of the Leapmotion's tracking distance. While the Leapmotion can "see" a wide angle, it can't see very far. Its effective range is 1 inch to 2 feet, with tracking becoming increasingly less accurate at the extremes. This poses a problem for two reasons.

First, the average person’s arm length is 25 inches, or just beyond the Leapmotion’s recommended tracking range. [2] This means that when one stretches their arm out, there is a high likelihood that the Leapmotion will either poorly track their hand, or fail to track it altogether. Taller users might run into problems even without extending their arms fully.

Second, Oculus recommends that objects of interest in the virtual space be placed a minimum of 30 inches from the user; anything closer can cause eye strain. In fact, they recommend placing UI elements nearly 100 inches from the user. [3] This leaves us with a dilemma: an object or UI element will either be annoyingly close to the user, or too far away for the user to reach and interact with.
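The numbers above can be captured in a quick sanity check. The sketch below is our own illustration in Python, not Leapmotion code; distances are in inches, rounded from the cited figures:

```python
# Illustrative sanity check (our own sketch, not Leapmotion code).
# Distances are in inches, rounded from the figures cited above.

LEAP_MIN_RANGE = 1.0         # near edge of Leapmotion tracking
LEAP_MAX_RANGE = 24.0        # far edge of tracking (about 2 feet)
COMFORT_MIN_DISTANCE = 30.0  # Oculus-recommended minimum object distance

def placement_issues(distance_from_user):
    """Return the constraints a UI element at this distance violates."""
    issues = []
    if not (LEAP_MIN_RANGE <= distance_from_user <= LEAP_MAX_RANGE):
        issues.append("outside hand-tracking range")
    if distance_from_user < COMFORT_MIN_DISTANCE:
        issues.append("close enough to risk eye strain")
    return issues

# A reachable element (20 in) is trackable but uncomfortably close,
# while a comfortable one (30 in) is beyond tracking range.
print(placement_issues(20.0))  # ['close enough to risk eye strain']
print(placement_issues(30.0))  # ['outside hand-tracking range']
```

Notice that no distance returns an empty list: the trackable band ends at 24 inches, while comfort begins at 30. That gap is exactly the dilemma the rest of this article works around.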

Designing a Robustly-Tracked UI

Not taking these constraints into consideration practically guarantees tracking issues, and is, in the author's humble opinion, why the Leapmotion sometimes gets an undeserved bad rap; it can be remarkably accurate and robust when properly designed for. The goal is to place UI elements and other interactive objects so that the user's hands naturally remain within the Leapmotion's tracking range and, ideally, as close to the user's eye level as possible. Additionally, to reduce eyestrain, these elements should not be permanently placed close to the user; the user should be able to hide them, or even walk away from them, when they are not in use.

One of the most commonly proposed and demoed approaches is to place menus and buttons on the user's wrists and hands themselves, so that interacting with them feels like checking the time on a watch. This solves all three problems rather elegantly: the controls are by definition always within arm's reach; they strongly tend to be within the sensor's range, since the user brings their wrist up toward eye level to look at them; and the user can hide the menus simply by lowering their arms. The drawbacks are that holding both hands up for prolonged periods is tiring, and that large menus are difficult to fit on the limited space of the user's arm. Nevertheless, it is an effective technique for simpler controls, and one that we will likely see used more often in the future.
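A wrist-anchored menu of this kind can be sketched as follows. This is an engine-agnostic illustration, not Leapmotion API code; the `Vector3` type, the menu dictionary, and the 5 cm offset are hypothetical stand-ins for whatever your engine provides:

```python
# Engine-agnostic sketch of a wrist-anchored menu. Vector3, the menu
# dictionary, and the 5 cm offset are hypothetical stand-ins for
# whatever your engine or framework actually provides.

from dataclasses import dataclass

@dataclass
class Vector3:
    x: float
    y: float
    z: float

    def __add__(self, other):
        return Vector3(self.x + other.x, self.y + other.y, self.z + other.z)

# Offset from the wrist joint to the menu panel (here in world space
# for brevity; a real implementation would use the wrist's local frame).
WRIST_MENU_OFFSET = Vector3(0.0, 0.05, 0.0)  # 5 cm above the wrist

def update_wrist_menu(wrist_position, hand_is_tracked, menu):
    """Follow the wrist while tracked; hide the menu when tracking drops."""
    menu["visible"] = hand_is_tracked
    if hand_is_tracked:
        menu["position"] = wrist_position + WRIST_MENU_OFFSET

# Called once per frame: raising the hand shows the menu at the wrist,
# and losing tracking (e.g. lowering the arm) hides it automatically.
menu = {"visible": False, "position": None}
update_wrist_menu(Vector3(0.1, 1.0, 0.3), True, menu)
```

The key design point is that visibility falls directly out of tracking state: there is no explicit "close menu" button to learn.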

Another option is to anchor the UI to the user's head rather than their arms, effectively creating a virtual HUD. While this forcibly places the UI within the tracking limits and at the user's eye level, the approach has several problems. Being permanently affixed at a close distance to the user's eyes, it will likely cause eyestrain. Worse still, static HUDs and loading screens break immersion and cause nausea, sometimes severely, because they do not behave consistently with the rest of the environment. Your mind expects an object to stay in one place in the virtual world, not to follow your face.

It is important to note that dashboards and cockpits, such as those in cars and planes, often have the opposite effect: they provide a reference point, alleviate nausea, and help the user feel oriented. The major distinction is that a dashboard is not affixed to the user's body in any way. Rather, the user is inside it, moving with it.

Putting it All Together

At IMMERS3D, we favor mixing the HUD and cockpit concepts. Rather than permanently affixing our UI to the user’s head, we initially place it at the user’s eye level immediately after they put on the headset, and at a distance which matches their arm’s length. The UI is effectively tailor-fit for the specific user, and placed at a comfortable location for them, whether they are tall or short, or have long arms or short ones. However, it stays put in this location and does not follow the user around. The user can lean away from it, towards it, or even walk away from it if they so choose.
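This one-time, tailor-fit placement can be sketched with a little trigonometry. The function below is our own illustrative Python, not any particular SDK's API; the coordinate convention (y up, yaw about the y axis) and all names are assumptions:

```python
# Our own illustrative sketch of the one-time UI placement. Coordinates
# are (x, y, z) with y up; yaw is the head's rotation about the y axis.
# Names and conventions are assumptions, not any particular SDK's API.

import math

def initial_ui_position(eye_pos, yaw_radians, arm_length):
    """Place the UI at eye height, arm_length along the gaze direction."""
    x, y, z = eye_pos
    return (x + arm_length * math.sin(yaw_radians),
            y,  # keep the panel at the user's eye level
            z + arm_length * math.cos(yaw_radians))

# A user facing along +z, eyes 1.6 m up, with 0.63 m (about 25 in) arms:
print(initial_ui_position((0.0, 1.6, 0.0), 0.0, 0.63))  # (0.0, 1.6, 0.63)
```

Because the computation runs only once, when the headset goes on, the panel then stays put in world space and the user is free to lean toward it or walk away from it.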

With the entire 3D space at our disposal, we could place buttons in the air, at waist level, or even on the ground, as long as they are within tracking range. With this level of design freedom, however, ergonomics become very important to consider. ANSI guidelines recommend that a touch screen not be a computer's sole input method, since prolonged touch-screen use can cause carpal tunnel syndrome and other chronic pain. [4] In the case of a laptop or desktop, the keyboard and mouse prevent the user from having to use the touch screen for extended periods.


While the UI in “Minority Report” looked cool, it would have a host of ergonomic problems in the real world. [5]


Our solution is to place less frequently used controls, such as those for changing scenes, at eye level. These are designed to be visually appealing and displayed prominently, but are not touched often. Controls within a given scene, such as buttons to play, pause, and seek through a video, are better off at waist level and just within the user's field of view, almost mimicking a floating keyboard. The user can interact with these controls with their arms relaxed, leading to a much more comfortable experience.

Finally, we still have the issue of UI elements persisting in the scene. If the user is watching a 360 video or exploring a 3D environment, buttons and menus add unnecessary clutter. On a computer, one might minimize an unused window; on a phone, one might swipe it away. With a fully 3D scene and accurate tracking of the user's body, we can do better: we can show and hide elements, such as video controls, depending on whether the user's arms are in tracking range. The user can make a menu appear by raising their hand, and make it disappear by relaxing it.
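This raise-to-show behavior can be sketched as a small per-frame controller. The grace period here is a design choice we add, not a Leapmotion feature, to keep momentary tracking dropouts (the jitter discussed earlier) from making the menu flicker; all names are illustrative:

```python
# Sketch of raise-to-show with a hide grace period (our design choice;
# none of these names are Leapmotion API). The grace period keeps a
# momentary tracking dropout from making the menu flicker.

HIDE_GRACE_SECONDS = 0.5

class HandMenuController:
    def __init__(self):
        self.visible = False
        self._last_seen = None  # timestamp a hand was last in range

    def update(self, hand_in_range, now):
        """Call once per frame with the current time in seconds."""
        if hand_in_range:
            self._last_seen = now
            self.visible = True
        elif self.visible and now - self._last_seen > HIDE_GRACE_SECONDS:
            self.visible = False
        return self.visible

ctrl = HandMenuController()
print(ctrl.update(True, 0.0))   # True: hand raised, menu appears
print(ctrl.update(False, 0.2))  # True: brief dropout, menu stays
print(ctrl.update(False, 1.0))  # False: hand stayed down, menu hides
```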

With these methods, we place UI elements where they are both comfortable for the user and within the range where the user's hands are tracked accurately. We also display them only when they are relevant, letting the user enjoy the actual content rather than being bogged down by the interface. Taking these factors into account can make the difference between poor hand tracking with arm and eye strain on the one hand, and a comfortable, easy-to-use interface on the other.


This level of personalization and responsiveness to the user is a very powerful effect of being able to accurately track their head, arms and hands. Immersion is critical to a VR or AR experience, and body tracking dramatically aids in the sensation of presence. Eye tracking will make user interaction even more seamless and natural, allowing for menus and objects to appear and disappear depending on where the user is looking, leading to very exciting possibilities.

IMMERS3D believes that intimate and natural user interfaces are the future, and we are focused on making them a reality today. While we are still in the early days of such technology, thoughtful design and engineering can harness them to provide incredible utility. We would love to hear from you if you are just as excited about the future of VR and AR as we are!