Considerations in Building a Gesture-based 3D User Interface
One of the toughest problems to tackle in virtual and
augmented reality is how to allow the user, in the real world, to navigate and
control the virtual world comfortably and with minimal fuss. The solutions, on
the hardware side, range from simple remotes and game controllers, all the way
to body suits, which are capable of tracking a user’s entire body.
The Oculus Rift debuted with a straightforward and elegant
solution: an Xbox One game controller. Considering that its primary use case is
gaming, it works quite effectively. A user can navigate menus and control in-game
characters and objects in a familiar manner, just as they would in a typical,
non-VR video game.
However, while it might be easy to use, the Xbox controller
doesn’t do a particularly good job of giving the user presence in the virtual world.
There is no approximation of the user’s hands and fingers. They are stuck
holding a controller, and one that doesn’t even exist in the virtual
world. This breaks immersion and limits what the user can do.
Oculus addressed these issues by recently releasing the
Oculus Touch controllers. Instead of a single gamepad, there are now two
controllers, one for each hand. Most importantly, the controllers are fully
tracked and have avatars in the virtual world. If you place a controller on the
ground in the real world, its virtual representation ends up on the virtual
ground as well. Oculus also approximates grabbing objects by having the user
squeeze grip triggers on the controllers. Their rival, HTC, includes similar
controllers in their
flagship product, the Vive. All of this gives the user a much stronger
interaction with the virtual world, and even the feeling that their hands are
actually present in it.
The Advent of Body Tracking
The Leapmotion controller takes this concept a step further. It fully tracks both of the user’s arms and hands.
As a result, one can see their hands in the virtual world, which is an
incredibly immersive feeling. Rather than push a joystick around or squeeze a
pair of triggers, one can actually make a fist to grab something in the virtual
world in a remarkably natural manner. Unfortunately, haptic feedback is
entirely missing, as is the feeling of weight. One can effortlessly lift a
building in VR with their pinky, which, while entertaining, is unrealistic.
Nevertheless, at IMMERS3D, we believe that gesture tracking
systems like Leapmotion, Microsoft Hololens, Intel RealSense and Google Tango
are the future. Perhaps surprisingly, the majority of first-time VR users demoing
our software responded better to using Leapmotion to navigate the virtual world
than to the controllers from Oculus and HTC. There are no buttons to learn, and
virtual objects respond to the user’s hands exactly as they expect them to.
The difference is much like the contrast between a mouse and keyboard and the
touchscreen of a phone or tablet.
However, it is worth noting that Leapmotion has drawn
criticism from the VR community for its usability.
While it can be blissfully intuitive when it works, its tracking is far from
perfect. Positional accuracy is sometimes lacking, as is its ability to
properly distinguish fingers, and even the user’s hands, in certain cases. When
it comes to immersion in virtual reality, accuracy is critical, doubly so when
the user’s hands are also an input device. When the virtual hand bounces around
or disappears due to poor tracking, trying to grab an object or select a menu
button becomes jarring and extremely frustrating.
We’re now going to examine why the hand tracking can break,
and discuss some software and UI techniques to prevent it from doing so.
Let’s begin by looking at the constraints of the Leapmotion,
and of the VR headset, the Oculus Rift DK2 in this example. (Most currently retailing
headsets have very similar parameters.) The Oculus Rift has an FOV (field of
view) of roughly 100 degrees. The Leapmotion tracking system has an FOV of 135
degrees. This means that, fortunately, your hands will always be
visible in the virtual space as long as you are looking at them, since they can
be tracked even beyond the edges of your effective vision. There is one caveat,
however: they must also be within range of the Leapmotion’s
tracking distance. While the Leapmotion can “see” a wide angle, it can’t see
that far. Its effective range is one inch to two feet, with tracking becoming
increasingly less accurate at the extremes. This poses a problem for two
reasons.
First, the average person’s arm length is 25 inches, or just
beyond the Leapmotion’s recommended tracking range. This means
that when one stretches their arm out, there is a high likelihood that the
Leapmotion will either poorly track their hand, or fail to track it altogether.
Taller users might run into problems even without extending their arms fully.
Second, Oculus recommends that objects of interest in the virtual
space be placed a minimum of 30 inches from the user. Anything closer than this
can cause eyestrain. In fact, they recommend that UI elements be placed nearly
100 inches from the user. This leaves us with a dilemma: an object or UI
element is either going to be annoyingly close to the
user, or too far out of reach for the user to interact with.
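The dilemma can be made concrete with a small sketch that encodes the figures quoted above. This is only an illustration: the constant and function names are hypothetical, and treating the sensor's 135-degree field of view as a symmetric cone around its forward axis is a simplifying assumption rather than a description of the real device.

```python
import math

INCH = 0.0254  # metres per inch

# Figures quoted in the text; treat them as rough design constants.
TRACK_MIN_M = 1 * INCH       # sensor near limit (about one inch)
TRACK_MAX_M = 24 * INCH      # sensor far limit (about two feet)
TRACK_FOV_DEG = 135.0        # sensor field of view
COMFORT_MIN_M = 30 * INCH    # minimum comfortable viewing distance


def in_tracking_volume(offset_m):
    """Return True if a point lies inside the assumed tracking volume.

    offset_m is (x, y, z) relative to the sensor, with +z along the
    sensor's forward axis. The FOV is modelled as a simple cone
    (an assumption made for this sketch).
    """
    x, y, z = offset_m
    dist = math.sqrt(x * x + y * y + z * z)
    if not (TRACK_MIN_M <= dist <= TRACK_MAX_M):
        return False
    # Angle between the point and the forward axis.
    cos_angle = max(-1.0, min(1.0, z / dist))
    return math.degrees(math.acos(cos_angle)) <= TRACK_FOV_DEG / 2


def classify_distance(distance_m):
    """Return (reachable, comfortable) for a UI element at this distance."""
    reachable = TRACK_MIN_M <= distance_m <= TRACK_MAX_M
    comfortable = distance_m >= COMFORT_MIN_M
    return reachable, comfortable
```

With these numbers, no single distance is both reachable and comfortable, because the far tracking limit (24 inches) is shorter than the minimum comfortable viewing distance (30 inches). That gap is what the placement techniques discussed next try to work around.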
Not taking these constraints into consideration practically
guarantees tracking issues, and is, in the author’s humble opinion, why the
Leapmotion sometimes gets an undeserved bad rap. It can be remarkably accurate
and robust when designed for properly. The goal is to place UI elements and
other interactive objects in such a manner that the user’s hands will naturally
remain within the Leapmotion’s tracking range and, ideally, as close to the
user’s eye level as possible. Additionally, to reduce eyestrain, these elements
should not permanently be placed close to the user. The user should be able to
hide them or even walk away from them when they aren’t being used.
One of the most commonly proposed and demoed approaches is
to simply place menus and buttons on the user’s wrists and hands themselves, so
that they interact with them just as if they were checking the time on their
watch. This solves all three problems rather elegantly: the menus will always, by
definition, be within arm’s reach; they will strongly tend to be within the
sensor’s range as the user brings their wrist up to eye level to look at it (or
looks down at their wrist); and the user can hide them simply by putting
their arms down. The drawbacks are that it can be tiring to hold both hands up
for prolonged periods of time, and that it is difficult to place large menus
and other elements on the limited space of the user’s arm. Nevertheless, it is
an effective technique for simpler controls, and one that we will likely see
used more often in the future.
Another option is to anchor the UI to the user’s head rather
than their arms, to effectively create a virtual HUD. While this forcibly
places the UI within the tracking limits and at the user’s eye level, there are
several problems with this approach. It will likely cause eyestrain, being
permanently affixed at a close distance to the user’s eyes. Worse still, static
HUDs and loading screens break immersion and cause nausea, sometimes severely,
because they do not behave consistently with the rest of the environment. Your
mind expects an object to stay in one place in the virtual world, not to follow
your head around.
It is important to note that dashboards and cockpits, such
as in cars and planes, often have the opposite effect. They provide a reference
point, and often alleviate nausea, and help the user feel oriented. The major
distinction here is that a dashboard is not affixed to the user’s body in any
way. Rather, the user is inside it, and is moving with it.
At IMMERS3D, we favor mixing the HUD and cockpit concepts.
Rather than permanently affixing our UI to the user’s head, we initially place
it at the user’s eye level immediately after they put on the headset, and at a
distance which matches their arm’s length. The UI is effectively tailor-fit for
the specific user, and placed at a comfortable location for them, whether they
are tall or short, or have long arms or short ones. However, it stays put in
this location and does not follow the user around. The user can lean away from
it, towards it, or even walk away from it if they so choose.
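A minimal sketch of this one-time placement might look like the following. It is not our production code: the function name, the yaw convention (0 degrees looking along +z, with y up) and the `max_reach_m` clamp are all assumptions made for illustration. The clamp simply keeps the panel slightly inside the sensor's roughly two-foot range for users with long arms.

```python
import math


def place_panel(head_pos, head_yaw_deg, arm_length_m, max_reach_m=0.55):
    """Compute a world-space anchor for the UI panel.

    head_pos      -- (x, y, z) of the user's head in metres, y up
    head_yaw_deg  -- heading of the head; 0 means looking along +z (assumed)
    arm_length_m  -- the user's measured arm length
    max_reach_m   -- hypothetical clamp, a little under the tracking limit
    """
    reach = min(arm_length_m, max_reach_m)
    yaw = math.radians(head_yaw_deg)
    x, y, z = head_pos
    # Keep the panel at eye height (same y) and push it forward along the
    # horizontal direction the user is facing.
    return (x + reach * math.sin(yaw), y, z + reach * math.cos(yaw))


# Called once, right after the headset goes on. The result is stored as a
# world-space anchor and is NOT recomputed every frame, so the panel stays
# put when the user leans or walks away.
panel_anchor = place_panel((0.0, 1.7, 0.0), 0.0, 0.635)
```

The important design choice is in the usage rather than the arithmetic: the head pose is sampled once to fit the panel to the user, and after that the panel behaves like any other object fixed in the scene.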
With the entire 3D space at our disposal, we could place
buttons in the air, at waist level, or even on the ground, as long as they are
within tracking range. However, it is also very important to consider
ergonomics with this level of design freedom. According to ANSI, a computer
shouldn’t rely exclusively on a touch screen for input, as this can cause carpal
tunnel syndrome and other chronic pain; it should offer multiple input methods.
In the case of a laptop or desktop, the keyboard and mouse
prevent the user from having to use the touch screen for a prolonged period.
While the UI in “Minority Report” looked cool, it would
have a host of ergonomic problems in the real world.
Our solution is to place less-frequently used controls, such
as those for changing scenes, at eye level. These are designed to be visually
appealing, thus displayed more prominently, but not touched as often. Controls
within a given scene, such as buttons to play, pause and seek through a video,
are better off at waist level, and just within the user’s field of view, almost
mimicking a floating keyboard. The user can interact with these controls with
their arms relaxed, leading to a much more comfortable experience.
Finally, we still have the issue of the UI elements
persisting in the scene. If the user is watching a 360 video, or exploring a 3D
environment, buttons and menus can add unnecessary clutter. On a computer, one
might minimize an unused window, and on a phone they might swipe it away. With
a fully 3D scene, and accurate tracking of the user’s body, though, we can do
better. We can show and hide certain elements, such as video controls,
depending on whether the user’s arms are in tracking range. The user can make a
menu appear when raising their hand, and make it disappear by relaxing it.
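The show-and-hide behaviour can be sketched as a small state machine driven by a per-frame "is a hand currently tracked?" flag. The class name and frame thresholds below are hypothetical, and the debouncing (requiring several consecutive frames before switching) is an added assumption, included so that a single dropped tracking frame does not make the menu flicker.

```python
class MenuVisibility:
    """Show a menu while a hand is tracked, hide it once the hand is gone."""

    def __init__(self, show_after=3, hide_after=10):
        self.show_after = show_after  # frames with a hand before showing
        self.hide_after = hide_after  # frames without a hand before hiding
        self.visible = False
        self._streak = 0  # consecutive frames that disagree with the state

    def update(self, hand_tracked):
        """Feed one frame of tracking state; return whether to draw the menu."""
        if hand_tracked == self.visible:
            # Tracking agrees with the current state, so reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.show_after if hand_tracked else self.hide_after
            if self._streak >= needed:
                self.visible = hand_tracked
                self._streak = 0
        return self.visible


menu = MenuVisibility(show_after=2, hide_after=3)
frames = [True, True, False, True, False, False, False]
states = [menu.update(tracked) for tracked in frames]
```

In this run the menu appears on the second tracked frame, survives the single dropped frame, and disappears only after three consecutive frames without a hand. Hiding more slowly than showing is a deliberate choice here, since a brief tracking loss is more likely than a brief false detection.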
With these methods, we place the UI elements in a position
that is both comfortable for the user and within a range that the user’s hands
will be tracked accurately. We also only display them when they are relevant,
allowing the user to enjoy the actual content, rather than being bogged down by
the user interface. Taking these factors into consideration can make the
difference between an interface with poor hand tracking that strains the user’s
arms and eyes, and one that is comfortable and easy to use.
This level of personalization and responsiveness to the user
is a very powerful effect of being able to accurately track their head, arms
and hands. Immersion is critical to a VR or AR experience, and body tracking
dramatically aids in the sensation of presence. Eye tracking will make user
interaction even more seamless and natural, allowing for menus and objects to
appear and disappear depending on where the user is looking, leading to very
IMMERS3D believes that intimate and natural user interfaces
are the future, and we are focused on making them a reality today. While we are
still in the early days of such technology, thoughtful design and
engineering can harness it to provide incredible utility. We would love to
hear from you if you are just as excited about the future of VR and AR as we
are.