Cameras are another option for gestural interaction. With them, users do not have to touch a device at all; instead, their actions are recognized by computer vision. Camera-based solutions can track multiple body points and multiple users at once. However, camera tracking, especially when it is marker-less, is usually less accurate than device-based tracking, and one major issue is the so-called Midas Touch problem: as there are no buttons to press in this kind of interaction, there is no straightforward way to distinguish whether a body movement is meant as an input to the system or not.
4 http://cwonline.com/store/view_product.asp?Product=1179 (accessed 2015-9-8); image source: http://commons.wikimedia.org/wiki/File:P5_in_use.jpg (accessed 2015-9-8)
Camera Types
There are different camera types, distinguished by the wavelength range they capture or the information they provide. Color cameras capture images in a wavelength range similar to human vision, but usually with the important difference that they provide no depth perception (monocular vision). They are appropriate for tracking objects with a specific color, e.g. hands or faces with skin color, or color patterns, e.g. QR codes. Depth information for tracked objects can be obtained by triangulation when using a stereo camera pair. Another important camera type is the infrared camera, which captures a wavelength range outside human vision. This can be useful for tracking objects that are marked with a color that should not be seen by humans or that rarely occurs in the environment. The effect can be enhanced by using infrared-reflective materials that are illuminated by infrared LEDs.
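As a rough illustration of tracking an object by its color, the following minimal sketch computes the centroid of all pixels whose color lies within a given range. The function name, the image representation (rows of RGB tuples), and the color bounds are assumptions made for this example only; practical systems usually work in other color spaces and filter noise.

```python
def track_color_centroid(image, lower, upper):
    """Return the (x, y) centroid of all pixels within [lower, upper], or None.

    image: list of rows, each row a list of (r, g, b) tuples (hypothetical format).
    lower, upper: inclusive (r, g, b) bounds of the tracked color (assumed values).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # A pixel matches if every channel lies inside its bounds.
            if all(lo <= ch <= hi for ch, lo, hi in zip(pixel, lower, upper)):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # the tracked color does not occur in this frame
    return (sum_x / count, sum_y / count)


# Usage: a 3x3 image with two red pixels in the middle column.
BLACK, RED = (0, 0, 0), (255, 0, 0)
frame = [
    [BLACK, RED, BLACK],
    [BLACK, BLACK, BLACK],
    [BLACK, RED, BLACK],
]
centroid = track_color_centroid(frame, lower=(200, 0, 0), upper=(255, 50, 50))
```

The result is only a two-dimensional image position; as noted above, a single color camera does not provide the distance of the tracked object.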
Figure 2.4: Depth image construction using the structured light principle
According to the structured light principle, infrared cameras can further be used to capture an infrared pattern, invisible to the human eye, that is projected onto the environment. By analyzing the distortion of this pattern, the distance from each part of the environment to the camera can be estimated, as is done by the first-generation Kinect (see Figure 2.4). This leads to another type of camera: the depth camera. Depth cameras provide depth perception in the form of an image in which every pixel describes the distance from the camera to the object hit by the corresponding projection ray (see Figure 2.4 on the right-hand side). There are different ways to capture such a depth image.
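To make the relation between a depth pixel and its projection ray concrete, the following sketch converts one depth pixel into a three-dimensional point in camera coordinates. It assumes an ideal pinhole camera without lens distortion and a depth value measured along the optical axis; the function name and the intrinsic parameters (fx, fy, cx, cy) are hypothetical and not taken from any specific device.

```python
def depth_pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project depth pixel (u, v) into a 3D point (x, y, z) in meters.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels
    (assumed pinhole intrinsics). depth_m: depth along the optical axis.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Usage: a pixel 100 columns to the right of the image center, 2 m away.
point = depth_pixel_to_point(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Applying this to every pixel of a depth image yields a point cloud of the captured scene.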
Besides the aforementioned structured light principle, another important way to capture a depth image is the time-of-flight principle, which was the de facto standard before the release of the Microsoft Kinect for Xbox 360. Time-of-flight cameras measure the time it takes for a light signal to travel from the camera to the object it hits and back. Based on the known speed of light, the distance to the object can be calculated. This principle is used in the second-generation Kinect (Microsoft Kinect for Xbox One6 and for Windows v2, see Figure 2.5).
Figure 2.5: Microsoft Kinect for Xbox One7
Of course, the two presented depth camera types existed before the release of the Kinect. However, they had never provided a similarly high resolution while being sold at such a low price and in such large quantities. Before the Kinect, technology that supported similar features cost many times the price of the Kinect, required a complex setup and configuration, and often provided even lower accuracy. This is why such technology was only available for research prototypes. Previous consumer products, on the other hand, such as the Nintendo Wii Remote, only provided much simpler data and interaction, as described above. Only the Kinect made depth cameras and the corresponding full body interaction available for every home. A third type of depth camera uses a stereo-camera pair and analyzes the two camera images to construct
6 http://xbox.com/kinect (accessed 2015-9-15)
7 Image source: http://commons.wikimedia.org/wiki/File:Xbox-One-Kinect.jpg (accessed 2015-9-8)
a depth image by matching image parts and triangulation, similar to human stereoscopic vision. Often, stereo cameras additionally use LEDs or other light sources to illuminate the scene with pattern-less (IR) light. An example of a stereo-camera depth sensor is the LEAP Motion Controller. The three mentioned depth camera types are summarized with examples, advantages, and disadvantages in Table 2.2 (cf. [12, 89, 108]):
Table 2.2: Types of depth cameras (Abbreviations: res. = resolution; accy. = accuracy; dist. = distances; esp. = especially)

| Camera Type | Examples | Advantages | Disadvantages |
|---|---|---|---|
| Structured-light | Microsoft Kinect for Xbox 360 / Windows v1, Asus Xtion PRO, PrimeSense Carmine, Intel RealSense F200 | high image res., high depth res. for lower dist., low cost | shadow artifacts, problems w/ dull, shiny, small, or under-sharp-angle surfaces, low accy. esp. at farther dist., low depth res. at farther dist. |
| Time-of-flight | Microsoft Kinect for Xbox One / Windows v2, SoftKinetic DS325, Mesa Imaging SR4000, Bluetechnix Argos3D P320 | high accy., depth res. & frame-rates | low image res., high cost, problems w/ sharp edges, motion artifacts, high power requirements |
| Stereo camera triangulation | LEAP Motion Controller, VisLab 3DV-A, Stereolabs ZED | high frame-rates & image res. | low accy. esp. for farther dist. & regions w/ homogeneous color, often no complete depth image |
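The distance calculation behind the time-of-flight principle can be sketched in a few lines. The example assumes that the measured time covers the full round trip from the camera to the object and back, so the distance is half the travelled path; the function name is hypothetical, and real sensors typically measure phase shifts of modulated light rather than a single pulse.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_time_s):
    """Distance to the object, assuming the time covers the way there and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# Usage: a round trip of 20 nanoseconds corresponds to roughly 3 m.
distance = tof_distance_m(20e-9)
```

The example also indicates why such sensors need very precise timing: a distance difference of 1 cm changes the round-trip time by only about 67 picoseconds.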
All three depth camera types usually capture IR light to be unobtrusive. They all capture a depth image of their whole field-of-view at a certain point in time and with a frame-rate high enough to capture common human motions. Cameras using structured light in particular are mostly suited for indoor use, as they can be disturbed by sunlight or other IR light sources. They can further have problems with materials that reflect or absorb the used IR light range in an extreme way. Apart from these three depth camera types, there are several other approaches, such as modulated phase-shifting [28] or triangulation with multiple laser beams [141]. However, the three mentioned types are the ones most commonly found for full body interaction, while the others are still in development, not suitable for full body interaction, or not generally available to end consumers.
Different camera types can be combined to benefit from their respective advantages. For this purpose, it is helpful to register the viewing frustums of the cameras with each other. In a setup with a depth and a color camera, for example, this makes it possible to determine which depth pixel belongs to which color pixel and vice versa. This technique is also used by the Kinect.
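One common way to perform such a registration is sketched below: a depth pixel is back-projected into 3D, transformed into the coordinate frame of the color camera, and projected into the color image. This is a simplified illustration assuming ideal pinhole cameras without lens distortion; the function name and all calibration values (intrinsics, rotation, translation) are hypothetical, and this is not a description of how the Kinect implements the mapping.

```python
def register_depth_pixel(u, v, depth_m, depth_intr, color_intr, rotation, translation):
    """Map depth pixel (u, v) to color image coordinates, or None if not visible.

    depth_intr, color_intr: (fx, fy, cx, cy) pinhole intrinsics in pixels.
    rotation: 3x3 matrix (list of rows); translation: 3-vector in meters.
    Both describe the pose of the depth frame in the color camera frame.
    """
    fx_d, fy_d, cx_d, cy_d = depth_intr
    fx_c, fy_c, cx_c, cy_c = color_intr

    # 1. Back-project the depth pixel into a 3D point in the depth camera frame.
    p = ((u - cx_d) * depth_m / fx_d, (v - cy_d) * depth_m / fy_d, depth_m)

    # 2. Transform the point into the color camera frame: q = R * p + t.
    q = [sum(rotation[r][k] * p[k] for k in range(3)) + translation[r]
         for r in range(3)]
    if q[2] <= 0:
        return None  # the point lies behind the color camera

    # 3. Project the point into the color image.
    return (fx_c * q[0] / q[2] + cx_c, fy_c * q[1] / q[2] + cy_c)


# Usage: cameras with parallel axes, offset by 2.5 cm along the x-axis.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
color_px = register_depth_pixel(
    320, 240, 2.0,
    depth_intr=(500.0, 500.0, 320.0, 240.0),
    color_intr=(1000.0, 1000.0, 640.0, 480.0),
    rotation=IDENTITY,
    translation=[0.025, 0.0, 0.0],
)
```

Because the offset between the two cameras causes a depth-dependent shift in the image, the mapping has to be evaluated per pixel using the measured depth.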
User Tracking
Figure 2.6: Left: IR reflective markers attached to legs8; Right: Face detection in a color image9
8 Image source: http://commons.wikimedia.org/wiki/File:Kistler_plates.jpg (accessed 2015-9-8)
9 Image source: http://commons.wikimedia.org/wiki/File:Face_detection.jpg (accessed 2015-9-8)
One can distinguish between marker-based and marker-less user tracking. Markers are objects with a distinct shape or distinct material properties (infrared reflective, special color, or visual pattern), which makes them easy for computers to track in image streams. Thus, the computer initially knows the two-dimensional position of the marker in the camera image. Under certain conditions, it is further possible to calculate the orientation or distance of the object based on how it is warped in the camera image. When markers are placed on specific body parts, the motions of those body parts can be tracked and used for interaction (see Figure 2.6 on the left-hand side). Without markers, computers can still track important parts of a color image, e.g. by detecting skin color (see Figure 2.6 on the right-hand side). In this case, however, it is harder to calculate distances, as the actual size of the tracked objects (here: faces) is unknown and can only be roughly estimated. When using a stereo-camera pair, triangulation can be applied to calculate the distance of image regions that have been matched between the two cameras [30].
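Both ways of estimating distance mentioned above can be illustrated with standard pinhole-camera relations. The sketch assumes ideal, undistorted cameras and, for the stereo case, a rectified camera pair with parallel optical axes; all function names and numeric values are hypothetical.

```python
def distance_from_known_size(focal_px, real_width_m, image_width_px):
    """Distance of an object of known real width (e.g. a marker): Z = f * W / w."""
    if image_width_px <= 0:
        raise ValueError("image width must be positive")
    return focal_px * real_width_m / image_width_px


def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a region matched in a rectified stereo pair: Z = f * B / d.

    x_left_px, x_right_px: horizontal position of the same region in the
    left and right image; their difference d is the disparity.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Usage: a 5 cm wide marker that appears 20 pixels wide.
marker_distance = distance_from_known_size(800.0, 0.05, 20.0)

# Usage: a region matched at x = 420 (left) and x = 380 (right), 10 cm baseline.
stereo_distance = depth_from_disparity(800.0, 0.1, 420.0, 380.0)
```

The first relation shows why an unknown object size prevents a reliable distance estimate from a single image, while the second needs no knowledge of the object's size, only a correct match between the two images.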
Figure 2.7: User tracking in a depth image
Depth images make it easier to recognize three-dimensional shapes in the image, which allows tracking persons in the field-of-view along with their specific joint configurations (see Figure 2.7). Depending on the technology, the tracking system provides different information. The most basic information is the position of different body parts, either two-dimensional in the camera image or three-dimensional in real-world space. Furthermore, the orientation of certain body parts may be provided, which can be calculated from the positional information under certain conditions, e.g. for intermediate joints. One frame of user tracking data describes the current configuration of the users' body parts, which needs to be interpreted to recognize certain postures. The differences between those configurations across two or more frames describe the movements of the users' body parts and need to be interpreted to recognize gestures.
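The distinction between per-frame and between-frame information can be sketched as follows. The first function derives the bend angle at an intermediate joint (e.g. an elbow) from three joint positions of one frame, as a simplified stand-in for orientation; the second derives a velocity from the positions of one joint in two frames. The function names, the joint coordinates, and the frame interval are assumptions for illustration and do not correspond to any particular tracking system.

```python
import math


def joint_angle_deg(parent, joint, child):
    """Angle in degrees at `joint` between the segments to `parent` and `child`."""
    a = [p - j for p, j in zip(parent, joint)]
    b = [c - j for c, j in zip(child, joint)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    if norm == 0:
        raise ValueError("joint positions must not coincide")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def joint_velocity(pos_prev, pos_curr, dt_s):
    """Velocity vector of one joint between two frames, in units per second."""
    return tuple((c - p) / dt_s for p, c in zip(pos_prev, pos_curr))


# Posture (one frame): shoulder, elbow, and hand form a right angle.
elbow_angle = joint_angle_deg((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))

# Gesture (two frames): the hand moved 0.3 m along x within 0.1 s.
hand_velocity = joint_velocity((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), dt_s=0.1)
```

A posture recognizer could then compare such angles against expected ranges, while a gesture recognizer could evaluate velocities or trajectories over several frames.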