2 Asociado a la Unidad de Competencia: - BOLETÍN OFICIAL DEL ESTADO

Instrumented approaches are more suitable than uninstrumented ones, especially for the purpose of identity-detection. For PGM, the IWARD robot can take the advantage of numerous commercial and research oriented indoor positioning systems that are available to identify, track and localise humans in real time. IR [175], US [179] and vision-based [185] positioning systems require line of sight between the tag and the reader. In an Audible Sound Positioning System, [189] sound can be interfered by other sound noises in the dynamic changing and crowded public place. Although Magnetic positioning systems do not require line-of sight they have a limited coverage range (3 m).

RF based positioning systems seem the obvious choice as they do not require line of sight and they can cover larger area than any other systems. Although, a WLAN-based positioning system reuses the existing WLAN infrastructures, which lowers the cost of indoor localisation, their accuracy is only 1.65 m out of a 312 m2 area [167]. Similarly the Bluetooth based Topaz is very expensive and accuracy lies between 2 m to 3 m. An active-type RFID based absolute localisation system such as WhereNet [181] based on TDOA has only 2 m to 3 m accuracy. Thus, with this accuracy alone, the robot cannot approach the target person and continue its guide operation. Ubisense [184] uses TOA jointly with UWB technology and offers higher accuracy among all these technologies.

However, they are very expensive. Unfortunately, all these existing systems that provide absolute positioning information are often not a practical solution for mobile robotics since they typically rely upon pre-installed and calibrated environmental infrastructures. It might not be acceptable to retrofit the whole hospital area with external sensors. Absolute Location systems require a complex infrastructure of beacon nodes, which can be expensive and cumbersome to install and manage. Additionally, the number of stations and the interval between the stations affect the accuracy.

Rather than structuring the environment with external sensors, the best solution would be a simple plug & play solution using only the robot’s on-board sensors. Hence, relative location technologies that do not require structuring the environment are economically more viable options for the PGM. Absolute location systems might be useful for robot navigation but in the IWARD project the navigation module has been developed by one of its other research partners. Relative location between the robot and the follower is sufficient for the PGM to estimate the follower position during the guidance operation to synchronise its speed accordingly. Surprisingly, research on the development of relative location systems based on infrared and/or radio sensors/emitters is a rather unexplored area. Infrared Emitter [194, 195] or Ultrasonic Transponder [199] have been tested by a few researchers on this purpose, unfortunately they only work in line of sight situations. Another system called Cricket [180] developed by MIT is very promising as it has a centimetre level accuracy. It uses the difference of arrival times between the RF signal and the ultrasonic pulse and can be used both as absolute and relative positioning. However, in such systems the optical line of sight is still required for transmitting the ultrasonic pulse.

As discussed, RFID is very promising due to the no contact and non-line-of-sight nature of this technology. The robot equipped with an active RFID reader can easily identify an RF tag instrumented person using the RF signal transmitted from the tag, but RFID does not usually support localisation. Additionally, radio propagation in indoor environment is subject to numerous problems such as severe multipath, rare line-of sight (LOS) path, absorption, diffraction, and reflection due to the presence of obstacles and people. For these reasons, despite all the promise surrounding RFID, it is still not possible to locate an RFID tag from a single RFID reader, particularly in cluttered or dynamically changing environments. A single RFID reader can only detect if the tag is

in its range. Kanda et al [210] continuously changed the transmission power of the reader to coarsely estimate the tags’ position from the on-board RFID reader, which is worthy to note here. Unfortunately, the time needed to detect a person was relatively slow in their system. The delay is about 10 seconds. With such latency, the mobile robot might have significantly changed its position since the measurement took place. If the robot is travelling at 1 m/s and the localisation algorithm takes 5 seconds to complete from the time the ranging measurements were taken, the robot might be 5 m off from its intended position [172].

For uninstrumented scenarios, a camera is the best modality for human sensing. Cameras are inexpensive, and are the most field-tested solutions and widely used in mobile robots for person detecting, tracking and identification. Most solutions track people and identify them from faces [29, 39, 41]. Such systems assume that people coarsely face the robot. These approaches cannot be applied for a mobile robot which has to deal with moving people with faces that are not always perceivable [218]. Others extended the approach to skin colour [29, 36] or the full or upper human body detection. Some complementary approaches combine person detection and recognition [241, 242] in order to distinguish the targeted person from others. Stereo-based person detection provides more robust results than monocular approaches [151]. Some researchers [152- 154] successfully applied foreground segmentation on disparity maps to differentiate people from the background scenery. Moreover, stereo-based approaches can provide information about the person’s distance from the stereo head [139]. The field-of-view of conventional cameras is usually not more than 60 degrees, which means that a larger field-of-view can only be obtained through pan motion as implemented by [39]. Others [36, 47, 48] used an omnidirectional camera. Nevertheless, despite many advances, their performances heavily depend on the lighting conditions, viewing angle, distance to persons, and variability of humans’ appearance in video streams. Detecting people across multiple image frames as well as robustly identifying them solely with visual features is still a topic of much research in computer vision. Moreover, in real environments, such as crowded hospitals, this solution is not practically feasible to implement, because occlusions frequently arise due to the existence of many people [228].

Scanning range-finders and Doppler-shift sensors hold second choice among uninstrumented sensing approaches. They are insensitive to lighting and require less processing power when compared to vision. The UWB radar [98]; [99] works through- the-wall but it can only detect static people. Laser and sonar approaches still require line of sight. Moreover, leg detection in a 2D scan does not provide robust features for discriminating the different persons in the robot’s vicinity, while the detector fails when one leg occludes the other. Similarly, if multiple people walk with similar speed, their Doppler signatures interfere with each other. Recently UWB Doppler radars for detection and positioning of human beings in a complex environment became popular as they are immune against multi-path interference. But they are very expensive and cannot distinguish a certain person among others.

For resource-constrained scenarios, simple binary sensors are preferable. But they can only detect people and nothing else. It will require retrofitting the hospital environment, which is not a practical solution. Moreover, PIR cannot detect stationary people and floor tiles and EF sensors are difficult to install and interpret. Furthermore, all binary sensors require a high network density to provide accurate locations. Finally, binary sensor based person identification approaches [79, 81-85, 90, 243] only work among a small group of people over a constructed laboratory environment.

Despite the advances in the field, truly robust human-sensing is still practically an unrealised goal. Not any single module can robustly perform people detection, tracking, identification or localisation at the same time from a mobile platform. Multiperson tracking is still challenging in the real world, crowded environment such as hospitals, office buildings or in airports. People are easily lost, and tracks are often terminated or, even worse, incorrectly extended in the face of ambiguities [66].

The solution to overcome this issue for the PGM is sensor fusion. Sensor fusion approaches build upon the use of multiple sensors or sensing modalities in an attempt to combine their advantages while cancelling out their disadvantages as much as possible. These multi-modal sensing approaches have proven to be reliable and suitable for real- world applications. Multi-sensory systems increase the fault tolerance, reliability, robustness and precision of human sensing capability for mobile robots [66, 229, 235].

As discussed in section 3.4.3.2, sensor fusion over constructed environments is not a practical solution for the IWARD robot. Consequently, numerous guide robots considered multimodal sensor fusion for human sensing relying on onboard sensors. Typical approaches use cameras for face detection and microphones for sound source localisation. Speech-based user tracking is not suitable as single users typically don’t speak along their paths, and background noise is significantly higher than the voice of a person in a distance of several meters [33]. Others combined cameras with: lasers [222- 224]; sonar [217]; laser and sonar [48, 226]; laser and microphone [215, 225], microphone and PIR [73]; laser and US [227]. The major drawback of most of these hierarchical approaches is the sequential integration of the sensory cues. These systems typically fail if the laser range finder yields no information. Besides, for a mobile robot which has to deal with moving people, faces will not always be perceivable, hence verification could fail too. Moreover, unfortunately, not the least of them is overcome line-of-sight requirements in indoor environments [48].

Integration of Cricket, Stereovision and Active RFID modality should be an ideal solution for the IWARD PGM. Apart from person detection and tracking, stereo can provide distance information. Active RFID systems have a higher range than the passive ones and can perform accurate person identification and coarse localisation. Although similar combinations were employed in [228], their robots did not perform similar guiding tasks that the IWARD system required. In [228], vision and RFID data were fused for their person following robot. The person following robot has somewhat an opposite function than a person guiding robot where the robot follows a human instead of the human following the robot. Also they used passive RFID whose range is limited to only within 0.5 m and they used hierarchical approaches for sensory data integration. These systems typically fail if the RFID yields no information. Cricket sensors provide centimetre level distance accuracy when they are at line of sight. Feasibility of this sensor can be more rigorously scrutinized to find out whether it can be used as a third modality to be added with the other two choices.

In document BOLETÍN OFICIAL DEL ESTADO (página 30-41)