Vision-based Autonomous Navigation
for Wind Turbine Inspection using an
Unmanned Aerial Vehicle
by
René Parlange Chavarría
Thesis submitted as partial requirement for a
MSc degree in Computer Science at
Instituto Nacional de Astrofísica, Óptica y Electrónica
February, 2019
Santa María Tonantzintla, 72840, Puebla, México
Supervised by:
José Martínez Carranza, Ph.D., INAOE
© INAOE 2019 All rights reserved
The author grants INAOE permission to reproduce and distribute copies of this thesis in its entirety or in parts,
Abstract
Wind power generation is being rapidly adopted around the world as an alternative source of clean and renewable energy. Wind turbines require periodic inspection and maintenance to ensure good performance and a prolonged lifetime. Traditionally, inspection involves the risk of a person falling while abseiling from the platform at the roof of the nacelle. Recently, UAVs have been controlled by operators to inspect the blades, taking pictures and video of their surface. However, this task requires expert pilots and causes them to experience fatigue quickly. Alternatively, autonomous UAVs are not subject to human tiredness and can follow trajectories in a repeatable manner. The proposed approach consists of developing an autonomous agent that is able to locate itself and build a map of its environment, performing simultaneous localization and mapping. To recognize the position of the blades and hub, the UAV detects line segments and removes noise by segmenting the wind turbine from the background of the scene with color thresholding. The blade lines are detected by fitting a geometrical model to the filtered lines. Then, by performing a backwards projection from a 2D image plane to a 3D scene, the path planner establishes a trajectory of waypoints along the blades for the UAV to inspect from a safe distance. Finally, knowing its current location, the autonomous navigation controller can follow these inspection points, gathering useful image data for further evaluation by an expert inspector. Experiments were carried out in a fully autonomous manner in simulation and in a real environment using scale wind turbines, as well as localization and navigation tests with a full-scale wind turbine.
Resumen
Wind power generation has seen growing adoption as an alternative source of clean and renewable energy. Wind turbines require periodic inspections to ensure good performance and a long service life. Traditionally, rope-access inspection involves the risk of a person falling. Recently, Unmanned Aerial Vehicles (UAVs) have been controlled by operators to inspect the blades, taking photos and video of their surface. However, this task requires expert pilots and quickly causes them fatigue. Autonomous UAVs, on the other hand, are not subject to human tiredness and can follow trajectories repeatably. The proposed approach consists of developing an autonomous agent capable of localizing itself and building a map of its environment. To recognize the position of the blades and the rotor, the UAV detects line segments and separates the turbine from the background of the scene with a segmentation mask obtained by thresholding in color spaces. The lines that make up the blades are detected by fitting a geometrical model of the turbine to the filtered lines. Then, by performing a back-projection from the 2D image plane to a 3D scene, the planner generates a trajectory so that the UAV can inspect the turbine from a safe distance. Finally, knowing its current location, the flight controller follows these inspection points, collecting useful images for subsequent evaluation by an expert. Experiments were carried out with autonomous inspections in simulation and in real environments using scale turbines, as well as localization and navigation tests with a full-size turbine.
Acknowledgements
I would like to express my utmost gratitude to my advisors, Dr. Carranza and Dr. Sucar, for inviting me to work on this project and patiently guiding me with their insightful remarks.
I am grateful to Dr. Ren for her careful observations during my academic stay at her laboratory in Texas Tech University, and Douglas Crockett for making that exchange happen. I also wish to acknowledge the help provided by her students Yafeng Wang and Sanka Liyanage with the experimental setup at the National Wind Institute facility in Reese Technology Center.
I would like to thank my labmates Oyuki Rojas and Luis González from INAOE for our constant exchange of knowledge throughout this process.
This research project was possible thanks to the financial support from CONACYT grant 291137 and CONACYT-INEGI project 268528. Special thanks to CEMIE-Eólico P12 and CONACYT grant 291250 for funding my short-term scholar visit at Texas Tech University. I am also particularly grateful to Dr. Ibargüengoytia for providing access to carry out experiments with the Komai KWT300 wind turbine at Centro Regional de Tecnología Eólica (CERTE). I greatly appreciate the detailed work of Marco Antonio de Jesús from the mechanical workshop at INAOE, who built an impressive scale wind turbine for our research.
Table of Contents
Abstract
Resumen
Acknowledgements
Table of Contents
List of Figures
List of Tables
Acronyms
1 Introduction
1.1 Motivation
1.2 Justification
1.3 Problem statement
1.4 Objectives
1.4.1 General objective
1.4.2 Specific objectives
1.5 Scope
1.6 Contributions
1.8 Thesis structure
2 Theoretical Framework
2.1 Localization
2.1.1 Camera calibration
2.1.2 Simultaneous Localization and Mapping
2.2 Detection
2.2.1 Edge detection
2.2.1.1 Gaussian blur
2.2.1.2 Canny edge detector
2.2.2 Hough transform
2.2.2.1 Standard Hough Transform
2.2.2.2 Probabilistic Hough Transform
2.2.2.3 Progressive Probabilistic Hough Transform
2.2.3 Color thresholding
2.2.3.1 HSV color model
2.2.3.2 CIE L*a*b* color space
2.2.4 LSD: Line Segment Detector
2.2.5 Random Sample Consensus
2.3 Autonomous navigation
2.3.1 Path planning
2.3.2 Waypoint controller
2.4 Chapter summary
3 Related Work
3.1 Wind turbine inspection with UAVs
3.2 Localization
3.2.1 Sparse vs. dense SLAM
3.2.2 Direct vs. indirect SLAM
3.2.4 LSD-SLAM
3.2.5 ORB-SLAM
3.3 Detection
3.4 Autonomous navigation
3.5 Chapter summary
4 Methodology
4.1 Localization
4.2 Blade detection
4.2.1 Hough transform
4.2.2 Color thresholding
4.2.3 LSD: Line Segment Detector
4.2.4 Random Sample Consensus
4.3 Autonomous navigation
4.3.1 Pinhole camera model
4.3.2 Waypoint follower
4.4 ROS architecture
4.5 Chapter summary
5 Experiments and Results
5.1 Simulated environment
5.1.1 Objective
5.1.2 Scene
5.1.3 Results
5.1.3.1 Evaluation metrics
5.2 Real setting - 3 m scale wind turbine
5.2.1 Objective
5.2.2 Scene
5.2.3 Results
5.2.3.2 Qualitative results
5.2.3.3 Qualitative evaluation
5.2.3.4 Ordering of trajectories
5.2.3.5 Quantitative evaluation
5.3 Real setting - 42 m full-scale wind turbine
5.3.1 Objective
5.3.2 Scene
5.3.3 Results
6 Conclusions
6.1 Future work
List of Figures
2.1 No radial distortion.
2.2 Barrel distortion.
2.3 Pincushion distortion.
2.4 Non-maximum suppression
2.5 Rho-theta (ρ, θ) parametrization.
2.6 Detection of a red line crossing three points, using the Hough transform.
2.7 Intersection of sinusoids in the Hough space.
2.8 Hue, Saturation, Value (HSV) color model.
2.9 Top and frontal view of the CIE L*a*b* color space.
2.10 Line-support regions: A group of connected pixels that share the same gradient angle up to a certain tolerance.
2.11 Example of a rectangular approximation of a line-support region. Left: image. Middle: one of the line-support regions. Right: rectangular approximation superposed on the line-support region.
2.12 Non-structured level-line orientations. Angles are independent random variables uniformly distributed in [0, 2π].
2.13 In this example, there are 4 aligned points among 7 points.
2.14 The pinhole camera model determines the relationship between a point in a 3D scene and its projection onto a 2D image plane.
2.15 Diagram for camera coordinate system unaligned with the world coordinate system.
2.16 Path planning of inspection points for our three-bladed wind turbine.
2.17 Diagram with waypoints for autonomous inspection.
2.18 Block diagram for a proportional-integral (PI) controller.
3.1 Visual localization and mapping systems (Desouza and Kak, 2002).
3.2 Categorization of popular monocular SLAM systems (Kudan, 2017).
3.3 LSD-SLAM on its foodcourt dataset.
3.4 ORB-SLAM2 with camera facing downwards.
4.1 Flowchart for simulated/real autonomous wind turbine inspection.
4.2 Metric monocular ORB-SLAM with camera facing downwards.
4.3 Geometric projection to the ground using the pinhole camera model.
4.4 The pinhole camera model was used to determine the relationship between a point in the 2D image plane and its 3D coordinates.
4.5 Standard Hough transform for lines.
4.6 Progressive Probabilistic Hough transform for lines.
4.7 HSV color segmentation in simulated environment (thresholds in Table 6.4).
4.8 HSV color segmentation at NWI facilities in Reese Technology Center.
4.9 Standard Hough transform over the HSV color space.
4.10 All detected line segments pictured in random colors.
4.11 CIE L*a*b* segmentation mask.
4.12 Subset of filtered lines, obtained by applying the segmentation mask.
4.13 RANSAC model. Subset of three lines (in blue) with their intersections (in orange) and centroid (in red).
4.14 Blade detection with generated inspection points.
4.15 Test with RANSAC blade detector from different viewpoints.
4.16 3D inspection points generated by the path planner using the pinhole
4.17 The publisher/subscriber ROS architecture for the simulated environment.
4.18 The publisher/subscriber ROS architecture for real settings.
5.1 Top view of the 10 m simulated scale wind turbine.
5.2 AR.Drone 2.0 in front of the 10 m simulated wind turbine.
5.3 Aerial photograph of CERTE and its KWT300 wind turbine.
5.4 Gazebo simulation.
5.5 Inspection plan in Rviz.
5.6 Autonomous wind turbine inspection in simulation.
5.7 3D plot of simulated inspection trajectory.
5.8 Bebop 2 ready for inspection.
5.9 GoPro below frontal camera.
5.10 RANSAC blade detection.
5.11 Inspection plan in Rviz.
5.12 3D plot of inspection trajectory #1.
5.13 Autonomous inspection #1 of scale wind turbine.
5.14 RANSAC blade detection.
5.15 Inspection plan in Rviz.
5.16 3D plot of inspection trajectory #2.
5.17 Autonomous inspection #2 of scale wind turbine.
5.18 RANSAC blade detection.
5.19 Inspection plan in Rviz.
5.20 3D plot of inspection trajectory #3.
5.21 RANSAC blade detection.
5.22 Inspection plan in Rviz.
5.23 3D plot of inspection trajectory #4.
5.24 RANSAC blade detection.
5.26 3D plot of inspection trajectory #5.
5.27 RANSAC blade detection.
5.28 Inspection plan in Rviz.
5.29 3D plot of inspection trajectory #6.
5.30 Path planning of inspection points for the three blades.
5.31 Blade inspection of the Komai KWT300 wind turbine at CERTE.
5.32 Manual inspection of Komai KWT300 wind turbine at CERTE (part 1).
5.33 Manual inspection of Komai KWT300 wind turbine at CERTE (part 2).
5.34 Semi-autonomous inspection of Komai KWT300 at CERTE (part 1).
List of Tables
2.1 Line detectors and color spaces used for experiments in simulated and real environments.
2.2 RANSAC parameters and detection results for inspection #1.
2.3 The six possible orderings for blade inspection, as labeled in Figure 2.16.
3.1 Comparison of UAV systems for wind turbine inspection.
3.2 Main elements in widely used monocular SLAM systems.
5.1 Tracking error for simulated trajectory.
5.2 RANSAC parameters and detection results for inspection #1.
5.3 Flight plan #1. 2D detection points to 3D inspection waypoints.
5.4 RANSAC parameters and detection results for inspection #2.
5.5 Flight plan #2. 2D detection points to 3D inspection waypoints.
5.6 RANSAC parameters and detection results for inspection #3.
5.7 Flight plan #3. 2D detection points to 3D inspection waypoints.
5.8 RANSAC parameters and detection results for inspection #4.
5.9 Flight plan #4. 2D detection points to 3D inspection waypoints.
5.10 RANSAC parameters and detection results for inspection #5.
5.11 Flight plan #5. 2D detection points to 3D inspection waypoints.
5.12 RANSAC parameters and detection results for inspection #6.
5.14 Blade inspection points ordering for the six presented experiments, as labeled in Figure 5.30.
5.15 Quantitative evaluation for six experiments in a real setting.
6.1 Intrinsic parameters for simulated frontal camera of AR.Drone 2.0.
6.2 Intrinsic parameters for simulated bottom camera of AR.Drone 2.0.
6.3 Intrinsic parameters for frontal camera of Bebop 2.
6.4 Upper and lower thresholds for Hue, Saturation, Value (HSV) in simulation.
6.5 Upper and lower thresholds for Hue, Saturation, Value (HSV) in real setting with 1.5 m scale wind turbine at the National Wind Institute facilities in Reese Technology Center.
6.6 Upper and lower thresholds in CIE L*a*b* for 3 m scale wind turbine
Acronyms
UAV Unmanned Aerial Vehicle
UAS Unmanned Aerial System
DOF Degrees of Freedom
FOV Field of View
IMU Inertial Measurement Unit
SLAM Simultaneous Localization and Mapping
VO Visual Odometry
SfM Structure from Motion
SHT Standard Hough Transform
PHT Probabilistic Hough Transform
PPHT Progressive Probabilistic Hough Transform
LSD Line Segment Detector
RANSAC Random Sample Consensus
ROS Robot Operating System
Chapter 1
Introduction
This chapter presents an overview of the growth in wind power generation around the world and the importance of regularly inspecting wind turbines to sustain their performance. We point out the drawbacks of traditional inspection methods, compared with the more recent approach involving remotely operated Unmanned Aerial Vehicles (UAVs). Then, we propose an autonomous navigation solution for wind turbine inspection using this kind of aerial vehicle, stating the motivation and justification for our method, as well as the objectives and scope of this project. Finally, we report our contributions and the publications derived from this research.
1.1 Motivation
Wind power is an emerging source of clean and renewable energy. During 2015, 3.7% of all global electricity was supplied by wind energy. By the end of 2016, there were 341,320 wind turbines spinning around the world. The demand for wind farms has grown dramatically, creating over a million jobs last year (G.W.E.C., 2017).
Like any machine, a wind turbine requires maintenance, which involves periodic inspection of its surface to determine the physical state of its rotor blades. The blades are subject to mechanical stress that produces cracks, and also to external factors, such as erosion and lightning strikes. Wind turbines deteriorate with adverse weather, and a late detection of imperfections could shorten their lifetime and decrease their performance in terms of power generation.
This task has been traditionally done through simple visual inspection from the ground with a telephoto camera lens or by rope access, where a person abseils from the platform at the top of the nacelle. The former method is time-consuming and the latter puts a human life at risk, while both are restricted by the mobility and field of view of the operator. Trying to overcome these challenges, a recent approach using UAVs is rapidly being adopted in the industry.
Mexico has embraced wind energy generation by installing an impressive 713.6 MW of new capacity to reach a total of 3,073 MW by the end of 2015. Moreover, Mexico has set an ambitious target of 2,000 MW installed per year until 2023 (G.W.E.C., 2016). Apart from this increased adoption and installation of wind turbines, ongoing market reforms in the electricity sector are expected to have a significant impact on the future of wind power in the country.
Currently, there is research in Mexico focused on different aspects regarding generation and distribution of electricity based on wind power. This thesis is aligned with the objectives of Project 12 (P12) of the Mexican Wind Energy Innovation Center (Centro Mexicano de Innovación en Energía Eólica, CEMIE-Eólico) led by Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) and Instituto Nacional de Electricidad y Energías Limpias (INEEL). The main objective of P12 is the development of technology based on artificial intelligence and mechatronics for the integration of a wind park to a smart grid.
1.2 Justification
Some benefits of employing autonomous UAVs for wind turbine inspection are:
• The elimination of risk involved in the process of rope access inspection by a human operator.
• The decrease in inspection time and the reduction of costs.
• The increase in the number of possible inspections per day, as the autonomous controller is not subject to human tiredness.
• The improvement in mobility and field of view, along with the quality of the images, as they are captured from a safe, close distance.
1.3 Problem statement
This method poses a series of challenges. Firstly, the UAV must be able to locate itself within its environment. This problem was approached using Simultaneous Localization and Mapping (SLAM). It allows the UAV to locate itself in a map it concurrently builds, using the images from its monocular camera as input. From the sequence of frames, it extracts features from the scene and anchors them to the map, estimating its pose using these keyframes.
Secondly, the UAV must be able to perceive the three-point star shape of the wind turbine and determine its relative orientation with respect to the camera. Based on an arbitrary takeoff position, the UAV flies up to the height of the hub and detects the lines that best fit a geometrical model of the wind turbine.
Once the position and orientation of the blades are known, the path planner establishes a trajectory of 3D inspection points from the 2D detection of the blades. Finally, the autonomous navigation controller generates control signals and reaches those waypoints, while recording video of the execution of the flight plan.
1.4 Objectives
1.4.1 General objective
To develop an Unmanned Aerial System (UAS) that carries out autonomous flight to navigate along the blades of the wind turbine capturing images for further analysis.
1.4.2 Specific objectives
• To keep a continuous localization using the integrated monocular camera and sensors, building a metric map of the environment and estimating the current pose of the UAV.
• To detect the blades, in order to plan a safe inspection trajectory.
• To develop a path planner with visual guidance, based on line features.
• To incorporate a priori information, such as the geometrical properties and dimensions of the scene and the object of interest, to enhance detection and path planning.
• To gather pictures and video for further evaluation of the physical state of the wind turbine.
1.5 Scope
This methodology aims at the automatic acquisition of high quality image data. The subsequent evaluation of the physical state of the blades for maintenance scheduling is out of the scope of this research. However, the detection of external faults could be performed with human expertise or with an image processing method applied to the acquired sequence of images.
The inspection task in this work covers the frontal area of the wind turbine in the scenario where the blades are nearly static, either because of low wind speed or because the wind turbine has been stopped for maintenance.
The vision-based autonomous navigation system was developed with a robotics simulator and tested in real scenarios using the following wind turbines:
• A 10 m simulated 3D model in Gazebo.
• A 3 m scale wind turbine at INAOE.
• A full-scale 42 m Komai KWT300 wind turbine at CERTE.
1.6 Contributions
• A RANSAC procedure for blade detection that finds line segments that best fit a geometrical model of a wind turbine.
• A path planner that generates an inspection trajectory by backprojecting the 2D detection of the blades to 3D coordinates.
• A waypoint navigation controller with lateral movements that maintains the orientation of the camera perpendicular to the surface of the blades.
• A state machine that coordinates the autonomous inspection task from takeoff to landing.
1.7 Publications
Partial results of this research were published at the International Micro Air Vehicle Conference (IMAV 2018) in collaboration with Dr. Beibei Ren from Texas Tech University.
• Parlange, R., Martinez-Carranza, J., Sucar, L.E., and Ren, B. (2018). Vision-based autonomous navigation for wind turbine inspection using an unmanned aerial vehicle. In Watkins, S., editor, 10th International Micro Air Vehicle Competition and Conference, pages 283-288, Melbourne, Australia.
IMAV 2018 video for presentation:
1.8 Thesis structure
This document is structured in six chapters as follows. Chapter 1 presents a general overview of this research, including a description of the problem, our motivation, justification, objectives and contributions. Chapter 2 lays out the theoretical concepts needed to address this problem. Chapter 3 reviews the state of the art, covering the main related work. Chapter 4 describes the methodology followed in this approach. Chapter 5 presents the experiments that were carried out during this research and their corresponding qualitative and quantitative results. Finally, Chapter 6 compiles the conclusions drawn from the experiments and outlines future work and areas of improvement.
Chapter 2
Theoretical Framework
This chapter is divided into three sections that cover the theoretical concepts behind the modules developed for autonomous wind turbine inspection with a UAV. Most techniques, from localization and object detection to path planning, are based on computer vision, while autonomous navigation uses control theory to reach the 3D coordinates of inspection points. There is a sequential dependence between these sections, as autonomous navigation depends on path planning, which in turn depends on object detection and relies on the capacity of the UAV to locate itself.
2.1 Localization
In robotic navigation, simultaneous localization and mapping (SLAM) is the computational problem in which a robot builds a map of its unknown environment while, at the same time, locating itself using this map. This is a chicken-and-egg problem, as localization estimates the pose of a robot given a map with landmarks, and mapping infers landmarks given the poses of the robot. Thus, SLAM solves both problems simultaneously using landmark detection, feature extraction, feature matching, state estimation, state update and landmark update. The most popular approximate solutions use statistical methods that deal with noisy observations using Bayesian inference, such as the particle filter and the Kalman filter.
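To illustrate how a Bayesian filter deals with noisy observations, the following is a minimal sketch of a one-dimensional Kalman filter in Python. It is not the estimator used in this thesis; the function names and the numeric values are hypothetical and chosen only for illustration, and the state is a single scalar position with Gaussian uncertainty.

```python
def kalman_predict(mean, var, motion, motion_var):
    """Prediction step: apply the motion and grow the uncertainty."""
    return mean + motion, var + motion_var


def kalman_update(mean, var, measurement, meas_var):
    """Update step: fuse the predicted state with one noisy measurement."""
    gain = var / (var + meas_var)              # Kalman gain in [0, 1]
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var               # uncertainty shrinks after fusion
    return new_mean, new_var


# Hypothetical example: start at 0 with variance 1, move by 1, then measure 1.2.
mean, var = 0.0, 1.0
mean, var = kalman_predict(mean, var, motion=1.0, motion_var=0.5)
mean, var = kalman_update(mean, var, measurement=1.2, meas_var=0.5)
```

The gain weights the measurement against the prediction according to their variances, so a noisier measurement moves the estimate less.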
2.1.1 Camera calibration
In order to perform computer vision techniques such as SLAM, we must know the intrinsic parameters of our cameras. These coefficients are obtained by calibrating the camera using an object with known dimensions and distinct line features and intersections, e.g. a chessboard. The pattern of an 8x6 chessboard with 108 mm squares was used to calibrate the monocular cameras of the simulated and real UAVs. With the introduction of inexpensive pinhole cameras in the late 20th century, robots have been equipped with this sensor, which provides rich visual information for navigation. However, these cameras are subject to significant distortion. Fortunately, the distortion is constant for a given camera and can be corrected with calibration and remapping. Furthermore, with calibration we can also determine the relationship between the camera units (pixels) and the real world units (in this case, meters). For distortion, the radial and tangential factors are taken into account. For a pixel at coordinates $(x, y)$ in the input image, the corrected position is $(x_{corrected}, y_{corrected})$. Equation 2.1 is used for the radial factor:

\[
\begin{aligned}
x_{corrected} &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{corrected} &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
\end{aligned}
\tag{2.1}
\]

where $r$ is the radius of the point, as this function is radially symmetric around the center of distortion found at the principal point $(c_x, c_y)$ (Ma et al., 2003), while $k_1$, $k_2$ and $k_3$ correspond to the radial distortion coefficients.
The presence of radial distortion manifests in the form of barrel distortion (Figure 2.2) or pincushion distortion (Figure 2.3). Tangential distortion occurs when the lens is not perfectly parallel to the imaging plane. It can be corrected with Equation 2.2:

\[
\begin{aligned}
x_{corrected} &= x + [\,2 p_1 x y + p_2 (r^2 + 2x^2)\,] \\
y_{corrected} &= y + [\,p_1 (r^2 + 2y^2) + 2 p_2 x y\,]
\end{aligned}
\tag{2.2}
\]

where $p_1$ and $p_2$ are the tangential distortion coefficients.
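The radial and tangential corrections of Equations 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the coordinates $(x, y)$ are assumed to be measured relative to the center of distortion, and the coefficient values in the usage example are made up rather than taken from our calibration tables.

```python
def radial_correction(x, y, k1, k2, k3):
    """Apply the radial factor of Equation 2.1 to a point (x, y)."""
    r2 = x * x + y * y                      # squared radius r^2
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return x * factor, y * factor


def tangential_correction(x, y, p1, p2):
    """Apply the tangential terms of Equation 2.2 to a point (x, y)."""
    r2 = x * x + y * y
    x_corrected = x + (2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x))
    y_corrected = y + (p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y)
    return x_corrected, y_corrected


# Hypothetical coefficients, for illustration only.
xr, yr = radial_correction(0.5, 0.0, k1=0.1, k2=0.0, k3=0.0)
xt, yt = tangential_correction(0.5, 0.0, p1=0.0, p2=0.01)
```

With all coefficients set to zero both functions return the input point unchanged, which is a convenient sanity check.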
The camera must be moved in front of the object with a certain skew, in order to detect patterns and compute the coefficients. Each found pattern results in a new equation. To solve the equations, at least a predetermined number of pattern snapshots is needed, forming a well-posed equation system. This number is higher for the chessboard than for the circle pattern.
Calibration determines the camera matrix, distortion parameters and re-projection error, taking input from the camera. This process was performed with the monocular cameras of the simulated AR.Drone 2.0 (frontal and bottom cameras, Tables 6.1 and 6.2 in the Appendix) and Bebop 2 (frontal camera, Table 6.3). These intrinsic parameters are required by the SLAM system to remove distortion from the fish-eye lens, extract features correctly and generate an accurate map of the environment. These parameters are also used by the path planner, which employs the pinhole camera model to obtain 3D coordinates from a 2D detection in the image plane.
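As a rough illustration of how the intrinsic parameters let the pinhole camera model recover 3D coordinates from a 2D detection, the sketch below back-projects a pixel to a point in the camera frame. It assumes that the depth along the optical axis is known and that distortion has already been removed; the function name and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are hypothetical and do not correspond to the calibration in the Appendix.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth to camera coordinates.

    fx, fy are the focal lengths in pixels and (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


# Hypothetical intrinsics: a pixel 100 px right of the principal point, 5 m away.
point = backproject(420.0, 240.0, depth=5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

A single image does not provide the depth, so in practice it has to come from prior knowledge of the scene or from another sensor.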
2.1.2 Simultaneous Localization and Mapping
Simultaneous Localization and Mapping (SLAM) using cameras is referred to as visual SLAM (vSLAM). This localization technique was originally proposed to achieve autonomous navigation with robots and it has been widely studied in the fields of computer vision, robotics and augmented reality (Taketomi et al., 2017).
The early work on vSLAM using a monocular camera was based on tracking and mapping feature points. This feature-based approach is considered indirect, as it has an intermediate step with feature extraction from textured surfaces. To cope with textureless environments, direct methods use intensity gradients of the whole image for tracking and mapping. The basic framework followed by most vSLAM algorithms is mainly composed of three modules: initialization, tracking and mapping.
To initialize vSLAM, it is necessary to define a certain coordinate system for camera pose estimation and 3D reconstruction of an unknown environment. Therefore, in the initialization, the global coordinate system must be defined and a part of the environment is reconstructed as an initial map in the global coordinate system.
After the initialization, tracking and mapping are performed to continuously estimate camera poses. In tracking, the reconstructed map is tracked in the image to estimate the camera pose with respect to that map. In order to do this, 2D-3D correspondences between the image and the map are obtained with feature matching and tracking. The camera pose is computed from the correspondences by solving the Perspective-n-Point (PnP) problem. Assuming that the intrinsic parameters of the camera have been calibrated beforehand, a camera pose is equivalent to the extrinsic camera parameters with translation and rotation of the camera in the global coordinate system. For mapping, the 3D structure of an environment is computed whenever the camera observes unknown regions, expanding the map.
The following two additional modules are also included in some vSLAM algorithms to improve stability and accuracy: relocalization and global map optimization. Relocalization is required when tracking fails, due to fast camera motion or disturbances. In this case, it is necessary to compute the camera pose with respect to the map again. The global map optimization module is used to suppress estimation error which was accumulated during the trajectory of camera movement. The map is refined by considering the consistency of the whole map information. Loop closing is used to obtain geometrically consistent maps, where a closed loop is first searched by matching the current image with previously acquired images. Bundle adjustment (BA) is also used to minimize re-projection error of the map by optimizing both the map and camera poses.
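The re-projection error that bundle adjustment minimizes, and that PnP uses to score a candidate pose, can be sketched as follows. This is a simplified illustration and not the ORB-SLAM implementation: the pose is assumed to be given as a 3x3 rotation matrix `R` and a translation `t` mapping world points to the camera frame, lens distortion is ignored, and all names and values are hypothetical.

```python
import math


def project(point_w, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    # Transform to the camera frame: X_c = R * X_w + t
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v


def rms_reprojection_error(points_w, observations, R, t, fx, fy, cx, cy):
    """Root mean square pixel distance between projections and observations."""
    squared = 0.0
    for point_w, (u_obs, v_obs) in zip(points_w, observations):
        u, v = project(point_w, R, t, fx, fy, cx, cy)
        squared += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(squared / len(points_w))


# Hypothetical example: identity pose, one map point observed 1 px off.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
error = rms_reprojection_error(
    [(1.0, 0.0, 5.0)], [(421.0, 240.0)],
    identity, [0.0, 0.0, 0.0], 500.0, 500.0, 320.0, 240.0,
)
```

An optimizer would adjust the pose (and, in bundle adjustment, the map points as well) to reduce this value over all correspondences.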
Visual SLAM (vSLAM), Visual Odometry (VO), and online Structure from Motion (SfM) are closely related techniques designed to estimate camera motion and 3D structure in an unknown environment. Odometry estimates the sequential changes of sensor positions over time. Camera-based odometry is referred to as visual odometry and it has the following relationship with vSLAM:
In visual odometry, the geometric consistency of a map is considered only in a small portion of a map or only relative camera motion is computed without mapping. On the other hand, in vSLAM, the global geometric consistency of a map is considered. Structure from Motion (SfM) is a technique to estimate camera motion and 3D structure of the environment in an offline batch process. There is no definitive difference between vSLAM and real-time SfM.
For the task in this research we use ORB-SLAM2 (Mur-Artal and Tardos, 2017) with a monocular camera, both in simulation and in a real setting, to estimate the pose of the UAV and allow autonomous navigation. ORB-SLAM is an open-source feature-based vSLAM method with multi-threaded tracking, mapping and closed-loop detection, and the map is optimized using pose-graph optimization.
Monocular vSLAM can only generate a map up to a scale factor. As depth cannot be perceived with just one lens, the real world scale is not directly observable. To obtain the scale of the map using a monocular camera with ORB-SLAM, we emulate an RGB-D camera (Rojas-Perez and Martinez-Carranza, 2017). As this version requires RGB frames coupled with depth, we generate a synthetic depth map. This can be solved geometrically by pointing the camera downwards and projecting rays onto a ground plane, knowing the degree of inclination of the camera and the altitude of the UAV. This solution for metric monocular vSLAM retrieves an absolute scale with the relationship between distances in the map and the real world.
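The geometric idea behind the synthetic depth map can be sketched as a ray and plane intersection. The snippet below is a simplified illustration under assumptions of our own for this example: a perfectly flat ground, no camera roll, an undistorted pinhole camera, and a pitch angle measured downwards from the horizontal (so 90 degrees means looking straight down). The function name and intrinsic values are hypothetical, and the referenced implementation may differ in its details.

```python
import math


def ground_depth(v, altitude, pitch_rad, fy, cy):
    """Depth along the optical axis of the ground point seen at image row v.

    Returns None when the pixel ray does not hit the ground (at or above the horizon).
    With no roll, the depth depends only on the image row, not on the column.
    """
    yn = (v - cy) / fy                       # normalized vertical coordinate
    # Downward component of the ray with unit depth along the optical axis.
    down = yn * math.cos(pitch_rad) + math.sin(pitch_rad)
    if down <= 0.0:
        return None
    return altitude / down


# Hypothetical values: camera 2 m above the ground.
depth_nadir = ground_depth(240.0, 2.0, math.radians(90.0), fy=500.0, cy=240.0)
depth_tilted = ground_depth(240.0, 2.0, math.radians(30.0), fy=500.0, cy=240.0)
```

When the camera looks straight down, every pixel has a depth equal to the altitude, which is why the downward-facing configuration gives a particularly simple depth map.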
2.2 Detection
Computer vision tasks such as object detection require an efficient representation, reducing the amount of data in the image, while preserving its visual characteristics and structural information. Edge detection can reduce the amount of data to be processed, obtaining the outline of the wind turbine. The output of the edge detector is a binary image suitable for line feature extraction for object detection.
2.2.1
Edge detection
Edge detection algorithms are some of the most widespread tools in image processing. Their main purpose is to recognize the boundaries in an image, separating segments with a difference in pixel intensity. In the scope of this work, edge detection provides a pre-processing stage before blade line detection.
Finding edges is a straightforward task for humans. However, computationally describing edge properties is not as intuitive. Edges appear in an image whenever there is a change between segments. Lines or ridges can be found by detecting edges close enough to each other. There are also corners or junctions of two edges, where lines cross. In these regions, the directional rate of change or gradient can indicate the direction of the edge. There is a range of differentiation operators that aim at certain kinds of edges. One of the most versatile is the Sobel operator, used by the Canny edge detector (Canny, 1986).
As noise can intensify the properties of this differentiation, there is usually some sort of noise reduction that needs to be applied beforehand; for instance, Gaussian blur.
2.2.1.1 Gaussian blur
To remove image noise, a Gaussian filter is used. Also known as Gaussian smoothing, the filter blurs the image, removing detail and noise. This is similar to the way a mean filter works, but the Gaussian filter uses a different convolution kernel.
$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \tag{2.5}$$

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.6}$$
The Gaussian distributions for the 1D and 2D cases are shown in Equations 2.5 and 2.6, where σ is the standard deviation of the distribution. The idea of Gaussian smoothing is to use these distributions as a point spread function to create a filtering mask, using convolution to blur an image. Since images are stored as discrete pixel values, a discrete approximation of the Gaussian function is used for the filtering mask before performing the convolution.
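As an illustration of this discretization, the following sketch samples Equation 2.5 at integer offsets and normalizes the result so that the mask sums to one. Truncating the kernel at three standard deviations is an assumption of this example, not a value taken from this work, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Sample the 1D Gaussian of Equation 2.5 at integer offsets."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    # Renormalize so that blurring does not change the image brightness.
    return g / g.sum()

def gaussian_kernel_2d(sigma, radius=None):
    """2D mask of Equation 2.6, built as the outer product of two 1D masks."""
    g = gaussian_kernel_1d(sigma, radius)
    return np.outer(g, g)

mask = gaussian_kernel_2d(sigma=1.0)  # 7x7 filtering mask
```

The outer product works because the 2D Gaussian is separable: G(x, y) = G(x) G(y). The same property allows blurring with two 1D passes instead of one 2D convolution.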
2.2.1.2 Canny edge detector
The Canny edge detector (Canny, 1986) is one of the most prevalent edge detection methods. The algorithm calculates directional derivatives with the Sobel operator, which uses two 3x3 kernels, one for horizontal differentiation and another one for its vertical counterpart, shown in Equation 2.7. These kernels are convolved with the original image, so that differentiation is performed at every image point.
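$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A \quad \text{and} \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A \tag{2.7}$$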
Matrices Gx and Gy contain the directional derivatives along the x and y axes, respectively, where A is the source image and ∗ denotes 2D convolution. Equations 2.8 and 2.9 are used to obtain the gradient magnitude and direction.
$$G = \sqrt{G_x^2 + G_y^2} \tag{2.8}$$

$$\Theta = \arctan\left(\frac{G_y}{G_x}\right) \tag{2.9}$$
In this context, it is irrelevant whether the edge indicates a positive or negative change in pixel intensity. After obtaining the gradient magnitude and direction, a full image scan is performed to remove unwanted pixels, which may not constitute an edge. In this non-maximum suppression process, every pixel is checked to determine if it is a local maximum within its neighborhood in gradient direction.
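The gradient computation of Equations 2.7 to 2.9 can be sketched as follows. This is an illustrative example rather than the Canny implementation used in this work: the kernels are applied as written (sliding-window correlation, without flipping), only the valid interior of the image is returned, and `arctan2` is used in place of the plain arctangent to avoid dividing by zero when Gx = 0.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def filter3x3(image, kernel):
    """Slide a 3x3 kernel over the image, keeping only the valid region."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_gradient(image):
    """Return gradient magnitude (Eq. 2.8) and direction (Eq. 2.9)."""
    image = np.asarray(image, dtype=float)
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)  # radians
    return magnitude, direction

# Synthetic image with a vertical step edge: dark left, bright right.
step = np.zeros((5, 5))
step[:, 2:] = 1.0
magnitude, direction = sobel_gradient(step)
```

For the step image, the magnitude is non-zero only in the columns next to the intensity change and the direction is horizontal, that is, perpendicular to the edge.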
2.2.2
Hough transform
The Hough transform can be described as a computational procedure to detect straight lines in digitized images. This feature extraction method (Duda and Hart, 1972) uses angle-radius parameters to simplify the computation, rather than the slope-intercept form originally used by its creator (Hough, 1962). While it can be generalized to detect arbitrary shapes with a known parametric equation, this task can be approached using lines. The most common line equation is:
y=mx+b (2.10)
Equation 2.10 has two parameters, slope (m) and intercept (b). However, this parametrization cannot represent vertical lines because their slope is undefined. A line can also be expressed using polar coordinates with angle (θ) and radius (ρ) in the rho-theta parametrization described by Equation 2.11:
ρ=xcosθ+ysinθ (2.11)
where ρ is the perpendicular distance from the origin to the closest point on the straight line, and θ is the angle between the x-axis and that perpendicular, measured counter-clockwise, as depicted in Figure 2.5.
Any line can be described uniquely using these two parameters if its theta interval is restricted to θ ∈ [0°, 180°), with the condition that ρ ∈ R. Therefore, the parameter space for lines has two dimensions (ρ, θ), and with this restriction, every line in the x−y plane corresponds to a unique point in the ρ−θ plane.
Moreover, collinear points on an edge are transformed into sinusoidal curves defined by Equation 2.11 in the parameter space, and a line is detected at the intersection of these curves in the so-called Hough space. Therefore, a dual property of the point-to-curve transformation can also be established.
(Duda and Hart, 1972) established the following properties for the Hough space:
1. A point in the image plane corresponds to a sinusoidal curve in the parameter plane.
2. A point in the parameter plane corresponds to a straight line in the image plane.
3. Points lying on the same straight line in the image plane correspond to sinusoidal curves through a common point in the parameter plane.
4. Points lying on the same curve in the parameter plane correspond to lines through the same point in the image plane.
For lines described by two parameters, a 2D accumulator array is created to hold the votes for each pair of parameter values, and it is initially set to 0. Rows denote ρ and columns denote θ. The size or resolution of the array depends on the accuracy needed for the task. For a resolution of one degree, we need 180 columns in the accumulator. For ρ, the maximum possible distance is the diagonal length of the image. Therefore, for one-pixel accuracy, the number of rows is the diagonal length of the image.
For instance, we can observe that there is a red line formed by three points in Figure 2.6. We take the (x, y) values of the first point of the line and check every line (ρ, θ) that goes through that point by dividing the 180° interval into six angles, 30° apart from each other, and then incrementing the value of the corresponding (ρ, θ) cell in the accumulator. Then, we take the second point and repeat the same procedure, and likewise for the third. This way, at the end of the process, if we look for peaks or local maxima in our accumulator, we can find candidate straight lines, including the red line depicted in our image (Figure 2.6). Therefore, we have a voting procedure that allows us to find lines by establishing a detection threshold for the minimum number of sinusoid intersections in the parameter space.
The accumulator resolution in this procedure determines the precision with which lines can be detected. In this case we have a 30 degree resolution with 6 angles, shown in different colors. Thus, for each point, six continuous lines with different angles are drawn. For each solid line, a perpendicular dotted line with length ρ intersects with the origin. Figure 2.6 shows the values of the ρ and θ parameters for the green line at each point. Likewise, the table below displays the ρ and θ parameters for every angle determined by the resolution.
For the parameters (θ = 60°, ρ = 80 px), there are three counts in the accumulator, while any other pair of parameters has at most one count. Therefore, with a threshold of counts > 1, we have detected one line formed by three points. This line detection can be visualized as the intersection of sinusoids in the Hough space, shown in Figure 2.7.
Figure 2.7: Intersection of sinusoids in the Hough space.
In the Hough space, we can verify whether there is a candidate line. As can be observed in Figure 2.6, there is a red line with θ = 60° crossing three points in the image space. Each point corresponds to a sinusoid in the parameter space in Figure 2.7, where their intersection coordinates represent the rho-theta (ρ, θ) values for the line.
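The voting procedure of this example can be reproduced with a short sketch. The three points below are hypothetical, since the coordinates of Figure 2.6 are not listed in the text: they are generated so that they lie on the line with θ = 60° and ρ = 80 px according to Equation 2.11. The accumulator is a sparse counter with one-pixel ρ bins and 30° θ bins.

```python
import math
from collections import Counter

def hough_votes(points, theta_step_deg=30):
    """Vote in a sparse (rho, theta) accumulator using Equation 2.11."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho), theta_deg)] += 1  # one-pixel rho bins
    return votes

def points_on_line(rho, theta_deg, offsets):
    """Hypothetical helper: points on the line (rho, theta), offset along it."""
    theta = math.radians(theta_deg)
    foot_x, foot_y = rho * math.cos(theta), rho * math.sin(theta)
    return [(foot_x - s * math.sin(theta), foot_y + s * math.cos(theta))
            for s in offsets]

points = points_on_line(rho=80, theta_deg=60, offsets=(-40, 0, 40))
votes = hough_votes(points)
(peak_rho, peak_theta), peak_count = votes.most_common(1)[0]
detected = [cell for cell, count in votes.items() if count > 1]
```

Here the only cell that collects more than one vote is (ρ = 80 px, θ = 60°), with three votes, so the threshold counts > 1 isolates the line through the three points.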
2.2.2.1 Standard Hough Transform
The Standard Hough Transform (SHT) (Duda and Hart, 1972) for lines is not the most efficient implementation, as for each point in the edge of an image, the algorithm updates the accumulator for the whole resolution of the θ parameter. An important fact is that this standard procedure returns infinite lines described by the (ρ, θ) parameters. This algorithm can become more efficient by using a probabilistic approach, where only a randomly selected subset of points is used for detection, obtaining line segments described by connecting two points, (x0, y0) and (x1, y1).
2.2.2.2 Probabilistic Hough Transform
Since more spatial information is needed for object detection and preferably a more efficient implementation, the Probabilistic Hough Transform (PHT) (Kiryati et al., 1991) is also performed. The idea behind this method is to transform randomly selected pixels in the edge image into counts in the accumulator. When a bin in the accumulator corresponding to a particular infinite line has received a certain number of votes, the edge image is searched along that line to see if one or more finite lines are present. Then, all pixels on that line are removed from the edge image. Thus, the algorithm returns finite lines. If the vote threshold is low, the number of pixels to evaluate in the accumulator also gets smaller.
2.2.2.3 Progressive Probabilistic Hough Transform
The Progressive Probabilistic Hough Transform (PPHT) (Galambos et al., 1999) builds on the Probabilistic Hough Transform. An important distinction from PHT is that PPHT does not require a previously selected subset ratio. Rather than randomly extracting a fixed subset of n points out of the N edge points, PPHT exploits knowledge about a threshold effect for the n to N ratio: the algorithm selects points randomly until end conditions are satisfied, so that the n to N ratio ends up close to the threshold effect area. It also returns start and end points for each line segment.
2.2.3
Color thresholding
To enhance performance of line detectors we apply segmentation. This process partitions the image into segments, allowing distinction of the wind turbine from the background of the scene. In order to accomplish this, the image is transformed to a color space where it is split into three channels, adjusting lower and upper thresholds until the object of interest is clearly separated from the background. The color models used were HSV in simulation and CIE L*a*b* in real settings.
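The thresholding step can be sketched as a per-channel range test. The example below is illustrative: the threshold values are hypothetical, not the ones tuned in this work, and the image is assumed to be already converted to the chosen three-channel color space. The result is similar in spirit to the mask produced by OpenCV's `inRange` function.

```python
import numpy as np

def threshold_mask(image, lower, upper):
    """Boolean mask of the pixels whose three channels all lie in range."""
    image = np.asarray(image)
    inside = (image >= np.asarray(lower)) & (image <= np.asarray(upper))
    return inside.all(axis=-1)

# Hypothetical 1x2 image in a three-channel color space (8-bit values).
pixels = np.array([[[10, 200, 200], [100, 50, 50]]], dtype=np.uint8)
mask = threshold_mask(pixels, lower=(0, 100, 100), upper=(20, 255, 255))
```

Only the first pixel falls inside all three channel ranges, so it is the only one kept. Line segments whose pixels fall outside such a mask can then be discarded as background.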
2.2.3.1 HSV color model
HSV (Hue, Saturation, Value) is an alternative representation to the RGB color model that aligns more closely with the way human vision perceives color-making attributes. The main reason why color segmentation is easier with this color model is that it separates color information (chroma) from intensity (luma).
Figure 2.8: Hue, Saturation, Value (HSV) color model.
In this model, colors of each hue are arranged in a radial slice around a central axis of neutral colors, which ranges from black at the bottom to white at the top, as shown in Figure 2.8. The HSV representation models the way colors are mixed, with the saturation dimension resembling shades of a bright color, and the value dimension resembling the mixture of those shades with varying amounts of black or white.
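The separation between color and intensity can be checked with Python's standard `colorsys` module, which works with channel values in [0, 1]. In this small example, whose two colors are chosen purely for illustration, a bright red and a dark red share hue and saturation and differ only in value, which is why a hue threshold is less sensitive to illumination changes than RGB thresholds.

```python
import colorsys

def hsv_of(r, g, b):
    """Convert 8-bit RGB values to HSV with every channel in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

bright_red = hsv_of(255, 0, 0)
dark_red = hsv_of(128, 0, 0)

same_hue = bright_red[0] == dark_red[0]
same_saturation = bright_red[1] == dark_red[1]
darker = dark_red[2] < bright_red[2]
```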
2.2.3.2 CIE L*a*b* color space
CIE L*a*b* (also known as CIELAB or Lab) is a color space defined by the International Commission on Illumination (CIE) in 1976. It expresses color as three numerical values, L* for lightness, a* for green-red and b* for blue-yellow components. CIE L*a*b* was designed to be perceptually uniform with respect to human color vision, meaning that the same amount of numerical change in these values corresponds to about the same amount of visually perceived change.
The color space is a three-dimensional real number space, allowing an infinite number of possible representations of colors. In practice, the space is usually mapped onto a three-dimensional integer space for digital representation, and thus the L*, a*, and b* values are usually absolute, with a pre-defined range.
Figure 2.9: Top and frontal view of the CIE L*a*b* color space.
The lightness value, L*, represents the darkest black at L* = 0 and the brightest white at the top of its range, which is L* = 100 in the CIE definition and 255 in OpenCV's 8-bit representation. The color channels, a* and b*, represent true neutral gray values at a* = 0 and b* = 0. The a* axis represents green in the negative direction and red in the positive direction. The b* axis represents blue in the negative direction and yellow in the positive direction. The scaling and limits of the a* and b* axes depend on the specific implementation, but they often run in the range of [−128, +127] or, shifted by 128, [0, 255] in OpenCV's 8-bit representation.
2.2.5
Random Sample Consensus
Random Sample Consensus (RANSAC) (Fischler and Bolles, 1981) is a general iterative method for parameter estimation of a mathematical model from a set of observed data that contains outliers. A basic assumption is that the data consists of inliers, which are data whose distribution can be explained by some set of model parameters, though they may be subject to noise, and outliers, which are data that do not fit the model. Unlike other adopted statistical techniques, RANSAC originated within the computer vision community. Algorithm 1 shows the pseudocode for its generic procedure. This general scheme was adapted to use line segments from LSD: Line Segment Detector and identify the set of lines, typically three, that best fit a geometrical model of the wind turbine. The modified Algorithm 2 is described in the methodology section.
In this research, the geometrical model of a wind turbine was based on the three-bladed design, as 90% of the installed wind turbines have three rotor blades (Megraw, 2012). Mechanically, the more blades there are, the higher the torque and the slower the rotational speed. Wind turbines need to operate at high speeds, and do not need much torque. Therefore, the fewer the number of blades, the better suited the system is for producing electricity. Theoretically, a one-bladed turbine is the most aerodynamically efficient configuration. However, it is not practical because of stability problems. Turbines with two blades offer the next best design, but are affected by a vibration phenomenon produced by changes in resistance to yawing, depending on the alignment of the blades. On the other hand, a three-bladed turbine has very little vibration. This is because when one blade is in the horizontal position, its resistance to the yaw force is counter-balanced by the two other blades. So, a turbine with three blades represents the best combination of high rotational speed and minimum stress.
Our RANSAC procedure can be adapted to other blade configurations by modifying the number of blades and the angle between them in the geometrical model of the wind turbine.
Algorithm 1 Random Sample Consensus
Input:
    data ← a set of observations
    model ← a model to be fitted to data points
    n ← minimum number of data points required to estimate model parameters
    k ← maximum number of iterations allowed in the algorithm
    t ← threshold value to determine data points that are fit well by the model
    d ← number of close data points required to assert that a model fits well to data
Output:
    bestFit ← model parameters which best fit the data (nil if no good model is found)

iterations = 0
bestFit = nil
bestErr = something really large
while iterations < k do
    maybeInliers = n randomly selected values from data
    maybeModel = model parameters fitted to maybeInliers
    alsoInliers = empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t then
            add point to alsoInliers
        end if
    end for
    if the number of elements in alsoInliers is > d then
        betterModel = model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr = a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit = betterModel
            bestErr = thisErr
        end if
    end if
    increment iterations
end while
return bestFit
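The generic procedure of Algorithm 1 can be made concrete with a small runnable sketch. As a stand-in for the wind turbine model, it fits a 2D line y = mx + b to points contaminated with outliers. It is not the modified Algorithm 2 of this work, and the data, the least-squares fit and the mean absolute residual used as the error measure are all assumptions of this example.

```python
import random
import numpy as np

def fit_line(points):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    pts = np.asarray(points, dtype=float)
    m, b = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return m, b

def residuals(model, points):
    """Absolute vertical distance of each point to the line."""
    m, b = model
    pts = np.asarray(points, dtype=float)
    return np.abs(pts[:, 1] - (m * pts[:, 0] + b))

def ransac(data, n, k, t, d, rng):
    """Generic RANSAC loop following Algorithm 1."""
    best_fit, best_err = None, float("inf")
    indices = list(range(len(data)))
    for _ in range(k):
        maybe_idx = rng.sample(indices, n)
        maybe_model = fit_line([data[i] for i in maybe_idx])
        # Points outside the random sample that agree with the model.
        also_idx = [i for i in indices if i not in maybe_idx
                    and residuals(maybe_model, [data[i]])[0] < t]
        if len(also_idx) > d:
            consensus = [data[i] for i in maybe_idx + also_idx]
            better_model = fit_line(consensus)
            this_err = float(np.mean(residuals(better_model, consensus)))
            if this_err < best_err:
                best_fit, best_err = better_model, this_err
    return best_fit

# Synthetic data: 20 inliers on y = 2x + 1 and 5 gross outliers.
inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(2.5, 40.0), (7.5, -30.0), (12.5, 60.0), (16.5, -10.0), (4.5, 55.0)]
slope, intercept = ransac(inliers + outliers, n=2, k=50, t=0.5, d=10,
                          rng=random.Random(0))
```

Because the consensus set of a good sample contains only inliers, the refitted line recovers the slope and intercept of the inlier line despite the outliers.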
RANSAC is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The number of iterations, k, can be determined as a function of the desired probability of success p.
Let p be the desired probability that the RANSAC algorithm provides a useful result. More specifically, p (usually set to 0.99) is the probability that, after k iterations, at least one of the randomly selected sample sets contains no outliers. Let w be the probability of choosing an inlier. A common case is that w is not well known beforehand, but some rough value can be given.
$$w = \frac{\text{inliers}}{\text{inliers} + \text{outliers}} \tag{2.13}$$
Assuming that the n data points needed to estimate the model are selected independently, w^n is the probability that all n points are inliers, and 1 − w^n is the probability that at least one of the n points is an outlier, implying that a bad model will be estimated from this random subset.
That probability to the power of k iterations is the probability that the algorithm never selects a set of n data points which are all inliers and this must be the same as 1−p.
$$1 - p = (1 - w^n)^k \tag{2.14}$$
The resulting number of iterations k:

$$k = \frac{\log(1 - p)}{\log(1 - w^n)} \tag{2.15}$$
The standard deviation of k:

$$SD(k) = \frac{\sqrt{1 - w^n}}{w^n}$$
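As a worked example of Equation 2.15, the sketch below computes the number of iterations for p = 0.99, w = 0.70 and n = 3, rounding up to the next integer. The standard deviation function assumes the usual geometric-distribution form, sqrt(1 − w^n) / w^n, and the function names are illustrative.

```python
import math

def ransac_iterations(p, w, n):
    """Iterations needed to draw an all-inlier sample with probability p."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** n))

def ransac_iterations_sd(w, n):
    """Standard deviation of k, assuming a geometric distribution."""
    return math.sqrt(1 - w ** n) / w ** n

k = ransac_iterations(p=0.99, w=0.70, n=3)   # 10.96 rounded up to 11
sd = ransac_iterations_sd(w=0.70, n=3)       # about 2.36
```

With these values, w^n = 0.343 and the ratio of logarithms is approximately 10.96, so k = 11 iterations are enough.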
To explain the parameters involved in the RANSAC blade detection, we use Table 2.2, with the same values as Table 5.2 from the first experiment in a real setting.
RANSAC blade detection #1

| Parameter | Value |
|---|---|
| p | 0.99 |
| w | 0.70 |
| n | 3 |
| k | 11 |
| min length | 10 px |
| line extension | 1.005 |
| neighborhood | 4 |
| min score | 500 |

| Result | Value |
|---|---|
| total lines | 633 |
| filtered lines | 58 |

| Pair values | Angle | Length | Intersection | Score |
|---|---|---|---|---|
| pair 1 | 116° | 109 px | (352, 241) | 75.0457 |
| pair 2 | 120° | 105 px | (351, 229) | 56.2624 |
| pair 3 | 122° | 102 px | (340, 236) | 74.7258 |
| reference | 120° | mean: 105 | centroid: (348, 235) | |
| model score | | | | 206.034 |

Table 2.2: RANSAC parameters and detection results for inspection #1.
In the left column, under Parameters, we have a desired probability p = 0.99 for RANSAC to provide a useful result. The inlier ratio w = 0.70, determined with Equation 2.13, is the approximate percentage of lines that lie on the outline of the wind turbine after applying the segmentation mask. The parameter n corresponds to the minimum number of lines to estimate the model; for our three-bladed wind turbine, n = 3. Using Equation 2.15 we calculated the number of iterations, k = 11. To remove small line segments detected by LSD, we used a minimum length = 10 px. The line extension = 1.005 parameter is used to extend lines by 0.5% steps up to the limits of the segmentation mask. The neighborhood = 4 determines how far from the line endpoints we verify against the segmentation mask. Finally, the minimum score = 500 was determined empirically, looking at the models and the quality of the detections. We concluded that models with a score below 500 provide good detections, avoiding false positives. For each frame, RANSAC generates k = 11 models and ultimately returns the one with the lowest score.
The path planner generates 9 waypoints to inspect the blades of the wind turbine. Trajectories start with the approach to the hub (H) or rotor, shown in Figure 2.16. Then, to inspect the blade, it generates two equidistant waypoints (A-B, C-D or E-F), to reach its midpoint and endpoint. After the endpoint of the first blade, the path planner generates another waypoint to move back to the hub (H), and this procedure is repeated to inspect the three blades. After following the last waypoint, the controller performs the landing routine. Table 2.3 shows the six permutations of trajectories (t1, t2, t3, t4, t5, t6) for the set of three blades {b1, b2, b3}.
Figure 2.16: Path planning of inspection points for our three-bladed wind turbine.
| Trajectory | Inspection points: b1 | b2 | b3 |
|---|---|---|---|
| t1 | H A B | H C D | H E F |
| t2 | H A B | H E F | H C D |
| t3 | H C D | H A B | H E F |
| t4 | H C D | H E F | H A B |
| t5 | H E F | H A B | H C D |
| t6 | H E F | H C D | H A B |
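The six orderings listed above can be enumerated with a short sketch. The waypoint labels follow Figure 2.16, while the function and variable names are illustrative and not part of the path planner of this work.

```python
from itertools import permutations

# Midpoint and endpoint waypoints of each blade, labeled as in Figure 2.16.
BLADE_WAYPOINTS = [("A", "B"), ("C", "D"), ("E", "F")]

def inspection_trajectories(blades=BLADE_WAYPOINTS, hub="H"):
    """Every blade ordering, returning to the hub before each blade."""
    trajectories = []
    for order in permutations(blades):
        path = []
        for midpoint, endpoint in order:
            path += [hub, midpoint, endpoint]
        trajectories.append(path)
    return trajectories

trajectories = inspection_trajectories()
for index, path in enumerate(trajectories, start=1):
    print(f"t{index}: {' '.join(path)}")
```

Each trajectory contains 9 waypoints, and the enumeration order of `permutations` matches the rows t1 to t6 listed above.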
2.3.2
Waypoint controller
Once the path planner has established a set of inspection trajectory points, the UAV follows these 3D waypoints to approach the wind turbine and navigate in front of the rotor blades, capturing video/photos of its surface.
Figure 2.17: Diagram with waypoints for autonomous inspection.
The waypoint controller starts by taking off from the position marked by (T) in Figure 2.17. The UAV flies up to a height of 3 m while it rotates looking for the predetermined yaw angle. Once it reaches both references within an error threshold, 10 cm for the height and 2.5 degrees for the yaw angle, a signal is sent to the metric SLAM system to begin localization (L) and mapping. It starts building a metric map while it moves 1 m towards the wind turbine. Then, it hovers in that position, pauses mapping, raises the fovea of the camera for detection (D) and plans the inspection trajectory. The UAV lowers the fovea of the camera back to -85 degrees, resumes localization, and proceeds to approach the hub (H) until it reaches a safety distance (S) of 1.6 m to avoid collisions. At this distance, the UAV carries out the inspection, as shown in Figure 2.16, with one of the possible orderings depicted in Table 2.3. The inspection trajectory depends on the order in which LSD detected the line segments.
In order to accomplish this task, we implemented high-level controllers for altitude, rotation and translation. The altitude controller is used to reach the (L) waypoint shown in Figure 2.17, which takes the UAV from the takeoff position (T) to the 3 m altitude reference, and the rotation controller is used to look for the yaw angle reference.
The trajectory is divided into two parts: the approach to the wind turbine with longitudinal movements and the inspection of the blades with lateral movements. The translation control for autonomous navigation requires continuous localization from the metric SLAM system to calculate the error between the current pose and the setpoint established by the 3D coordinates of the waypoint. The altitude controller requires the readings from the altimeter for the z-axis signal, while the orientation controller uses the odometry data from the inertial measurement unit (IMU) for the rotation control.
The altimeter of the Bebop 2 drone uses a sonar to measure the distance to the ground within the range of [0, 3] meters, and a barometer for higher altitudes. In the scene used for experiments, the ground is covered by grass, which slightly affects the altitude readings, while the barometer proved to be reliable, as it displayed altitude readings of 42 meters while positioned at the hub of the KWT300 wind turbine, which has a height of 41.5 meters according to its specification files. The IMU is used to determine the orientation of the UAV. During the setup, before the inspection, the yaw angle that points to the wind turbine and centers it in the image must be recorded to establish a yaw reference, as shown in Figure 2.17.
The waypoint controller uses a proportional-integral (PI) approach, which is a special case of the proportional-integral-derivative (PID) controller in which the derivative (D) term is not used. This type of controller continuously calculates an error value e(t) as the difference between the desired setpoint (SP) and a measured process variable (PV).
In the rotation controller, the process variable (PV) is the yaw angle measured by the IMU and the setpoint is the yaw reference (SP), which points to the wind turbine. Using a discrete sampling method, the controller replaces the continuous form of the integral with a summation of past error values. The integral term seeks to eliminate the residual error by adding a control effect due to the historic cumulative value of the error.
The translation controller also uses a PI control loop for the x, y, and z axes. The process variable (PV) is given by the pose estimation from the metric monocular SLAM system, and it is used to determine the error, or Euclidean distance to the 3D setpoint. Once the errors in the control loop for all three axes are within a 25 cm error threshold, the state machine proceeds to follow the next waypoint.
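A discrete PI loop of this kind can be sketched as follows. The gains, the sampling period and the class and function names are hypothetical, chosen only for illustration and not taken from the controllers of this work. The waypoint check applies the 25 cm threshold to each axis separately, which is one possible reading of the condition described above.

```python
class PIController:
    """Discrete proportional-integral controller for a single axis."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.error_sum = 0.0  # discrete replacement of the integral

    def update(self, setpoint, measurement):
        error = setpoint - measurement        # e(t) = SP - PV
        self.error_sum += error * self.dt     # summation of past errors
        return self.kp * error + self.ki * self.error_sum

def waypoint_reached(pose, waypoint, threshold_m=0.25):
    """True when the error on every axis is within the threshold."""
    return all(abs(w - p) <= threshold_m for p, w in zip(pose, waypoint))

# Hypothetical gains and a 10 Hz control loop, one controller per axis.
controllers = [PIController(kp=0.5, ki=0.1, dt=0.1) for _ in range(3)]
pose, waypoint = (0.0, 0.0, 3.0), (1.0, 0.0, 3.0)
commands = [c.update(w, p) for c, w, p in zip(controllers, waypoint, pose)]
reached = waypoint_reached(pose, waypoint)
```

In a state machine such as the one described above, the commands would be sent to the vehicle on every cycle until `waypoint_reached` returns true, at which point the next waypoint becomes the setpoint.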
2.4
Chapter summary
This chapter introduced theoretical foundations for wind turbine inspection using autonomous UAVs. The first section presented the calibration method required for camera-based localization and geometrical projection to the ground plane that enables building maps with a metric scale. The second section featured computer vision techniques used to process images obtaining edges and ultimately detect the lines that correspond to the blades. The last section outlined the pinhole camera model and the equations used by the path planner to backproject 2D detection points to 3D coordinates, generating an inspection trajectory. It also described the autonomous navigation control loop that follows the waypoints from the flight plan.
Chapter 3
Related Work
This chapter presents a review of the state of the art for wind turbine inspection with UAVs, and then focuses on previous research closely related to the three main challenges involved in this task: localization, detection and autonomous navigation.
3.1
Wind turbine inspection with UAVs
While most wind turbine inspection systems are closed-source software developed by private companies, there have been some recent academic publications that describe how these teams of researchers approached this challenge. The inspection of wind turbines with UAVs has been addressed using different configurations of sensors, localization methods, detection techniques and navigation controllers. For instance, (Schafer et al., 2016) proposed inspection flights with a previous 3D mapping of the plant, using spline-based path planning with collision avoidance and a distance control system. Their method requires a brief flight preceding the actual inspection, where they generate a point cloud with a 2D LIDAR sensor, represented with octrees. Subsequently, a smooth and collision-free flight path is generated using splines. Their system, based on Robot Operating System (ROS), relies on a predetermined flight plan, GPS for localization and a LIDAR sensor to estimate the distance between the vehicle and the wind turbine to avoid collisions. They validated different aspects of their system with a Gazebo simulation containing a wind turbine model. They also carried out partial indoor tests performing 3D reconstruction of smaller objects.
Recently, (Stokkeland, 2014) from the Norwegian University of Science and Technology (NTNU - Trondheim) discussed in his master’s thesis and article (Stokkeland et al., 2015) the key features used for the perception of wind turbines and designed image processing algorithms for object tracking and local navigation. They proposed a blade detection system that uses the Hough transform for lines to extract features from a wind turbine for inspection tasks. Moreover, they developed a machine vision system to autonomously detect the wind turbine’s relative yaw angle and distance to the blades using the pinhole camera model and coordinate transformations. For this task they considered the geometrical properties of wind turbines that allow the recognition of the rotor blades and how line features can help determine their orientation. Their system was able to estimate the distance from the UAV to the wind turbine and extract line features to perceive the orientation of the blades. However, the scope of their research was limited to the initial positioning and recognition phase of an inspection.
Another master’s thesis project (Høglund, 2014) from NTNU was focused on designing an observer capable of local navigation based on input from image processing algorithms such as optical flow and the Hough transform. In particular, they investigated how this can aid visual inspection of wind turbines and buildings with a UAV platform. The observer estimates the relative distance between the target object and the UAV, as well as its metric velocity. For optical flow, they used the Horn-Schunck method and an alternative solution to the aperture problem, a pyramidal Lucas-Kanade method based on least-squares fitting. With these optical flow techniques, they computed the relative motion between an observer and surrounding objects within its field of view. This phenomenon has been observed in nature, as flies use it for obstacle avoidance and tracking. The relative velocity vector gives an indication of how fast and in which direction the object is moving relative to the UAV. This was used to design a control law capable of tracking the target, by making the optical flow approach zero. They used the Hough transform to determine the heading and apply yaw control, by detecting the angle between straight lines. Using a priori information about the object, the Hough transform can detect errors in orientation, which are fed back to a yaw controller. This ensures that the UAV is perpendicular to the object at all times. This is applicable for wind turbine inspection and building inspection, as these objects have prominent straight lines with known geometry. The vector received from one of the previously mentioned computer vision algorithms can be interpreted as a pure pursuit vector. This vector starts at the camera, with a direction determined by either optical flow or the Hough transform. In the wind turbine scenario, the UAV can also be controlled to always have the blade in the center of the image. They evaluated their local navigation system in simulation and partially in a real setting, where they tested the hardware components individually (power supply and sensors) as well as the stability of their controller on a UAV platform.
Earlier this year, (Kanellakis et al., 2019) presented a framework for visual inspection around 3D structures using a team of autonomous Micro Aerial Vehicles (MAVs) with robust localization, planning and perception capabilities. Their system relied on the onboard computer and sensors of the UAV, with a localization-enabled scheme for collaborative aerial inspection of infrastructure based on Ultra WideBand (UWB) distance measurements and Inertial Measurement Unit (IMU) sensor fusion for state estimation. They implemented a Collaborative Coverage Path Planner (CCPP) algorithm with the ability to guarantee full coverage of the infrastructure by considering camera, geometry, collision, and other application-posed constraints. The covered path had an overlapping field of view that enabled offline image stitching and generation of a sparse 3D model of the structure using Structure from Motion (SfM). However, the path planner requires a previously generated 3D model of the wind turbine, which might not match the current configuration of the blades, as they rotate even when stopped for maintenance. Moreover, they reported that while trying to reduce tracking error in outdoor trials, they tuned the controller aggressively. That kind of tuning resulted in jerky and oscillatory movements with excessive pitching and rolling, due to the controllers trying to compensate for the disturbances, which proved highly sensitive to the weather conditions. Even though the planner guarantees full coverage, there were variations between the performed trajectory and the reference, due to wind gusts. They stated that there is a need for an online path planner capable of considering these drifts and replanning the path, or for a system that can identify neglected surface areas and provide extra trajectories to compensate. Additionally, due to the added payload of sensors, wind gusts and low ambient temperature, the battery autonomy was significantly reduced from the expected 26 minutes, according to the manufacturer, down to only 5 minutes. They also reported that their low-cost LIDAR scanner failed to operate due to sunlight interference with the range measurements. Nonetheless, they consider that this sensor should be further examined, as it might be useful in cross-section analysis and obstacle avoidance.
As stated in the introduction, the interpretation of inspection images, and the assessment of blade damage and overall physical state of the wind turbine is out of the scope of this work. However, there are computational methods that analyze the collected images, which could aid the wind farm field expert to detect cracks on the surface of the blades. With that objective, (Wang and Zhang, 2017) proposed a data-driven framework to automatically detect wind turbine blade surface cracks based on images taken with UAVs. They used two sets of Haar-like features, the original and extended, to depict crack regions and train a cascading classifier using positive samples (crack images) and negative samples (background images). They developed an extended cascading classifier through stage classifiers from a set of base models, LogitBoost, Decision Trees (DT) and Support Vector Machine (SVM). The effectiveness of their crack detection framework was validated with both UAV-taken images, which were collected from a commercial wind farm in China, and others artificially generated, with a total of 770 images, 200 positive and 570 negative. As future work, they proposed testing real-time detection in video streams.
| Authors | Localization sensor | Inspection sensor | Perception | Navigation |
|---|---|---|---|---|
| (Schafer et al., 2016) | GPS, IMU, 2D LIDAR | 2D LIDAR | A priori 3D model | Partial indoor tests, obstacle avoidance, 3D reconstruction of small objects |
| (Stokkeland, 2014) | GPS, IMU, Sonar | GoPro Hero3+ Black | Hough transform | Initial positioning (yaw), recognition of wind turbine |
| (Høglund, 2014) | Monocular camera, IMU, Sonar, Laser Range Finder | GoPro Hero3+ Black | Hough transform, Optical Flow | Pure pursuit guidance, PID control |
| (Kanellakis et al., 2019) | Ultra-wideband | AscTec VI, LIDAR, PlayStation Eye, GoPro Hero4 | A priori 3D model | Multi-agent (2), low altitude, 3D reconstruction |
| (Parlange et al., 2018) | Monocular camera, IMU, Sonar, Barometer | GoPro Hero4 Black | Hough transform, Line Segment Detector, RANSAC | Autonomous navigation, lateral PI control, Metric Monocular SLAM |

Table 3.1: Comparison of UAV systems for wind turbine inspection.
Considering the advantages and drawbacks of the wind turbine inspection systems compared in Table 3.1, we opted not to perform 3D reconstruction, like (Schafer et al., 2016) and (Kanellakis et al., 2019) did, as sparse or even dense point clouds do not provide valuable information to the expert inspector for the detection of damage in the structure. Conversely, we considered the extraction of line features of an object with known geometrical properties, as in the methodology of (Stokkeland et al., 2015) and (Høglund, 2014) from NTNU, very useful for path planning and autonomous navigation based on visual perception. Finally, the output of our inspection system, video and image data, could be analyzed by a computational method, such as the one developed by (Wang and Zhang, 2017), to help the human expert detect cracks.