CN114170535A - Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle - Google Patents

Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle

Info

Publication number
CN114170535A
Authority
CN
China
Prior art keywords
target
coordinate system
point
image
binocular
Prior art date
Legal status
Withdrawn
Application number
CN202210128714.5A
Other languages
Chinese (zh)
Inventor
罗巍
任雪峰
Current Assignee
Beijing Zhuoyi Intelligent Technology Co Ltd
Original Assignee
Beijing Zhuoyi Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhuoyi Intelligent Technology Co Ltd
Priority to CN202210128714.5A
Publication of CN114170535A
Legal status: Withdrawn (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection and positioning method, a target detection and positioning device, a controller, a storage medium and an unmanned aerial vehicle. The method comprises the following steps: acquiring binocular images acquired by a binocular vision perception module and constructing a depth map of the binocular images; identifying a target object in the binocular images based on the depth map; determining target world coordinates of the target object in a world coordinate system; and determining spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates. The invention detects and positions the target object by combining unmanned aerial vehicle remote sensing with binocular stereo vision, and can accurately detect and position a target object against a complex background.

Description

Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle-based target detection positioning method, an unmanned aerial vehicle-based target detection positioning device, a controller, a computer-readable storage medium and an unmanned aerial vehicle.
Background
Overhead transmission lines connect power plants, substations and consumers to form the power transmission and distribution network. In transmission lines, insulators are basic equipment in wide use that perform the dual functions of electrical insulation and mechanical support. The failure of an insulator directly threatens the stability and safety of the transmission line. According to statistics, accidents caused by insulator faults account for the highest proportion of power system failures. Therefore, monitoring the state of insulators is of great significance to the safety and stability of the power system. Traditional manual inspection has high labor cost and low efficiency. In addition, adverse factors such as climate and geographical environment may limit manual inspection, so that many potential hazards are not discovered in time.
At present, the detection and positioning of insulators are mainly carried out manually, which is obviously inefficient and costly. To overcome the limitations of manual inspection, it is necessary to develop automatic detection techniques to assist or replace manual decision-making. However, images taken by drones often contain cluttered backgrounds including vegetation, rivers, roads, houses and the like. Furthermore, the appearance of insulators in such images varies because of the diversity of insulator types and the differences in illumination conditions and shooting angles in actual inspection scenes. These factors make it difficult to detect insulators in aerial images.
Existing methods are based on two-dimensional images of transmission line scenes and generally rely on simplifying assumptions, especially about the size, shape and background of the insulators. In practical applications, however, insulators vary widely in shape, appearance and size due to differences in shooting angle and lighting conditions, and their backgrounds are complex and variable, which poses great challenges for image recognition.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a target detection and positioning method based on an unmanned aerial vehicle, a target detection and positioning device based on an unmanned aerial vehicle, a controller, a computer-readable storage medium and an unmanned aerial vehicle, which realize automatic identification and positioning of a target object by fusing unmanned aerial vehicle remote sensing with binocular stereoscopic vision technology.
In order to solve the above technical problem, according to an aspect of the present invention, there is provided a target detection and positioning method based on an unmanned aerial vehicle, including: acquiring binocular images acquired by a binocular vision perception module, and constructing a depth map of the binocular images;
identifying a target object in the binocular image based on the depth map;
determining target world coordinates of the target object in a world coordinate system;
and determining the spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
In some embodiments, the step of acquiring binocular images acquired by the binocular vision perception module and constructing the depth map of the binocular images comprises:
performing region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image;
extracting first feature points of the left region segmentation image and second feature points of the right region segmentation image;
matching the first feature points and the second feature points according to the Euclidean distance between them to obtain a plurality of groups of original feature point pairs;
selecting sparse disparity point pairs from the original feature point pairs, and calculating depth values of the sparse disparity point pairs;
counting the projection distribution of the sparse disparity point pairs, and taking the average disparity of the sparse disparity point pairs of each segmented region as the disparity value of the corresponding region;
and creating a sparse disparity map based on the disparity values to obtain the depth map of the binocular images.
In some embodiments, the selecting a sparse disparity point pair from the original feature point pair and calculating a depth value of the sparse disparity point pair includes:
connecting each group of the original feature point pairs, and calculating the slope of the connecting line of each group of the original feature point pairs;
taking the slope with the highest frequency of occurrence among the plurality of slopes as the main slope;
retaining, as the sparse disparity point pairs, the original feature point pairs corresponding to the slopes that are the same as the main slope;
and calculating the depth value of the sparse disparity point pair.
In some embodiments, the step of identifying the target object in the binocular image based on the depth map comprises:
fusing the image features of the binocular images to obtain a two-dimensional saliency map;
refining the two-dimensional saliency of the two-dimensional saliency map using the depth map to obtain a depth saliency map;
and binarizing and skeletonizing the depth saliency map, and identifying the target object in the depth saliency map according to preset features of the target object.
In some embodiments, the step of determining target world coordinates of the target object in a world coordinate system comprises:
setting the optical center of one camera in the binocular vision perception module as an origin, and setting the optical axis of the camera as a Z axis so as to establish a pixel coordinate system;
determining target pixel coordinates of a target point of the target object in the pixel coordinate system;
and converting the target pixel coordinate into the target world coordinate according to the conversion relation between the pixel coordinate system and the world coordinate system.
In some embodiments, the conversion relationship between the pixel coordinate system and the world coordinate system is:
$$
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & -u_0 \\
0 & 1 & 0 & -v_0 \\
0 & 0 & 0 & f \\
0 & 0 & 1/b & 0
\end{bmatrix}
\begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix},
\qquad
(X_w,\ Y_w,\ Z_w) = \left(\frac{X}{W},\ \frac{Y}{W},\ \frac{Z}{W}\right)
$$
wherein the target world coordinates of the target point are (X_w, Y_w, Z_w); the coordinates of the target point in the pixel coordinate system are (u, v) and the coordinates of the origin in the pixel coordinate system are (u_0, v_0); the distance between the two cameras of the binocular vision perception module is defined as the baseline distance b; f is the focal length obtained by calibration; the disparity value of the target point is d = u_l − u_r, where (u_l, v_l) and (u_r, v_r) are the coordinates of the target point in the pixel coordinate systems of the two cameras; and X, Y, Z, W are the four-dimensional homogeneous coordinates of the target point in the world coordinate system.
In some embodiments, the flight data of the drone includes: longitude and latitude, altitude, pitch angle, azimuth angle, roll angle of the unmanned aerial vehicle and pitch angle of the camera;
the spatial position parameters of the target object comprise: the latitude and longitude of the target object and its height above sea level.
According to another aspect of the present invention, there is provided an unmanned aerial vehicle-based target detection positioning apparatus, including:
the construction module is configured to acquire binocular images acquired by the binocular vision perception module and construct a depth map of the binocular images;
the identification module is configured to identify a target object in the binocular image based on the depth map;
a determination module configured to determine target world coordinates of the target object in a world coordinate system;
and the positioning module is configured to determine the spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
In some embodiments, the building module comprises:
the segmentation submodule is configured to perform region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image;
an extraction submodule configured to extract a first feature point of the left region segmentation image and a second feature point in the right region segmentation image;
the matching submodule is configured to match the first feature points and the second feature points according to the Euclidean distance between them to obtain a plurality of groups of original feature point pairs;
the calculation submodule is configured to select sparse disparity point pairs from the original feature point pairs and calculate depth values of the sparse disparity point pairs;
the statistics submodule is configured to count the projection distribution of the sparse disparity point pairs, and to take the average disparity of the sparse disparity point pairs of each segmented region as the disparity value of the corresponding region;
and the creating sub-module is configured to create a sparse disparity map based on the disparity values so as to obtain a depth map of the binocular image.
In some embodiments, the computation submodule comprises:
a slope calculation unit configured to connect each group of the original feature point pairs and calculate a slope of a connection line of each group of the original feature point pairs;
a main slope determination unit configured to take a slope having the highest frequency of occurrence among the plurality of slopes as a main slope;
a sparse disparity point pair determining unit configured to retain, as the sparse disparity point pair, the original feature point pair corresponding to a slope that is the same as the main slope among the plurality of slopes;
a depth value calculation unit configured to calculate depth values of the sparse disparity point pairs.
In some embodiments, the identification module comprises:
the fusion sub-module is configured to fuse the image features of the binocular images to obtain a two-dimensional saliency map;
the refinement submodule is configured to refine the two-dimensional saliency of the two-dimensional saliency map using the depth map to obtain a depth saliency map;
and the recognition submodule is configured to binarize and skeletonize the depth saliency map, and to identify the target object in the depth saliency map according to preset features of the target object.
In some embodiments, the determining module comprises:
the establishing submodule is configured to set an optical center of one camera in the binocular vision perception module as an origin and set an optical axis of the camera as a Z axis so as to establish a pixel coordinate system;
a pixel coordinate determination submodule configured to determine target pixel coordinates of a target point of the target object within the pixel coordinate system;
and the conversion sub-module is configured to convert the target pixel coordinate into the target world coordinate according to the conversion relation between the pixel coordinate system and the world coordinate system.
In some embodiments, the conversion relationship between the pixel coordinate system and the world coordinate system is:
$$
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & -u_0 \\
0 & 1 & 0 & -v_0 \\
0 & 0 & 0 & f \\
0 & 0 & 1/b & 0
\end{bmatrix}
\begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix},
\qquad
(X_w,\ Y_w,\ Z_w) = \left(\frac{X}{W},\ \frac{Y}{W},\ \frac{Z}{W}\right)
$$
wherein the target world coordinates of the target point are (X_w, Y_w, Z_w); the coordinates of the target point in the pixel coordinate system are (u, v) and the coordinates of the origin in the pixel coordinate system are (u_0, v_0); the distance between the two cameras of the binocular vision perception module is defined as the baseline distance b; f is the focal length obtained by calibration; the disparity value of the target point is d = u_l − u_r, where (u_l, v_l) and (u_r, v_r) are the coordinates of the target point in the pixel coordinate systems of the two cameras; and X, Y, Z, W are the four-dimensional homogeneous coordinates of the target point in the world coordinate system.
In some embodiments, the flight data of the drone includes: longitude and latitude, altitude, pitch angle, azimuth angle, roll angle of the unmanned aerial vehicle and pitch angle of the camera;
the spatial position parameters of the target object comprise: the latitude and longitude of the target object and its height above sea level.
According to another aspect of the present invention, there is provided a controller comprising a memory and a processor, the memory storing a computer program, which when executed by the processor, is capable of implementing the steps of any of the above-mentioned drone-based target detection and positioning methods.
According to another aspect of the present invention, there is provided a computer-readable storage medium for storing a computer program, which when executed by a computer or a processor implements the steps of the drone-based target detection and positioning method according to any one of the above.
According to another aspect of the present invention, there is provided an unmanned aerial vehicle comprising a binocular vision perception module, the unmanned aerial vehicle being configured to perform the unmanned aerial vehicle-based target detection and positioning method according to any one of the above.
Compared with the prior art, the invention has obvious advantages and beneficial effects. By means of the above technical scheme, the unmanned aerial vehicle-based target detection and positioning method, the unmanned aerial vehicle-based target detection and positioning device, the controller, the computer-readable storage medium and the unmanned aerial vehicle of the invention achieve considerable technical progress and practicability, have wide industrial utilization value, and have at least the following advantages:
Firstly, the invention detects and positions the target object by combining unmanned aerial vehicle remote sensing with binocular stereo vision, and can accurately detect and position a target object against a complex background.
Secondly, the target object detection algorithm based on depth-map saliency and skeleton structure features can accurately detect the target object in aerial images with complex backgrounds.
Thirdly, through real-time spatial positioning of the object based on binocular stereo vision, GPS positioning and coordinate conversion, the invention can accurately acquire the longitude and latitude of the target object.
The foregoing description is only an overview of the technical solutions of the present invention. In order to make the technical means of the present invention more clearly understood and implementable in accordance with the content of the description, and in order to make the above and other objects, features and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic flow chart of a target detection and positioning method based on an unmanned aerial vehicle according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an imaging geometry model of a camera according to an embodiment of the present invention;
FIG. 3 is a schematic view of the binocular stereo vision combined with unmanned aerial vehicle GPS for spatial positioning of a target object according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the transformation of the geographic coordinate system and the world coordinate system according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of an unmanned aerial vehicle-based target detection positioning apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of the structure of the construction module shown in FIG. 5;
FIG. 7 is a block diagram of the structure of the calculation submodule shown in FIG. 6;
FIG. 8 is a block diagram of the structure of the identification module shown in FIG. 5;
fig. 9 is a block diagram showing the structure of the determination module shown in fig. 5.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve the intended objects and their effects, specific embodiments of the unmanned aerial vehicle-based target detection and positioning method, the unmanned aerial vehicle-based target detection and positioning device, the controller, the computer-readable storage medium and the unmanned aerial vehicle according to the present invention, and their effects, are described in detail below with reference to the accompanying drawings and preferred embodiments.
The invention provides a target detection and spatial positioning method for unmanned aerial vehicle aerial images with complex backgrounds. Firstly, the left image of the binocular vision system is subjected to region segmentation, and sparse disparity points are calculated through stereo matching. The region segmentation result is combined with the sparse disparity points to generate a depth map, which reflects the influence of spatial position on visual saliency. Then, a two-stage strategy is proposed to achieve accurate detection of the target. First, candidate regions of the target object are determined by an RGB-D saliency detection method that fuses color contrast, texture contrast and depth features. Next, a feature descriptor of the skeleton structure of the target object is defined, a structure search is performed on the candidate target object regions, and false targets are filtered out, realizing accurate detection of the target object. Finally, the spatial position of the target object is acquired using binocular stereo vision and the GPS coordinates of the unmanned aerial vehicle.
Based on this, the invention provides a target detection and positioning method based on an unmanned aerial vehicle. As shown in fig. 1, the method comprises:
and step S10, acquiring binocular images acquired by the binocular vision perception module, and constructing a depth map of the binocular images.
Specifically, step S10 includes:
step S101, performing region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image.
A good region segmentation result improves the accuracy of the disparity boundaries. The invention adopts image segmentation based on Mask R-CNN. Mask R-CNN is a convolutional neural network proposed on the basis of the earlier Faster R-CNN framework; the network effectively detects targets while producing high-quality semantic segmentation. The main idea of Mask R-CNN is to extend the original Faster R-CNN by adding a branch that predicts segmentation masks in parallel with the existing detection branch. At the same time, the network structure is easy to implement and train, and it runs fast.
In the invention, region segmentation is performed, based on Mask R-CNN, on both the left image and the right image of the binocular images acquired by the binocular vision perception module, yielding the left region segmentation image and the right region segmentation image.
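As an illustration only (not the filed embodiment), the region segmentation step can be sketched with an off-the-shelf Mask R-CNN from torchvision; the pretrained COCO weights and the score threshold used below are assumptions, and in practice the network would be fine-tuned on insulator imagery.

```python
# Illustrative sketch of the Mask R-CNN region segmentation step (assumed tooling:
# torchvision with pretrained COCO weights; a production system would fine-tune
# the model on insulator data before use).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def segment_regions(image_bgr, score_thresh=0.5):
    """Return binary region masks for one image (H x W x 3, BGR, uint8)."""
    img = to_tensor(image_bgr[:, :, ::-1].copy())   # BGR -> RGB, scaled to [0, 1]
    with torch.no_grad():
        out = model([img])[0]
    keep = out["scores"] > score_thresh
    # Each predicted mask is a soft (1, H, W) map; threshold it to a binary region.
    return [(m[0] > 0.5).cpu().numpy() for m in out["masks"][keep]]
```

Applying this to the left and right images yields the left and right region segmentation images used in the following steps.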
Step S102, extracting first feature points of the left region segmentation image and second feature points of the right region segmentation image.
Specifically, Speeded Up Robust Features (SURF) is a fast and stable algorithm for detecting feature points. In the present invention, SURF is used to extract the feature points of the left and right region segmentation images and to calculate their 64-dimensional descriptors.
The descriptors of the first and second feature points are built from the distributions of first-order Haar wavelet responses in the x and y directions of the image coordinates rather than from gradients; speed is increased by using integral images, and only the 64-dimensional descriptor of each feature point is used. Adopting the 64-dimensional descriptor effectively reduces the feature matching time and improves the robustness of feature point extraction.
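A minimal sketch of the SURF extraction, assuming OpenCV's contrib build (opencv-contrib-python with the non-free modules enabled); the Hessian threshold is an assumed value, and extended=False keeps the 64-dimensional descriptor described above.

```python
# Sketch only: SURF keypoints with 64-D descriptors via OpenCV's contrib module.
import cv2

def surf_features(gray_image, hessian_thresh=400):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_thresh,
                                       extended=False)   # 64-D descriptors
    keypoints, descriptors = surf.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```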
Step S103, matching the first feature points and the second feature points according to the Euclidean distance between them to obtain a plurality of groups of original feature point pairs.
Step S104, selecting the sparse disparity point pairs from the original feature point pairs and calculating the depth values of the sparse disparity point pairs.
In step S104, each group of original feature point pairs is connected, the slope of the connecting line of each group of original feature point pairs is calculated, the slope with the highest frequency of occurrence among the slopes is taken as the main slope, the original feature point pairs whose slope is the same as the main slope are retained as the sparse disparity point pairs, and the depth values of the sparse disparity point pairs are calculated.
Some mismatched feature point pairs exist among the multiple groups of original feature point pairs obtained in step S103, and the invention uses slope consistency to eliminate the mismatched original feature point pairs. First, each original matching point pair is connected by a line and its slope in the image coordinate system is calculated. Then, the frequency of occurrence of each slope is counted, and the slope with the highest frequency is taken as the main slope. Matching point pairs whose slope equals the main slope are retained and defined as the sparse disparity point pairs. Finally, the depth value Z_spp of each sparse disparity point pair is calculated.
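The matching and slope-consistency filtering of steps S103 and S104 can be sketched as follows; this is a hedged illustration under the stated assumptions (Euclidean-distance matching, retention of the most frequent connecting-line slope, depth Z = f·b/d), and the slope rounding precision and variable names are illustrative choices, not taken from the filing.

```python
# Sketch of steps S103-S104: Euclidean matching, slope-consistency filtering,
# and depth recovery Z_spp = f * b / d for the surviving sparse disparity pairs.
import numpy as np
import cv2

def sparse_depth_pairs(kp_l, des_l, kp_r, des_r, focal_px, baseline_m):
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)   # Euclidean distance
    matches = matcher.match(des_l, des_r)

    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])

    # Slope of the segment joining each matched pair when both images are
    # overlaid in one coordinate frame; the most frequent slope is the main slope.
    slopes = (pts_r[:, 1] - pts_l[:, 1]) / (pts_r[:, 0] - pts_l[:, 0] + 1e-9)
    rounded = np.round(slopes, 2)
    values, counts = np.unique(rounded, return_counts=True)
    keep = rounded == values[np.argmax(counts)]

    disparity = pts_l[keep, 0] - pts_r[keep, 0]       # d = u_l - u_r
    valid = disparity > 0
    depth = focal_px * baseline_m / disparity[valid]  # Z_spp = f * b / d
    return pts_l[keep][valid], depth
```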
Step S105, counting the projection distribution of the sparse disparity point pairs, and taking the average disparity of the sparse disparity point pairs falling in each segmented region as the disparity value of the corresponding region.
It is known that disparity information can be used to estimate the general depth information of an image. Based on this, the depth map of the binocular images is constructed from the disparity values of the segmented regions.
Step S106, creating a sparse disparity map based on the region disparity values counted in step S105 to obtain the depth map of the binocular images, denoted D_spp.
In the perspective projection imaging model, the mapping of a 3D scene to a 2D image is a process in which depth information is lost. Therefore, it is necessary to estimate the depth information of the scene from binocular disparity cues. However, the correspondence point matching problem is difficult, and the speed and precision of existing depth calculation methods may be unstable in practical applications, so a new depth estimation method is proposed. The method constructs the depth map by combining the region segmentation result with the sparse disparity points. It is assumed that the disparity values on the same object are the same, and in order to ensure the accuracy of the depth boundaries, the obtained region segmentation result is used to assist the construction of the depth map.
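Under the stated assumption that all pixels of one segmented region share that region's average disparity, steps S105 and S106 can be sketched as below; the region masks come from the segmentation stage and the sparse points and depths from the matching stage above, and the names and the zero fallback for unsupported regions are illustrative.

```python
# Sketch of steps S105-S106: assign each segmented region the average disparity of
# the sparse disparity points that fall inside it, then convert to depth (D_spp).
import numpy as np

def region_depth_map(shape_hw, region_masks, pts, depths, focal_px, baseline_m):
    depth_map = np.zeros(shape_hw, dtype=np.float32)
    disparities = focal_px * baseline_m / depths          # back to d = f * b / Z
    cols = pts[:, 0].astype(int)
    rows = pts[:, 1].astype(int)
    for mask in region_masks:
        inside = mask[rows, cols]                         # sparse points inside this region
        if not inside.any():
            continue                                      # unsupported region stays at 0
        mean_d = disparities[inside].mean()
        depth_map[mask] = focal_px * baseline_m / mean_d  # one depth value per region
    return depth_map
```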
In step S20, the target object in the binocular image is identified based on the depth map.
Specifically, the image features of the binocular images are fused to obtain a two-dimensional saliency map; the depth map is used to refine the two-dimensional saliency of the two-dimensional saliency map to obtain a depth saliency map; the depth saliency map is binarized and skeletonized, and the target object in the depth saliency map is identified according to preset features of the target object.
It is known that the importance of a region in an image depends on its difference from the surrounding regions, which is usually reflected in features such as color, shape and texture. In the invention, the color contrast and texture contrast features are fused to obtain a two-dimensional saliency map, and the depth information obtained in the preceding process is then used to refine the two-dimensional saliency detection result. Although most of the salient regions are highlighted in the RGB saliency map computed by fusing the color and texture features, some superpixels belonging to the background are also highlighted; background superpixels may have high saliency if they have high contrast with their surrounding superpixels. In order to suppress the background in the saliency map, the depth map obtained in the preceding process is used to refine the two-dimensional saliency detection result.
The idea of this refinement is based on the following facts: if regions have the same (or similar) depth values, their saliency values should also be the same (or similar); in addition, objects closest to the viewer attract more attention and should be assigned higher saliency values. Under these assumptions, the image is layered into G layers according to a set of thresholds defined over the depth values (the threshold set and the layer-wise refinement formula appear in the filing as equation images and are not reproduced here). The optimized saliency map is computed layer by layer, where depth_g is the depth value of the g-th image layer in the depth map, num_g is the number of superpixels in image layer I_g, and δ, which controls the sensitivity of the weights to spatial distance, is empirically set to 0.2.
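Since the exact layer-wise formula is not reproduced above, the following sketch only illustrates a refinement that is consistent with the two stated assumptions (equal depth implies equal saliency, nearer objects are more salient); the layer count G and the closeness weighting are illustrative assumptions and not the formula of the filing.

```python
# Hedged sketch of a depth-layer-based saliency refinement consistent with the
# assumptions in the text; NOT the exact optimized-saliency formula of the filing.
import numpy as np

def refine_saliency_with_depth(saliency_2d, depth_map, G=5):
    refined = np.zeros_like(saliency_2d, dtype=np.float32)
    finite = depth_map > 0
    edges = np.quantile(depth_map[finite], np.linspace(0.0, 1.0, G + 1))
    for g in range(G):
        layer = finite & (depth_map >= edges[g]) & (depth_map <= edges[g + 1])
        if not layer.any():
            continue
        closeness = 1.0 - g / max(G - 1, 1)   # nearer layers get higher weight
        refined[layer] = saliency_2d[layer].mean() * closeness
    return refined
```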
The invention binarizes and skeletonizes the obtained depth saliency map. The resulting skeletons still retain the important information about the shape and structure of the object.
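A minimal sketch of the binarization and skeletonization step, assuming scikit-image; the fixed threshold of 0.5 on the normalized saliency map is an assumption, since the filing does not state how the binarization threshold is chosen.

```python
# Sketch: binarize the depth saliency map and extract a 1-pixel-wide skeleton.
import numpy as np
from skimage.morphology import skeletonize

def saliency_skeleton(depth_saliency, thresh=0.5):
    rng = float(depth_saliency.max() - depth_saliency.min())
    sal = (depth_saliency - depth_saliency.min()) / (rng + 1e-9)
    return skeletonize(sal > thresh)   # skeleton keeps the shape/structure cues
```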
In this embodiment, an insulator in a power transmission line is taken as an example. Although there are many types of insulators in power transmission lines, an insulator string has unique skeleton structure characteristics compared with other false targets, which can be summarized in the following three points:
In the skeleton structure diagram, the central axis of an insulator string corresponds to one long straight line and the insulator caps correspond to a number of short straight lines; all the short straight lines are crossed by the long straight line; and the short straight lines are of approximately equal length and are arranged in parallel along the long straight line at equal intervals.
Feature descriptors are established for these three characteristics, and an insulator structure search is carried out in the saliency result on the basis of the three feature descriptors, so as to realize accurate detection of the insulator. The structure search procedure is as follows:
(1) Central axis search. The Hough algorithm is used to detect straight lines in the skeleton image. According to the following criterion, a straight line whose length L is greater than 1/3 of the long side of the bounding rectangle of the connected domain is regarded as a candidate central axis of the insulator:
L ≥ (1/3) · max(length_cr, width_cr)
(2) Insulator cap search. Lines that are vertically bisected by the candidate central axis are searched for, and their lengths and positions are recorded. Then the number of such lines, num_l, is counted. A threshold T_N is set; if num_l ≥ T_N, the candidate target is retained for the third filtering step. T_N was set to 6 in the experiments.
(3) Uniform-arrangement judgement. The variance of the lengths of the short lines is calculated to represent their length consistency, and the variance of the distances between the short lines is calculated to represent their spacing consistency. If these two variances satisfy the threshold condition with respect to a threshold T_S, empirically set to 5 (the explicit inequality appears in the filing as an equation image), the short straight lines are determined to be the skeleton of the insulator caps and are retained; otherwise the candidate is judged to be a false target and eliminated, so that insulators can be accurately detected.
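A hedged sketch of the three-step structure search is given below. It follows the description (Hough line detection, a long central axis, at least T_N short lines roughly perpendicular to it, and low variance of cap length and spacing), but the HoughLinesP parameters, the perpendicularity test used in place of the exact bisection test, and the form of the variance condition are assumptions rather than the filed implementation.

```python
# Hedged sketch of the insulator structure search on a binary skeleton image.
import numpy as np
import cv2

T_N, T_S = 6, 5.0    # thresholds stated in the description

def looks_like_insulator(skeleton_u8, bbox_w, bbox_h):
    lines = cv2.HoughLinesP(skeleton_u8, 1, np.pi / 180, threshold=20,
                            minLineLength=5, maxLineGap=2)
    if lines is None:
        return False
    lines = lines[:, 0, :]                                # (N, 4): x1, y1, x2, y2
    lengths = np.hypot(lines[:, 2] - lines[:, 0], lines[:, 3] - lines[:, 1])

    # (1) Central axis: longest line must exceed 1/3 of the long side of the box.
    axis_idx = int(np.argmax(lengths))
    if lengths[axis_idx] < max(bbox_w, bbox_h) / 3.0:
        return False

    # (2) Insulator caps: short lines roughly perpendicular to the axis.
    ax = lines[axis_idx]
    axis_angle = np.arctan2(ax[3] - ax[1], ax[2] - ax[0])
    angles = np.arctan2(lines[:, 3] - lines[:, 1], lines[:, 2] - lines[:, 0])
    perp = np.abs(np.sin(angles - axis_angle)) > 0.9
    caps = np.where(perp & (np.arange(len(lines)) != axis_idx))[0]
    if len(caps) < T_N:
        return False

    # (3) Uniform arrangement: low variance of cap length and of cap spacing.
    direction = np.array([np.cos(axis_angle), np.sin(axis_angle)])
    centres = (lines[caps, 0:2] + lines[caps, 2:4]) / 2.0
    proj = np.sort(centres @ direction)                   # cap positions along the axis
    spacing = np.diff(proj)
    return lengths[caps].var() <= T_S and spacing.var() <= T_S
```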
In this embodiment, candidate target object regions are obtained by RGB-D (depth-map) saliency detection. Then, the skeleton structures of the candidate target object regions are extracted. Feature descriptors of the skeleton structure are defined, and a structure search is performed according to these descriptors to realize the final accurate identification of the target object.
In step S30, the target world coordinates of the target object in the world coordinate system are determined.
Specifically, an optical center of one camera in the binocular vision perception module is set as an origin, an optical axis of the camera is set as a Z axis to establish a pixel coordinate system, target pixel coordinates of a target point of a target object in the pixel coordinate system are determined, and the target pixel coordinates are converted into target world coordinates according to a conversion relation between the pixel coordinate system and the world coordinate system.
The first step of the spatial positioning of the target object is to obtain the three-dimensional coordinates of the target point in the world coordinate system through binocular vision model analysis using geometric relationships.
The internal and external parameters of the left and right cameras of the binocular vision perception module are obtained through a calibration algorithm. The optical center of the left camera is set as the origin and its optical axis as the Z_w axis, establishing the world coordinate system O_w-X_wY_wZ_w; this means that the world coordinate system coincides with the left camera coordinate system. Let the coordinates of the target point P in the world coordinate system be (X_w, Y_w, Z_w); its coordinates in the pixel coordinate systems of the left and right images are (u_l, v_l) and (u_r, v_r). The distance between the two cameras is defined as the baseline distance b, and d = u_l − u_r is the disparity value.
Since the world coordinate system coincides with the left camera coordinate system, the depth Z_w is equal to the perpendicular distance Z_c between the object and the baseline of the two cameras. As shown in Fig. 2, according to the binocular vision principle and the triangle similarity theorem, the conversion relationship between the pixel coordinate system and the world coordinate system is obtained as:
$$
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & -u_0 \\
0 & 1 & 0 & -v_0 \\
0 & 0 & 0 & f \\
0 & 0 & 1/b & 0
\end{bmatrix}
\begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix},
\qquad
(X_w,\ Y_w,\ Z_w) = \left(\frac{X}{W},\ \frac{Y}{W},\ \frac{Z}{W}\right)
$$
wherein the target world coordinates of the target point are (X_w, Y_w, Z_w); the coordinates of the target point in the pixel coordinate system are (u, v) and the coordinates of the origin in the pixel coordinate system are (u_0, v_0); the baseline distance between the two cameras of the binocular vision perception module is b; f is the focal length obtained by calibration; the disparity value of the target point is d = u_l − u_r, with (u_l, v_l) and (u_r, v_r) the coordinates of the target point in the pixel coordinate systems of the two cameras; and X, Y, Z, W are the four-dimensional homogeneous coordinates of the target point in the world coordinate system.
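The conversion above reduces to the familiar relations Z_w = f·b/d, X_w = (u − u_0)·b/d and Y_w = (v − v_0)·b/d, which the following minimal sketch implements; the calibration values f, b, u_0 and v_0 are assumed to come from the calibration step.

```python
# Sketch of the pixel-to-world conversion for a matched target point.
def pixel_to_world(u_l, v_l, u_r, f, b, u0, v0):
    d = u_l - u_r                 # disparity of the target point
    if d <= 0:
        raise ValueError("non-positive disparity: bad match or point at infinity")
    Zw = f * b / d                # depth along the optical axis
    Xw = (u_l - u0) * b / d
    Yw = (v_l - v0) * b / d
    return Xw, Yw, Zw
```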
Step S40: determining the spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
The flight data of the unmanned aerial vehicle includes the longitude and latitude, altitude, pitch angle, azimuth angle and roll angle of the unmanned aerial vehicle and the pitch angle of the camera; the spatial position parameters of the target object include the latitude and longitude of the target object and its height above sea level.
Specifically, the obtained spatial information of the target object is combined with the flight data of the unmanned aerial vehicle to calculate the longitude, latitude and height of the target object. An industrial personal computer receives the flight data of the unmanned aerial vehicle in real time and extracts the information required for target positioning, including the longitude and latitude (long_1, lat_1) of the unmanned aerial vehicle, its altitude h_UAV, pitch angle α, azimuth angle β, roll angle γ, and the pitch angle θ of the camera.
As shown in Fig. 3, O_w-X_wY_wZ_w is the world coordinate system, and the second coordinate system shown there is the left camera coordinate system when the camera pitch angle is zero; the two coordinate systems are related through the pitch angle θ of the camera (the transformation matrix appears in the filing as an equation image and is not reproduced here). The coordinates of the unmanned aerial vehicle in the zero-pitch camera coordinate system can be obtained by manual measurement according to the installation position of the camera, and the coordinates of the unmanned aerial vehicle in the world coordinate system are then calculated from them through this transformation.
As shown in Fig. 3, the local geographic coordinate system has one axis pointing along the vertical direction while its other two axes lie in the horizontal plane. The world coordinate system is rotated about the Z_w axis by the roll angle γ and then about a second axis by a further attitude angle so as to coincide with the local geographic coordinate system; the conversion relationship between the two coordinate systems is given in the filing as an equation image.
The coordinates of the target object in the world coordinate system are transformed into the local geographic coordinate system together with the coordinates of the unmanned aerial vehicle. The height difference between the unmanned aerial vehicle and the target is then calculated as the difference of their vertical coordinates in the local geographic coordinate system, and the height of the target above sea level follows from the altitude h_UAV of the unmanned aerial vehicle and this height difference.
As shown in Fig. 4, the longitude and latitude (long_2, lat_2) of the target object are calculated from the longitude and latitude of the unmanned aerial vehicle and the relative positional relationship between the target object and the unmanned aerial vehicle, using the horizontal distance between the unmanned aerial vehicle and the target, the azimuth of the connecting line between the unmanned aerial vehicle and the target, and the radius R of the Earth (the explicit formulas appear in the filing as equation images).
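A hedged sketch of the final geolocation step is given below: it takes the drone's latitude and longitude, the height difference to the target, the horizontal drone-target distance and the azimuth of the connecting line, and applies a local spherical-Earth, small-offset approximation. The explicit formulas in the filing are not reproduced, so this approximation and the subtraction used for the target height are assumptions.

```python
# Sketch of target geolocation from drone GPS, horizontal distance and azimuth.
import math

EARTH_RADIUS_M = 6_371_000.0

def locate_target(lat1_deg, long1_deg, h_uav_m, delta_h_m, dist_m, azimuth_deg):
    north = dist_m * math.cos(math.radians(azimuth_deg))   # displacement toward north
    east = dist_m * math.sin(math.radians(azimuth_deg))    # displacement toward east
    lat2 = lat1_deg + math.degrees(north / EARTH_RADIUS_M)
    long2 = long1_deg + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat1_deg))))
    h_target = h_uav_m - delta_h_m      # assumed: drone altitude minus height difference
    return lat2, long2, h_target
```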
The invention adopts a novel insulator spatial positioning method combining binocular stereo vision with GPS. The main goal of target positioning in unmanned aerial vehicle inspection is to match the pixel coordinates of a target in the two-dimensional image with its coordinates in the real scene, such as GPS coordinates. The conversion matrices among the image coordinate system, the world coordinate system and the geographic coordinate system are calculated according to the real-time flight data and equipment parameters of the unmanned aerial vehicle, and the longitude, latitude and height of the object are then obtained through coordinate conversion.
In another embodiment of the present invention, as shown in fig. 5, an unmanned aerial vehicle-based target detection and positioning device includes: a construction module 10, an identification module 20, a determination module 30 and a positioning module 40.
The construction module 10 is configured to acquire binocular images acquired by the binocular vision perception module and construct a depth map of the binocular images.
Specifically, as shown in fig. 6, the construction module 10 includes: a segmentation sub-module 101, an extraction sub-module 102, a matching sub-module 103, a calculation sub-module 104, a statistics sub-module 105 and a creation sub-module 106.
The segmentation sub-module 101 is configured to perform region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image.
A good region segmentation result improves the accuracy of the disparity boundaries. The segmentation submodule 101 of the invention adopts image segmentation based on Mask R-CNN. Mask R-CNN is a convolutional neural network proposed on the basis of the earlier Faster R-CNN framework; the network effectively detects targets while producing high-quality semantic segmentation. The main idea of Mask R-CNN is to extend the original Faster R-CNN by adding a branch that predicts segmentation masks in parallel with the existing detection branch. At the same time, the network structure is easy to implement and train, and it runs fast.
The segmentation submodule 101 performs region segmentation, based on Mask R-CNN, on both the left image and the right image of the binocular images acquired by the binocular vision perception module, yielding the left region segmentation image and the right region segmentation image.
The extraction sub-module 102 is configured to extract a first feature point of the left region-segmented image and a second feature point of the right region-segmented image.
In particular, Speeded Up Robust Features (SURF) is a fast and stable algorithm for detecting feature points. In the present invention, the extraction sub-module 102 extracts feature points of the left and right region-segmented images using SURF and calculates 64-dimensional descriptors thereof.
The descriptors of the first and second feature points are built from the distributions of first-order Haar wavelet responses in the x and y directions of the image coordinates rather than from gradients; speed is increased by using integral images, and only the 64-dimensional descriptor of each feature point is used. Adopting the 64-dimensional descriptor effectively reduces the feature matching time and improves the robustness of feature point extraction.
The matching sub-module 103 is configured to match the first feature point and the second feature point according to an euclidean distance between the first feature point and the second feature point, so as to obtain a plurality of groups of original feature point pairs.
The computation submodule 104 is configured to select a sparse disparity point pair from the pair of original feature points, and compute a depth value of the sparse disparity point pair.
Specifically, as shown in fig. 7, the calculation submodule 104 includes: a slope calculation unit 1041, a main slope determination unit 1042, a sparse disparity point pair determination unit 1043, and a depth value calculation unit 1044,
the slope calculating unit 1041 is configured to connect each group of the original feature point pairs, and calculate a slope of a connection line of each group of the original feature point pairs. A main slope determining unit 1042 configured to take a slope having the highest frequency of occurrence among the plurality of slopes as a main slope. A sparse disparity point pair determining unit 1043 configured to retain, as the sparse disparity point pair, the original feature point pair corresponding to a slope that is the same as the main slope among the plurality of slopes. A depth value calculation unit 1044 configured to calculate depth values of the sparse disparity point pairs.
Some mismatched feature point pairs exist among the multiple groups of original feature point pairs obtained by the matching sub-module 103, and the mismatched original feature point pairs are eliminated using slope consistency. First, each original matching point pair is connected by a line and its slope in the image coordinate system is calculated. Then, the frequency of occurrence of each slope is counted, and the slope with the highest frequency is taken as the main slope. Matching point pairs whose slope equals the main slope are retained and defined as the sparse disparity point pairs. Finally, the depth value Z_spp of each sparse disparity point pair is calculated.
The statistics submodule 105 is configured to count the projection distribution of the sparse disparity point pairs, and to take the average disparity of the sparse disparity point pairs falling in each segmented region as the disparity value of the corresponding region.
It is known that disparity information can be used to estimate the general depth information of an image. Based on this, the depth map of the binocular images is constructed from the disparity values of the segmented regions.
The creating sub-module 106 is configured to create a sparse disparity map based on the disparity values to obtain the depth map of the binocular images.
In the perspective projection imaging model, the mapping of a 3D scene to a 2D image is a process in which depth information is lost. Therefore, it is necessary to estimate the depth information of the scene from binocular disparity cues. However, the correspondence point matching problem is difficult, and the speed and precision of existing depth calculation methods may be unstable in practical applications, so a new depth estimation method is proposed. The method constructs the depth map by combining the region segmentation result with the sparse disparity points. It is assumed that the disparity values on the same object are the same, and in order to ensure the accuracy of the depth boundaries, the obtained region segmentation result is used to assist the construction of the depth map.
The recognition module 20 is configured to recognize a target object in the binocular image based on the depth map.
Specifically, as shown in fig. 8, the identification module 20 includes: a fusion sub-module 201, a refinement submodule 202 and a recognition submodule 203.
Wherein, the fusion sub-module 201 is configured to perform fusion according to the image characteristics of the binocular images to obtain a two-dimensional saliency map. The refinement submodule 202 is configured to refine the two-dimensional saliency of the two-dimensional saliency map using the depth map to obtain a depth saliency map. The recognition submodule 203 is configured to binarize and skeletonize the depth significance map, and recognize the target object in the depth significance map according to preset features of the target object.
It is known that the importance of a region in an image depends on its difference from the surrounding regions, which is usually reflected in features such as color, shape and texture. The color contrast and texture contrast features are fused to obtain a two-dimensional saliency map, and the depth information obtained in the preceding process is then used to refine the two-dimensional saliency detection result. Although most of the salient regions are highlighted in the RGB saliency map computed by fusing the color and texture features, some superpixels belonging to the background are also highlighted; background superpixels may have high saliency if they have high contrast with their surrounding superpixels. In order to suppress the background in the saliency map, the depth map obtained in the preceding process is used to refine the two-dimensional saliency detection result.
The idea of this refinement is based on the following facts: if regions have the same (or similar) depth values, their saliency values should also be the same (or similar); in addition, objects closest to the viewer attract more attention and should be assigned higher saliency values. Under these assumptions, a set of thresholds defined over the depth values is used to layer the image into G layers (the threshold set and the layer-wise refinement formula appear in the filing as equation images and are not reproduced here). The optimized saliency map is computed layer by layer, where depth_g is the depth value of the g-th image layer in the depth map, num_g is the number of superpixels in image layer I_g, and δ, which controls the sensitivity of the weights to spatial distance, is empirically set to 0.2.
The invention binarizes and skeletonizes the obtained depth saliency map. The resulting skeletons still retain the important information about the shape and structure of the object.
In this embodiment, an insulator in a power transmission line is taken as an example. Although there are many types of insulators in power transmission lines, an insulator string has unique skeleton structure characteristics compared with other false targets, which can be summarized in the following three points:
In the skeleton structure diagram, the central axis of an insulator string corresponds to one long straight line and the insulator caps correspond to a number of short straight lines; all the short straight lines are crossed by the long straight line; and the short straight lines are of approximately equal length and are arranged in parallel along the long straight line at equal intervals.
Feature descriptors are established for these three characteristics, and an insulator structure search is carried out in the saliency result on the basis of the three feature descriptors, so as to realize accurate detection of the insulator. The structure search procedure is as follows:
(1) Central axis search. The Hough algorithm is used to detect straight lines in the skeleton image. According to the following criterion, a straight line whose length L is greater than 1/3 of the long side of the bounding rectangle of the connected domain is regarded as a candidate central axis of the insulator:
L ≥ (1/3) · max(length_cr, width_cr)
(2) Insulator cap search. Lines that are vertically bisected by the candidate central axis are searched for, and their lengths and positions are recorded. Then the number of such lines, num_l, is counted. A threshold T_N is set; if num_l ≥ T_N, the candidate target is retained for the third filtering step. T_N was set to 6 in the experiments.
(3) Uniform-arrangement judgement. The variance of the lengths of the short lines is calculated to represent their length consistency, and the variance of the distances between the short lines is calculated to represent their spacing consistency. If these two variances satisfy the threshold condition with respect to a threshold T_S, empirically set to 5 (the explicit inequality appears in the filing as an equation image), the short straight lines are determined to be the skeleton of the insulator caps and are retained; otherwise the candidate is judged to be a false target and eliminated, so that insulators can be accurately detected.
In this embodiment, candidate target object regions are obtained by RGB-D (depth-map) saliency detection. Then, the skeleton structures of the candidate target object regions are extracted. Feature descriptors of the skeleton structure are defined, and a structure search is performed according to these descriptors to realize the final accurate identification of the target object.
The determination module 30 is configured to determine target world coordinates of the target object in a world coordinate system.
Specifically, as shown in fig. 9, the determination module 30 includes: an establishing sub-module 301, a pixel coordinate determination sub-module 302 and a conversion sub-module 303.
Wherein the establishing sub-module 301 is configured to set an optical center of one camera in the binocular vision sensing module as an origin and an optical axis of the camera as a Z-axis to establish a pixel coordinate system. The pixel coordinate determination submodule 302 is configured to determine target pixel coordinates of a target point of the target object within the pixel coordinate system. The conversion sub-module 303 is configured to convert the target pixel coordinate into the target world coordinate according to a conversion relationship between a pixel coordinate system and a world coordinate system.
The first step of the spatial positioning of the target object is to obtain the three-dimensional coordinates of the target point in the world coordinate system through binocular vision model analysis using geometric relationships.
The internal and external parameters of the left and right cameras of the binocular vision perception module are obtained through a calibration algorithm. The optical center of the left camera is set as the origin and its optical axis as the Z_w axis, establishing the world coordinate system O_w-X_wY_wZ_w; this means that the world coordinate system coincides with the left camera coordinate system. Let the coordinates of the target point P in the world coordinate system be (X_w, Y_w, Z_w); its coordinates in the pixel coordinate systems of the left and right images are (u_l, v_l) and (u_r, v_r). The distance between the two cameras is defined as the baseline distance b, and d = u_l − u_r is the disparity value.
Since the world coordinate system coincides with the left camera coordinate system, the depth Z_w is equal to the perpendicular distance Z_c between the object and the baseline of the two cameras. As shown in Fig. 2, according to the binocular vision principle and the triangle similarity theorem, the conversion relationship between the pixel coordinate system and the world coordinate system is obtained as:
$$
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & -u_0 \\
0 & 1 & 0 & -v_0 \\
0 & 0 & 0 & f \\
0 & 0 & 1/b & 0
\end{bmatrix}
\begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix},
\qquad
(X_w,\ Y_w,\ Z_w) = \left(\frac{X}{W},\ \frac{Y}{W},\ \frac{Z}{W}\right)
$$
wherein the target world coordinates of the target point are (X_w, Y_w, Z_w); the coordinates of the target point in the pixel coordinate system are (u, v) and the coordinates of the origin in the pixel coordinate system are (u_0, v_0); the baseline distance between the two cameras of the binocular vision perception module is b; f is the focal length obtained by calibration; the disparity value of the target point is d = u_l − u_r, with (u_l, v_l) and (u_r, v_r) the coordinates of the target point in the pixel coordinate systems of the two cameras; and X, Y, Z, W are the four-dimensional homogeneous coordinates of the target point in the world coordinate system.
The positioning module 40 is configured to determine the spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
The flight data of the unmanned aerial vehicle includes the longitude and latitude, altitude, pitch angle, azimuth angle and roll angle of the unmanned aerial vehicle and the pitch angle of the camera; the spatial position parameters of the target object include the latitude and longitude of the target object and its height above sea level.
Specifically, the obtained spatial information of the target object is combined with the flight data of the drone to calculate the longitude, latitude and height of the target object. An industrial personal computer receives the flight data of the drone in real time and extracts the information required for target positioning, including the drone's longitude and latitude $(long_1, lat_1)$, altitude $h_{UAV}$, pitch angle $\alpha$, azimuth angle $\beta$, roll angle $\gamma$, and the pitch angle $\theta$ of the camera. As shown in fig. 3, $O_w\text{-}X_wY_wZ_w$ is the world coordinate system and $O_{c0}\text{-}X_{c0}Y_{c0}Z_{c0}$ is the left camera coordinate system when the camera pitch angle is zero. The two coordinate systems are related by a rotation through the camera pitch angle $\theta$ about the horizontal axis:

$$\begin{bmatrix} X_{c0} \\ Y_{c0} \\ Z_{c0} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$$

The coordinates of the drone in the coordinate system $O_{c0}\text{-}X_{c0}Y_{c0}Z_{c0}$ can be obtained by manual measurement according to the installation position of the camera, and the coordinates of the drone in the world coordinate system are then calculated with the above expression.
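A brief sketch of this step is given below; the choice of the rotation axis (the camera's horizontal X axis) and the sign of the angle are assumptions consistent with the relation above, and the function and parameter names are illustrative.

```python
import numpy as np

def rot_x(angle_rad):
    """Rotation matrix about the X axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def drone_position_in_world(p_drone_c0, cam_pitch_theta):
    """Express the manually measured drone position, given in the zero-pitch
    left-camera frame Oc0, in the world frame Ow that is tied to the camera
    at its current pitch angle theta."""
    # p_c0 = Rx(theta) @ p_w  =>  p_w = Rx(theta).T @ p_c0 = Rx(-theta) @ p_c0
    return rot_x(-cam_pitch_theta) @ np.asarray(p_drone_c0, dtype=float)
```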
As shown in fig. 3, the $Y_g$ axis of the coordinate system $O_g\text{-}X_gY_gZ_g$ points in the vertical direction, while its $X_g$ axis and $Z_g$ axis lie in the horizontal plane. If the coordinate system $O_w\text{-}X_wY_wZ_w$ is rotated about the $Z_w$ axis through the roll angle $\gamma$ and then about the $X_w$ axis through the combined pitch angle $\alpha + \theta$, it coincides with the coordinate system $O_g\text{-}X_gY_gZ_g$. The conversion relationship between the coordinate system $O_w\text{-}X_wY_wZ_w$ and the coordinate system $O_g\text{-}X_gY_gZ_g$ is calculated as follows:
$$\begin{bmatrix} X_g \\ Y_g \\ Z_g \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha+\theta) & -\sin(\alpha+\theta) \\ 0 & \sin(\alpha+\theta) & \cos(\alpha+\theta) \end{bmatrix} \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$$
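The rotation composition can be written as a short helper; the rotation order and signs follow the relation above and are assumptions rather than values taken from the patent figures, and the names are illustrative.

```python
import numpy as np

def rot_z(a):
    """Rotation matrix about the Z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c,  -s, 0.0],
                     [  s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

def rot_x(a):
    """Rotation matrix about the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def world_to_level(p_world, roll_gamma, pitch_alpha, cam_pitch_theta):
    """Rotate a point from the camera-attached world frame Ow into the level
    frame Og whose Yg axis is vertical: first the roll about Zw, then the
    combined drone-plus-camera pitch about Xw."""
    R = rot_x(pitch_alpha + cam_pitch_theta) @ rot_z(roll_gamma)
    return R @ np.asarray(p_world, dtype=float)
```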
Let $(X_w, Y_w, Z_w)$ denote the coordinates of the insulator (the target object) in the world coordinate system, and let $(X_{g,t}, Y_{g,t}, Z_{g,t})$ and $(X_{g,u}, Y_{g,u}, Z_{g,u})$ denote the coordinates of the target and of the drone, respectively, in the coordinate system $O_g\text{-}X_gY_gZ_g$. Since the $Y_g$ axis points in the vertical direction, the height difference between the drone and the target is

$$\Delta h = Y_{g,u} - Y_{g,t}$$

and the height of the insulator above sea level is therefore

$$h_{target} = h_{UAV} - \Delta h$$
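In code the altitude step reduces to a difference of vertical components; the sign convention below ($\Delta h > 0$ when the drone is above the target) is an assumption consistent with the formulas above, and the names are illustrative.

```python
def target_altitude(y_g_target, y_g_drone, h_uav):
    """Altitude of the target above sea level, from the vertical (Yg)
    coordinates of target and drone in the level frame Og and the drone's
    reported altitude h_uav."""
    delta_h = y_g_drone - y_g_target  # how far the drone sits above the target
    return h_uav - delta_h
```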
As shown in fig. 4, the longitude and latitude $(long_2, lat_2)$ of the target object are calculated from the longitude and latitude of the drone and the relative positional relationship between the target object and the drone:

$$lat_2 = lat_1 + \frac{D\cos\psi}{R}\cdot\frac{180}{\pi}$$

$$long_2 = long_1 + \frac{D\sin\psi}{R\cos(lat_1)}\cdot\frac{180}{\pi}$$

wherein $D$ represents the horizontal distance between the drone and the target, $\psi$ represents the azimuth of the line connecting the drone and the target, and $R$ is the radius of the earth.
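A minimal sketch of this latitude/longitude offset is given below, using the standard small-displacement approximation; the Earth-radius constant and the function and parameter names are illustrative choices rather than values specified here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres

def offset_lat_long(lat1_deg, long1_deg, horiz_dist_m, azimuth_rad):
    """Shift the drone's latitude/longitude (degrees) towards the target:
    horiz_dist_m is the horizontal distance D between drone and target, and
    azimuth_rad is the azimuth of the drone-to-target line measured from north."""
    dlat_rad = horiz_dist_m * math.cos(azimuth_rad) / EARTH_RADIUS_M
    dlong_rad = horiz_dist_m * math.sin(azimuth_rad) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat1_deg)))
    return lat1_deg + math.degrees(dlat_rad), long1_deg + math.degrees(dlong_rad)
```

For example, a target 100 m due east of the drone (azimuth 90°) at latitude 40° shifts the longitude by roughly 0.00117° and leaves the latitude unchanged.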
The invention adopts a novel insulator spatial positioning method that combines binocular stereo vision with GPS. The main goal of target positioning in drone inspection is to match the pixel coordinates of a target in the two-dimensional image with its coordinates in the real scene, such as GPS coordinates. The conversion matrices among the image coordinate system, the world coordinate system and the geographic coordinate system are calculated from the real-time flight data and equipment parameters of the drone, and the longitude, latitude and height of the target object are then obtained through coordinate conversion.
A controller according to another embodiment of the present invention includes a memory and a processor. The memory stores a computer program which, when executed by the processor, implements the steps of the unmanned aerial vehicle-based target detection and positioning method of any of the above embodiments.

A computer-readable storage medium according to another embodiment of the present invention stores a computer program which, when executed by a computer or a processor, implements the steps of the unmanned aerial vehicle-based target detection and positioning method of any of the above embodiments.

An unmanned aerial vehicle according to another embodiment of the invention comprises a binocular vision perception module and the unmanned aerial vehicle-based target detection and positioning device of any of the above embodiments.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (17)

1. An unmanned aerial vehicle-based target detection and positioning method is characterized by comprising the following steps:
acquiring binocular images acquired by a binocular vision perception module, and constructing a depth map of the binocular images;
identifying a target object in the binocular image based on the depth map;
determining target world coordinates of the target object in a world coordinate system;
and determining the space position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
2. The unmanned aerial vehicle-based target detection and positioning method of claim 1, wherein the step of acquiring binocular images acquired by a binocular vision perception module and constructing a depth map of the binocular images comprises:
performing region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image;
extracting a first characteristic point of the left region segmentation image and a second characteristic point of the right region segmentation image;
matching the first characteristic point and the second characteristic point according to the Euclidean distance between the first characteristic point and the second characteristic point to obtain a plurality of groups of original characteristic point pairs;
selecting a sparse disparity point pair in the original characteristic point pair, and calculating a depth value of the sparse disparity point pair;
calculating the projection distribution of the sparse disparity point pairs, and taking the average disparity of the sparse disparity point pairs of each region segmentation image as the disparity value of the corresponding region segmentation image;
and creating a sparse disparity map based on the disparity values to obtain a depth map of the binocular image.
3. The unmanned aerial vehicle-based target detection and positioning method of claim 2, wherein the step of selecting sparse disparity point pairs from the original feature point pairs and calculating depth values of the sparse disparity point pairs comprises:
connecting each group of the original characteristic point pairs, and calculating the slope of a connecting line of each group of the original characteristic point pairs;
taking the slope with the highest occurrence frequency in the plurality of slopes as a main slope;
reserving the original characteristic point pairs corresponding to the slopes which are the same as the main slopes in the plurality of slopes as the sparse disparity point pairs;
and calculating the depth value of the sparse disparity point pair.
4. The unmanned aerial vehicle-based target detection and positioning method of claim 1, wherein the step of identifying the target object in the binocular image based on the depth map comprises:
fusing the image features of the binocular images to obtain a two-dimensional saliency map;
using the depth map to improve two-dimensional saliency of the two-dimensional saliency map to obtain a depth saliency map;
and carrying out binarization and skeletonization on the depth saliency map, and identifying the target object in the depth saliency map according to preset features of the target object.
5. The drone-based target detection and positioning method of claim 1, wherein the step of determining target world coordinates of the target object in a world coordinate system comprises:
setting the optical center of one camera in the binocular vision perception module as an origin, and setting the optical axis of the camera as a Z axis so as to establish a pixel coordinate system;
determining target pixel coordinates of a target point of the target object in the pixel coordinate system;
and converting the target pixel coordinate into the target world coordinate according to the conversion relation between the pixel coordinate system and the world coordinate system.
6. The unmanned aerial vehicle-based target detection and positioning method of claim 5, wherein the conversion relationship between the pixel coordinate system and the world coordinate system is as follows:
$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f \\ 0 & 0 & 1/b & 0 \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}, \qquad X_w = \frac{X}{W}, \quad Y_w = \frac{Y}{W}, \quad Z_w = \frac{Z}{W}$$

wherein the target world coordinates of the target point are $(X_w, Y_w, Z_w)$; the coordinates of the target point in the pixel coordinate system are $(u, v)$; the coordinates of the origin in the pixel coordinate system are $(u_0, v_0)$; $f$ is the focal length; the distance between the two cameras of the binocular vision perception module is defined as the baseline distance $b$; the disparity value of the target point is $d = u_i - u_r$, where $(u_i, v_i)$ and $(u_r, v_r)$ are the coordinates of the target point in the pixel coordinate systems of the two cameras; and $X$, $Y$, $Z$, $W$ are the four-dimensional coordinate values of the target point in the world coordinate system.
7. The drone-based target detection and positioning method of claim 1, wherein the flight data of the drone includes: longitude and latitude, altitude, pitch angle, azimuth angle, roll angle of the unmanned aerial vehicle and pitch angle of the camera;
the spatial position parameters of the target object comprise: the latitude and longitude of the target object and the height from the sea level.
8. An unmanned aerial vehicle-based target detection and positioning device, characterized by comprising:
the construction module is configured to acquire binocular images acquired by the binocular vision perception module and construct a depth map of the binocular images;
the identification module is configured to identify a target object in the binocular image based on the depth map;
a determination module configured to determine target world coordinates of the target object in a world coordinate system;
and the positioning module is configured to determine the spatial position parameters of the target object according to the flight data of the unmanned aerial vehicle and the target world coordinates.
9. The drone-based object detection and positioning device of claim 8, wherein the building module comprises:
the segmentation submodule is configured to perform region segmentation on both the left image and the right image of the binocular image to obtain a left region segmentation image and a right region segmentation image;
an extraction submodule configured to extract a first feature point of the left region segmentation image and a second feature point in the right region segmentation image;
the matching submodule is configured to match the first characteristic point and the second characteristic point according to the Euclidean distance between the first characteristic point and the second characteristic point to obtain a plurality of groups of original characteristic point pairs;
the calculation submodule is configured to select a sparse disparity point pair in the original characteristic point pair and calculate a depth value of the sparse disparity point pair;
the statistic submodule is configured to count the projection distribution of the sparse disparity point pairs, and the average disparity of the sparse disparity point pairs of each region segmentation image is used as the disparity value of the corresponding region segmentation image;
and the creating sub-module is configured to create a sparse disparity map based on the disparity values so as to obtain a depth map of the binocular image.
10. The drone-based object detection and positioning device of claim 9, wherein the computation submodule includes:
a slope calculation unit configured to connect each group of the original feature point pairs and calculate a slope of a connection line of each group of the original feature point pairs;
a main slope determination unit configured to take a slope having the highest frequency of occurrence among the plurality of slopes as a main slope;
a sparse disparity point pair determining unit configured to retain, as the sparse disparity point pair, the original feature point pair corresponding to a slope that is the same as the main slope among the plurality of slopes;
a depth value calculation unit configured to calculate depth values of the sparse disparity point pairs.
11. The drone-based object detection and positioning device of claim 8, wherein the identification module comprises:
the fusion sub-module is configured to fuse the image features of the binocular images to obtain a two-dimensional saliency map;
a refinement submodule configured to refine the two-dimensional saliency of the two-dimensional saliency map using the depth map to obtain a depth saliency map;
and the recognition submodule is configured to binarize and skeletonize the depth saliency map, and to recognize the target object in the depth saliency map according to preset features of the target object.
12. The drone-based object detection and positioning device of claim 8, wherein the determination module comprises:
the establishing submodule is configured to set an optical center of one camera in the binocular vision perception module as an origin and set an optical axis of the camera as a Z axis so as to establish a pixel coordinate system;
a pixel coordinate determination submodule configured to determine target pixel coordinates of a target point of the target object within the pixel coordinate system;
and the conversion sub-module is configured to convert the target pixel coordinate into the target world coordinate according to the conversion relation between the pixel coordinate system and the world coordinate system.
13. The drone-based target detection and positioning device of claim 12, wherein the transformation relationship between the pixel coordinate system and the world coordinate system is:
$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f \\ 0 & 0 & 1/b & 0 \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}, \qquad X_w = \frac{X}{W}, \quad Y_w = \frac{Y}{W}, \quad Z_w = \frac{Z}{W}$$

wherein the target world coordinates of the target point are $(X_w, Y_w, Z_w)$; the coordinates of the target point in the pixel coordinate system are $(u, v)$; the coordinates of the origin in the pixel coordinate system are $(u_0, v_0)$; $f$ is the focal length; the distance between the two cameras of the binocular vision perception module is defined as the baseline distance $b$; the disparity value of the target point is $d = u_i - u_r$, where $(u_i, v_i)$ and $(u_r, v_r)$ are the coordinates of the target point in the pixel coordinate systems of the two cameras; and $X$, $Y$, $Z$, $W$ are the four-dimensional coordinate values of the target point in the world coordinate system.
14. The drone-based object detection and positioning device of claim 8, wherein the flight data of the drone includes: longitude and latitude, altitude, pitch angle, azimuth angle, roll angle of the unmanned aerial vehicle and pitch angle of the camera;
the spatial position parameters of the target object comprise: the latitude and longitude of the target object and the height from the sea level.
15. A controller comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, is capable of carrying out the steps of the method of any one of claims 1 to 7.
16. A computer-readable storage medium for storing a computer program which, when executed by a computer or processor, implements the steps of the method of any one of claims 1 to 7.
17. An unmanned aerial vehicle comprising a binocular vision perception module and the unmanned aerial vehicle-based target detection and positioning apparatus of any one of claims 8-14.
CN202210128714.5A 2022-02-11 2022-02-11 Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle Withdrawn CN114170535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210128714.5A CN114170535A (en) 2022-02-11 2022-02-11 Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210128714.5A CN114170535A (en) 2022-02-11 2022-02-11 Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN114170535A true CN114170535A (en) 2022-03-11

Family

ID=80489754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210128714.5A Withdrawn CN114170535A (en) 2022-02-11 2022-02-11 Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN114170535A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108731587A (en) * 2017-04-14 2018-11-02 中交遥感载荷(北京)科技有限公司 A kind of the unmanned plane dynamic target tracking and localization method of view-based access control model
CN109472776A (en) * 2018-10-16 2019-03-15 河海大学常州校区 A kind of isolator detecting and self-destruction recognition methods based on depth conspicuousness
CN110599522A (en) * 2019-09-18 2019-12-20 成都信息工程大学 Method for detecting and removing dynamic target in video sequence
CN110548699A (en) * 2019-09-30 2019-12-10 华南农业大学 Automatic pineapple grading and sorting method and device based on binocular vision and multispectral detection technology
WO2021197341A1 (en) * 2020-04-03 2021-10-07 速度时空信息科技股份有限公司 Monocular image-based method for updating road signs and markings

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNPENG MA,ET AL.: "Real-Time Detection and Spatial Localization of Insulators for UAV Inspection Based on Binocular Stereo Vision", 《REMOTE SENSING》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019216A (en) * 2022-08-09 2022-09-06 江西师范大学 Real-time ground object detection and positioning counting method, system and computer
CN115841487A (en) * 2023-02-20 2023-03-24 深圳金三立视频科技股份有限公司 Hidden danger positioning method and terminal along power transmission line
CN117523431A (en) * 2023-11-17 2024-02-06 中国科学技术大学 Firework detection method and device, electronic equipment and storage medium
CN117437563A (en) * 2023-12-13 2024-01-23 黑龙江惠达科技股份有限公司 Plant protection unmanned aerial vehicle dotting method, device and equipment based on binocular vision
CN117437563B (en) * 2023-12-13 2024-03-15 黑龙江惠达科技股份有限公司 Plant protection unmanned aerial vehicle dotting method, device and equipment based on binocular vision


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20220311)