CN110415297B - Positioning method and device and unmanned equipment - Google Patents

Positioning method and device and unmanned equipment

Info

Publication number
CN110415297B
CN110415297B
Authority
CN
China
Prior art keywords
neural network
point cloud
visual image
target
positioning
Prior art date
Legal status
Active
Application number
CN201910629969.8A
Other languages
Chinese (zh)
Other versions
CN110415297A (en)
Inventor
杨立荣
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201910629969.8A priority Critical patent/CN110415297B/en
Publication of CN110415297A publication Critical patent/CN110415297A/en
Application granted granted Critical
Publication of CN110415297B publication Critical patent/CN110415297B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30248 - Vehicle exterior or interior
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a positioning method, a positioning device and an unmanned device. In one embodiment, the method comprises: extracting visual image features for the current target visual image; searching a target point cloud image feature matching the visual image features from a pre-constructed feature library, where the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data; determining target positioning data associated with the target point cloud image feature; and determining current positioning information based on the target positioning data. In this embodiment there is no need to first estimate a pose and then perform positioning based on the estimated pose, so the accuracy of the positioning information is improved and the positioning-accuracy requirement of unmanned equipment is met.

Description

Positioning method and device and unmanned equipment
Technical Field
The application relates to the technical field of unmanned driving, in particular to a positioning method and device and unmanned equipment.
Background
At present, unmanned equipment is usually positioned as follows: in certain specific area environments, visual images are acquired with a visual image capture device and a pose label is determined for each visual image, yielding a training data set, and a target model is trained using the training data set. When the unmanned equipment is positioned, the visual image currently acquired by the unmanned equipment is input into the target model to obtain the current pose information of the unmanned equipment, and the unmanned equipment is positioned based on that pose information. However, positioning in this way requires first estimating the pose and then positioning based on the estimated pose, so the obtained positioning information has low accuracy and can hardly meet the requirement of unmanned equipment on positioning accuracy.
Disclosure of Invention
In order to solve one of the above technical problems, the present application provides a positioning method, a positioning device and an unmanned device.
According to a first aspect of embodiments of the present application, there is provided a positioning method, including:
extracting visual image features for the current target visual image;
searching a target point cloud image feature matching the visual image features from a pre-constructed feature library; the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data;
determining target positioning data associated with the target point cloud image features;
and determining current positioning information based on the target positioning data.
Optionally, the point cloud image features are extracted by using a pre-trained first target neural network;
the visual image features are extracted by utilizing a pre-trained second target neural network;
wherein the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to the positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to the positioning point is within a preset error range.
Optionally, the first target neural network and the second target neural network are both deep convolutional neural networks;
extracting point cloud image features with the first target neural network by: inputting a frame of laser point cloud data into the first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network;
extracting visual image features using the second target neural network by: inputting a frame of visual image into the second target neural network, and outputting the visual image features corresponding to the frame of visual image by the deconvolution layer of the second target neural network.
Optionally, the first target neural network and the second target neural network are trained by:
determining a sample sequence, wherein the sample sequence comprises a frame of sample laser point cloud data, a frame of sample visual image and a group of semantic labels corresponding to each acquisition time in a plurality of acquisition times;
iteratively executing the updating operation of the first neural network and the second neural network based on the sample sequence until a stopping condition is met, and respectively taking the first neural network and the second neural network after iterative updating as the first target neural network and the second target neural network; the update operation includes:
inputting sample laser point cloud data corresponding to any randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics; the sample point cloud image features and the first semantics are respectively output by a deconvolution layer and an output layer of the current first neural network;
inputting the sample visual image corresponding to the acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics; the sample visual image features and the second semantics are respectively output by a deconvolution layer and an output layer of the current second neural network;
determining the semantic label corresponding to the acquisition time in the sample sequence as the current semantic label;
updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label;
wherein the stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
Optionally, before extracting the visual image feature for the current target visual image, the method further includes:
acquiring a currently acquired visual image;
and converting the currently acquired visual image into a target visual image meeting preset shooting conditions.
Optionally, the preset shooting condition includes preset illumination and/or a preset shooting angle.
Optionally, the converting the currently acquired visual image into a target visual image meeting a preset shooting condition includes:
inputting the currently acquired visual image into a generator trained in advance to obtain a target visual image meeting preset shooting conditions; wherein the generator is obtained by training a generative adversarial network.
According to a second aspect of embodiments of the present application, there is provided a positioning apparatus, including:
the extraction module is used for extracting visual image features for the current target visual image;
the searching module is used for searching the target point cloud image features matching the visual image features from a pre-constructed feature library; the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data;
the determining module is used for determining target positioning data associated with the target point cloud image features;
and the positioning module is used for determining the current positioning information based on the target positioning data.
According to a third aspect of embodiments herein, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above first aspects.
According to a fourth aspect of embodiments of the present application, there is provided an unmanned device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the method of any one of the first aspect above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the positioning method and device provided by the embodiment of the application, the visual image features are extracted aiming at the current target visual image, the target point cloud image features matched with the visual image features are searched from a pre-constructed feature library, and the feature library comprises the point cloud image features extracted aiming at multi-frame laser point cloud data. And determining target positioning data associated with the target point cloud image features, and determining current positioning information based on the target positioning data. Therefore, the pose is not required to be estimated firstly, and then the positioning is carried out based on the estimated pose, so that the accuracy of the positioning information is improved, and the requirement of the unmanned equipment on the positioning accuracy is met.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of positioning according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating another positioning method according to an exemplary embodiment of the present application;
FIG. 3 is a block diagram of a positioning device shown in the present application according to an exemplary embodiment;
FIG. 4 is a block diagram of another positioning device shown in the present application according to an exemplary embodiment;
FIG. 5 is a block diagram of another positioning device shown in the present application according to an exemplary embodiment;
FIG. 6 is a schematic diagram of an unmanned device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
As shown in fig. 1, fig. 1 is a flow chart illustrating a positioning method according to an exemplary embodiment, which may be applied in an unmanned device. Those skilled in the art will appreciate that the unmanned device may include, but is not limited to, an unmanned vehicle, an unmanned robot, an unmanned aerial vehicle, an unmanned ship, and the like. The method comprises the following steps:
in step 101, visual image features are extracted for a current target visual image.
In this embodiment, an image capturing device (e.g., a camera or the like) is installed on the unmanned device, and the image capturing device may be used to capture a visual image of the environment around the unmanned device in real time and perform positioning using the captured visual image.
In one implementation, the current visual image acquired by the image acquisition device can be used as the current target visual image. In another implementation manner, the current visual image acquired by the image acquisition device may be converted, and the obtained visual image after the conversion is used as the current target visual image. It is to be understood that the present application is not limited in this respect.
In this embodiment, the visual image features may be extracted for the current target visual image. The visual image features may be extracted in any reasonable manner known in the art and that may occur in the future. Optionally, a machine learning mode may be adopted, and a pre-trained neural network model is utilized to extract visual image features corresponding to the target visual image. It is to be understood that the present application is not limited in the particular manner in which the visual image features are extracted.
In step 102, target point cloud image features matching the visual image features are searched from a pre-constructed feature library.
In this embodiment, the feature library may include point cloud image features extracted from multiple frames of laser point cloud data. Specifically, the feature library may be constructed in advance as follows. First, sensors such as a laser radar and a positioning device may be mounted on a test device. Then, the test device is driven in a specific scene (namely, a scene in which positioning by the method provided by the application is required). While driving, the laser radar collects laser point cloud data of the environment around the test device in real time, and the positioning device collects positioning data at the same moments. Then, for each collected frame of laser point cloud data, the corresponding point cloud image features can be extracted, and the point cloud image features corresponding to each frame of laser point cloud data and the positioning data corresponding to that frame are stored in association, yielding the feature library.
Wherein the point cloud image features may be extracted in any reasonable manner known in the art and that may occur in the future. Optionally, a machine learning mode may be adopted, and a pre-trained neural network model is used to extract point cloud image features corresponding to the laser point cloud data. It is to be understood that the present application is not limited to the specific manner of extracting the point cloud image features.
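As an illustration only, a minimal sketch of the feature-library construction described above might look as follows. The extractor callable stands in for whatever point cloud feature extraction method is chosen (for example, the pre-trained first target neural network); the data layout and the normalization step are assumptions, not part of the disclosed implementation.

```python
import numpy as np

def build_feature_library(frames, extract_point_cloud_feature):
    """Build the feature library from synchronized test-device logs.

    frames: iterable of (laser_point_cloud, positioning_data) pairs collected
        at the same moments by the laser radar and the positioning device.
    extract_point_cloud_feature: callable returning a 1-D feature vector for
        one frame of laser point cloud data (e.g. the first target neural network).
    """
    library = []
    for point_cloud, positioning_data in frames:
        feature = np.asarray(extract_point_cloud_feature(point_cloud), dtype=np.float32)
        feature /= (np.linalg.norm(feature) + 1e-12)  # normalize so cosine similarity is a dot product
        # Store each point cloud image feature in association with its positioning data.
        library.append({"feature": feature, "positioning": positioning_data})
    return library
```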
Alternatively, for a plurality of different regions, a feature library corresponding to each region may be constructed in advance. When positioning is carried out, the feature library corresponding to the region where the unmanned device is located is selected.
It should be noted that the test device may be a manually driven device of the same type as the unmanned device to which the present application is applied. For example, if the unmanned device to which the present application is applied is an unmanned vehicle, the test device may be a vehicle of the same type as that unmanned vehicle. For another example, if the unmanned device is an unmanned aerial vehicle, the test device may be a flight device of the same type as that unmanned aerial vehicle. For another example, if the unmanned device is an unmanned robot, the test device may be a robot of the same type as that unmanned robot.
It should be noted that, although the visual image and the laser point cloud data are different types of data, both can express, in their own ways, information about the surrounding environment and the appearance of surrounding objects. Therefore, the visual image and the laser point cloud data necessarily contain information of the same kind, and for the visual image and the laser point cloud data collected at the same positioning point, the same feature information can be extracted from each. Feature extraction can thus be performed on the visual image and on the laser point cloud data to obtain visual image features and point cloud image features, and the two can be matched against each other for positioning.
In this embodiment, after the visual image features corresponding to the target visual image are obtained, the target point cloud image features matching the visual image features may be searched for in the pre-constructed feature library. Specifically, each point cloud image feature in the feature library may be traversed and compared with the visual image features. The point cloud image feature whose similarity with the visual image features is the largest, and is greater than or equal to a preset similarity threshold, is selected as the target point cloud image feature matching the visual image features.
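The search step above could be sketched as follows; cosine similarity and the 0.8 threshold are illustrative choices, since the text only requires a similarity measure and a preset minimum similarity.

```python
import numpy as np

def find_target_feature(visual_feature, library, min_similarity=0.8):
    """Traverse the feature library and return the entry whose point cloud
    image feature is most similar to the visual image feature, or None if the
    best similarity falls below the preset threshold."""
    query = np.asarray(visual_feature, dtype=np.float32)
    query /= (np.linalg.norm(query) + 1e-12)
    best_entry, best_score = None, -1.0
    for entry in library:                               # compare against every stored feature
        score = float(np.dot(query, entry["feature"]))  # features were normalized when stored
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry if best_score >= min_similarity else None
```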
In step 103, target positioning data associated with the target point cloud image features is determined.
In step 104, current positioning information is determined based on the target positioning data.
In this embodiment, when the feature library is constructed, the point cloud image features corresponding to each frame of laser point cloud data and the positioning data corresponding to that frame are stored in association, so the positioning data associated with the target point cloud image features can be found in the feature library and used as the target positioning data. Then, the target positioning data may be used directly as the current positioning information, or it may be slightly adjusted and the adjusted data used as the current positioning information.
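Putting steps 101 to 104 together, and reusing the hypothetical helpers sketched above, an end-to-end positioning call might look like this; here the extractor is assumed to return a 1-D feature vector (for example from the second target neural network), and the target positioning data is used directly as the current positioning information.

```python
def locate(target_visual_image, extract_visual_feature, feature_library):
    """Steps 101-104: extract the visual image feature, search the feature
    library, and return the positioning data associated with the matched
    point cloud image feature."""
    visual_feature = extract_visual_feature(target_visual_image)   # step 101
    entry = find_target_feature(visual_feature, feature_library)   # step 102
    if entry is None:
        return None                     # no point cloud image feature is similar enough
    return entry["positioning"]         # steps 103-104: associated positioning data
```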
According to the positioning method provided by the embodiment of the application, visual image features are extracted for the current target visual image, and the target point cloud image features matching the visual image features are searched from a pre-constructed feature library, where the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data. Target positioning data associated with the target point cloud image features is then determined, and current positioning information is determined based on the target positioning data. Therefore, there is no need to first estimate a pose and then perform positioning based on the estimated pose, so the accuracy of the positioning information is improved and the positioning-accuracy requirement of the unmanned equipment is met.
In some optional embodiments, the point cloud image features are extracted by using a pre-trained first target neural network, and the visual image features are extracted by using a pre-trained second target neural network. The first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to the positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to the positioning point is within a preset error range.
Specifically, in this embodiment, when the feature library is pre-constructed, corresponding point cloud image features need to be extracted for each frame of collected laser point cloud data, and the point cloud image features corresponding to each frame of laser point cloud data may be extracted by using a pre-trained first target neural network. When positioning is performed based on the feature library, the visual image features need to be extracted for the current target visual image, and the visual image features corresponding to the target visual image can be extracted by using a pre-trained second target neural network.
The first target neural network and the second target neural network can be obtained by training them together, and they need to satisfy the following condition: for any positioning point (for example, any positioning point in a specified scene in which the method provided by the application is used for positioning), if the first target neural network extracts point cloud image features from the laser point cloud data corresponding to the positioning point, and the second target neural network extracts visual image features from the visual image corresponding to the positioning point, then the difference between the point cloud image features extracted by the first target neural network and the visual image features extracted by the second target neural network is within a preset error range.
It should be noted that, if the image acquisition device and the laser radar are both installed on the test device, then for any positioning point the test device uses the image acquisition device and the laser radar simultaneously to acquire the visual image and the laser point cloud data at that positioning point. The laser point cloud data acquired by the test device at the positioning point is the laser point cloud data corresponding to the positioning point, and the visual image acquired by the test device at the positioning point is the visual image corresponding to the positioning point.
Because this embodiment uses the first target neural network and the second target neural network to extract the point cloud image features and the visual image features respectively, the extracted features are high-dimensional, expressive, information-rich, and robust to interference. In addition, because the first target neural network and the second target neural network satisfy the above condition, when the target point cloud image features matching the visual image features are searched for in the feature library, the search can be performed directly and quickly based on the similarity between the visual image features and the point cloud image features, improving both the search efficiency and the search accuracy.
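For illustration, the condition that the two networks must satisfy for any positioning point could be checked as below; the L2 distance and the error bound are illustrative assumptions, as the patent leaves the error measure and the preset range unspecified.

```python
import numpy as np

def features_within_error(point_cloud_feature, visual_feature, max_error=0.1):
    """True if the point cloud image feature and the visual image feature
    extracted for the same positioning point differ by no more than the
    preset error (here measured with an L2 norm)."""
    difference = np.linalg.norm(
        np.asarray(point_cloud_feature, dtype=np.float32)
        - np.asarray(visual_feature, dtype=np.float32)
    )
    return difference <= max_error
```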
In other optional embodiments, the first target neural network and the second target neural network may further satisfy the following condition: for any positioning point, a first semantic segmentation result is obtained by the first target neural network for the laser point cloud data corresponding to the positioning point (that is, the laser point cloud data corresponding to the positioning point is input into the first target neural network, and the first semantic segmentation result is output by the output layer of the first target neural network), a second semantic segmentation result is obtained by the second target neural network for the visual image corresponding to the positioning point (that is, the visual image corresponding to the positioning point is input into the second target neural network, and the second semantic segmentation result is output by the output layer of the second target neural network), and the difference between the first semantic segmentation result and the second semantic segmentation result is within a preset difference range.
In this embodiment, the first target neural network and the second target neural network further satisfy the above condition, so the feature information expressed by the visual image features and the point cloud image features extracted by the two networks is even closer. When the target point cloud image features matching the visual image features are searched for in the feature library, the search efficiency and the search accuracy are further improved.
In other alternative embodiments, the first target neural network and the second target neural network are both deep convolutional neural networks.
In the present embodiment, the first target neural network may be a deep convolutional neural network having a function of processing a three-dimensional image, and the second target neural network may be a deep convolutional neural network having a function of processing a two-dimensional image.
In this embodiment, the point cloud image feature may be extracted by using the first target neural network as follows: inputting a frame of laser point cloud data into a first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network.
In this embodiment, the visual image features may be extracted using the second target neural network by: inputting a frame of visual image into the second target neural network, and outputting the visual image features corresponding to the frame of visual image by a deconvolution layer of the second target neural network.
In the embodiment, the point cloud image feature and the visual image feature are obtained by using the deconvolution layers of the first target neural network and the second target neural network respectively. The point cloud image features and the visual image features contain more object edge detail information, and when the target point cloud image features matched with the visual image features are searched in the feature library, the searching efficiency and the searching accuracy are improved.
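A PyTorch-style sketch of how the second target neural network could be organized is shown below: a deconvolution (transposed convolution) layer supplies the visual image features and a final output layer supplies the per-pixel semantics. The layer sizes, class count and 2-D image input are assumptions for illustration only; the first target neural network would follow the same pattern but operate on the laser point cloud representation.

```python
import torch
import torch.nn as nn

class SecondTargetNet(nn.Module):
    """Deep convolutional network for one frame of visual image: the
    deconvolution layer outputs the visual image features, and the output
    layer produces semantic segmentation logits."""

    def __init__(self, num_classes=10, feature_channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Deconvolution layer: its activation is taken as the visual image feature.
        self.deconv = nn.ConvTranspose2d(64, feature_channels, kernel_size=4, stride=4)
        # Output layer: per-pixel semantic prediction.
        self.output = nn.Conv2d(feature_channels, num_classes, kernel_size=1)

    def forward(self, image):
        x = self.encoder(image)
        feature = self.deconv(x)           # visual image feature (deconvolution layer)
        semantics = self.output(feature)   # semantic logits (output layer)
        return feature, semantics
```

A batch of images of shape (N, 3, H, W), with H and W divisible by 4, yields a feature map and semantic logits at the input resolution under these assumed layer settings.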
In other alternative embodiments, the first target neural network and the second target neural network are trained together using the same training data set, and the training processes of the first target neural network and the second target neural network are not independent.
Specifically, first, a laser radar and an image acquisition device may be mounted on the sample collection device. Then, the sample collection device is driven in a specific scene, and the laser radar and the image acquisition device are used to collect sample laser point cloud data and sample visual images of the surrounding environment, respectively, at a plurality of preset acquisition times. Then, for each acquisition time, a set of semantic labels corresponding to that acquisition time can be obtained; the semantic labels may be the annotated image semantic segmentation result obtained for the sample laser point cloud data and the sample visual image corresponding to that acquisition time. Finally, a sample sequence is obtained; the sample sequence corresponds to the plurality of acquisition times and comprises sample data for each acquisition time, where the sample data corresponding to any acquisition time comprises a frame of sample laser point cloud data, a frame of sample visual image and a set of semantic labels.
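For clarity, the sample sequence can be thought of as the following simple data structure, with one entry per acquisition time; the field types are deliberately left abstract and are not prescribed by the patent.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Sample:
    """Sample data for one acquisition time: one frame of sample laser point
    cloud data, one frame of sample visual image, and one set of semantic
    labels (the annotated segmentation result for that time)."""
    laser_point_cloud: Any
    visual_image: Any
    semantic_labels: Any

SampleSequence = List[Sample]  # ordered over the preset acquisition times
```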
It should be noted that the sample collection device may be a manually driven device, which is the same type of device as the unmanned device to which the present application is applied.
In one implementation, a first target neural network and a second target neural network may be trained simultaneously based on the sequence of samples. Specifically, based on the sample sequence, the updating operation on the first neural network and the second neural network can be iteratively performed until a stopping condition is satisfied, and the first neural network and the second neural network after the iterative updating are respectively used as a first target neural network and a second target neural network.
The update operation includes: first, inputting a frame of sample laser point cloud data corresponding to an acquisition time randomly selected from the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics, where the sample point cloud image features are output by a deconvolution layer of the current first neural network and the first semantics are output by an output layer of the current first neural network; inputting the frame of sample visual image corresponding to the same acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics, where the sample visual image features are output by a deconvolution layer of the current second neural network and the second semantics are output by an output layer of the current second neural network; and obtaining the semantic label corresponding to the acquisition time as the current semantic label.
Then, the current first neural network and the current second neural network are updated (i.e. the network parameters in both networks are adjusted) according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label. After the update, whether the stopping condition is met is checked. If the stopping condition is not met, the update operation continues; if it is met, the update operation stops, and the iteratively updated first neural network and second neural network are taken as the first target neural network and the second target neural network.
The stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
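A hedged sketch of one update operation, assuming PyTorch, follows; it presumes the two networks emit features of matching shape for the same acquisition time, and it uses cross-entropy for the two semantic terms and mean squared error for the feature term, which is only one possible choice for the difference functions named above.

```python
import torch
import torch.nn.functional as F

def update_operation(first_net, second_net, optimizer, point_cloud, image, semantic_label):
    """One update of the current first and second neural networks using the
    three difference terms from the stopping condition."""
    pc_feature, first_semantics = first_net(point_cloud)   # deconvolution layer / output layer
    vis_feature, second_semantics = second_net(image)

    loss_first = F.cross_entropy(first_semantics, semantic_label)    # first semantics vs. label
    loss_second = F.cross_entropy(second_semantics, semantic_label)  # second semantics vs. label
    loss_feature = F.mse_loss(pc_feature, vis_feature)               # point cloud vs. visual features

    total = loss_first + loss_second + loss_feature
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return loss_first.item(), loss_second.item(), loss_feature.item()
```

The optimizer is assumed to hold the parameters of both networks, so a single backward pass updates them jointly; iterating this step until all three terms converge matches the stopping condition described above.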
Because this implementation trains the first target neural network and the second target neural network jointly and simultaneously, the visual image features extracted by the second target neural network carry richer information and stay consistent, in both dimension and content, with the point cloud image features extracted by the first target neural network. When the target point cloud image features matching the visual image features are searched for in the feature library, this improves the search efficiency and the search accuracy.
In another implementation, the sample sequence may be used to train to obtain a first target neural network, and then a second target neural network may be trained based on the sample sequence and the first target neural network. Or, the sample sequence is firstly used for training to obtain a second target neural network, and then the first target neural network is obtained through training based on the sample sequence and the second target neural network.
It is understood that the first target neural network and the second target neural network may be obtained by other training methods, and the present application is not limited to the specific training methods for the first target neural network and the second target neural network.
As shown in FIG. 2, FIG. 2 is a flow chart of another positioning method according to an exemplary embodiment, which describes the process of obtaining the target visual image and may be applied in an unmanned device. The method comprises the following steps:
in step 201, a currently acquired visual image is acquired.
In this embodiment, an image acquisition device is installed on the unmanned device, and the image acquisition device can be used to acquire visual images of the surrounding environment of the unmanned device in real time and perform positioning using the acquired visual images. During positioning, the visual image currently acquired by the image acquisition device is obtained.
In step 202, the currently captured visual image is converted into a target visual image satisfying a preset shooting condition.
In the present embodiment, the inventors found that different shooting conditions (e.g., lighting, or the image shooting angle) have a large influence on the visual image features extracted from a visual image. Under different shooting conditions, the visual image features corresponding to visual images captured at the same positioning point lack uniformity, which affects the final positioning result. Therefore, the currently acquired visual image can be converted into a target visual image that satisfies a preset shooting condition, so that under different shooting conditions the visual image features corresponding to the target visual images at the same positioning point differ only slightly. The preset shooting condition may include preset illumination, a preset shooting angle, or both.
Specifically, the currently acquired visual image may be input into a pre-trained generator to obtain a target visual image, output by the generator, that meets the preset shooting condition. The generator can be obtained by training a generative adversarial network.
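As a sketch only, applying the pre-trained generator might look like the following; the (3, H, W) tensor layout and value range are assumptions, and `generator` is whatever image-to-image model was obtained by training the generative adversarial network.

```python
import torch

@torch.no_grad()
def to_target_visual_image(generator, captured_image):
    """Convert the currently captured visual image into a target visual image
    that satisfies the preset shooting condition (e.g. preset illumination
    and/or shooting angle). `captured_image` is assumed to be a float tensor
    of shape (3, H, W) scaled to [0, 1]."""
    generator.eval()
    batch = captured_image.unsqueeze(0)   # add a batch dimension
    converted = generator(batch)          # image-to-image translation
    return converted.squeeze(0).clamp(0.0, 1.0)
```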
In step 203, visual image features are extracted for the current target visual image.
In step 204, the target point cloud image features matching the visual image features are searched from a pre-constructed feature library.
In step 205, target positioning data associated with the target point cloud image features is determined.
In step 206, current positioning information is determined based on the target positioning data.
It should be noted that, for the same steps as in the embodiment of fig. 1, details are not repeated in the embodiment of fig. 2, and related contents may refer to the embodiment of fig. 1.
According to the positioning method provided by this embodiment of the application, the currently acquired visual image is obtained and converted into a target visual image satisfying the preset shooting condition; visual image features are extracted for the current target visual image; the target point cloud image features matching the visual image features are searched for in the pre-constructed feature library; the target positioning data associated with the target point cloud image features is determined; and the current positioning information is determined based on the target positioning data. Because this embodiment converts the currently acquired visual image into a target visual image satisfying the preset shooting condition, the visual images are unified, the influence of shooting conditions on positioning is avoided, and the positioning accuracy is improved.
It should be noted that although in the above embodiments, the operations of the methods of the present application were described in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Corresponding to the embodiment of the positioning method, the application also provides an embodiment of the positioning device.
As shown in fig. 3, fig. 3 is a block diagram of a positioning apparatus according to an exemplary embodiment of the present application, and the apparatus may include: an extraction module 301, a searching module 302, a determining module 303 and a positioning module 304.
The extraction module 301 is configured to extract visual image features for the current target visual image.
The searching module 302 is configured to search a pre-constructed feature library for the target point cloud image features matching the visual image features. The feature library comprises point cloud image features extracted from multiple frames of laser point cloud data.
The determining module 303 is configured to determine target positioning data associated with the target point cloud image features.
The positioning module 304 is configured to determine current positioning information based on the target positioning data.
In some optional embodiments, the point cloud image features are extracted by using a pre-trained first target neural network, and the visual image features are extracted by using a pre-trained second target neural network.
The first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to the positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to the positioning point is within a preset error range.
In other alternative embodiments, the first target neural network and the second target neural network are both deep convolutional neural networks.
The point cloud image features may be extracted using a first target neural network by: inputting a frame of laser point cloud data into a first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network.
The visual image feature may be extracted using the second target neural network by: and inputting a frame of visual image into a second target neural network, and outputting visual image characteristics corresponding to the frame of visual image by a deconvolution layer of the second target neural network.
In other alternative embodiments, the first target neural network and the second target neural network are trained by: determining a sample sequence, wherein the sample sequence comprises, for each acquisition time in a plurality of acquisition times, a frame of sample laser point cloud data, a frame of sample visual image and a set of semantic labels; and, based on the sample sequence, iteratively executing the update operation on the first neural network and the second neural network until a stopping condition is met, and taking the iteratively updated first neural network and second neural network as the first target neural network and the second target neural network, respectively.
The update operation includes: inputting the sample laser point cloud data corresponding to a randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics, where the sample point cloud image features and the first semantics are respectively output by a deconvolution layer and an output layer of the current first neural network; inputting the sample visual image corresponding to the same acquisition time into the current second neural network to obtain sample visual image features and second semantics, where the sample visual image features and the second semantics are respectively output by a deconvolution layer and an output layer of the current second neural network; determining the semantic label corresponding to the acquisition time as the current semantic label; and updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label.
The stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
As shown in fig. 4, fig. 4 is a block diagram of another positioning apparatus according to an exemplary embodiment of the present application, and the apparatus according to the embodiment shown in fig. 3 may further include: an acquisition module 305 and a conversion module 306.
The acquiring module 305 is configured to acquire a currently acquired visual image.
And a converting module 306, configured to convert the currently acquired visual image into a target visual image meeting a preset shooting condition.
In other alternative embodiments, the preset photographing condition may include a preset illumination and/or a preset photographing angle.
As shown in fig. 5, fig. 5 is a block diagram of another positioning apparatus according to an exemplary embodiment of the present application, where on the basis of the foregoing embodiment shown in fig. 4, the converting module 306 may include: an input sub-module 501.
The input sub-module 501 is configured to input the currently acquired visual image into a pre-trained generator to obtain a target visual image meeting the preset shooting condition, wherein the generator is obtained by training a generative adversarial network.
It should be understood that the above-described apparatus may be preset in the unmanned device, or may be loaded into the unmanned device by downloading or the like. The corresponding modules in the above-described apparatus may cooperate with modules in the unmanned device to implement the positioning solution.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program can be used to execute the positioning method provided in any one of the embodiments of fig. 1 to fig. 2.
Corresponding to the positioning method, an embodiment of the present application also provides a schematic structural diagram of an unmanned device according to an exemplary embodiment of the present application, shown in fig. 6. Referring to fig. 6, at the hardware level, the unmanned device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to form the positioning apparatus at the logical level. Of course, besides the software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (9)

1. A method of positioning, the method comprising:
extracting visual image features for the current target visual image, wherein the visual image features are extracted by utilizing a pre-trained second target neural network;
searching a target point cloud image feature matching the visual image features from a pre-constructed feature library; the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data, the point cloud image features corresponding to each frame of laser point cloud data and the positioning data corresponding to that frame of laser point cloud data are stored in the feature library in an associated manner, the point cloud image features are extracted by utilizing a pre-trained first target neural network, and the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to the positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to the positioning point is within a preset error range, and the second target neural network is a deep convolutional neural network with a function of processing a two-dimensional image;
determining target positioning data associated with the target point cloud image features;
and determining current positioning information based on the target positioning data.
2. The method of claim 1, wherein the first target neural network and the second target neural network are both deep convolutional neural networks;
extracting point cloud image features with the first target neural network by: inputting a frame of laser point cloud data into the first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network;
extracting visual image features using the second target neural network by: inputting a frame of visual image into the second target neural network, and outputting the visual image features corresponding to the frame of visual image by the deconvolution layer of the second target neural network.
3. The method of claim 2, wherein the first target neural network and the second target neural network are trained by:
determining a sample sequence, wherein the sample sequence comprises a frame of sample laser point cloud data, a frame of sample visual image and a group of semantic labels corresponding to each acquisition time in a plurality of acquisition times;
iteratively executing the updating operation of the first neural network and the second neural network based on the sample sequence until a stopping condition is met, and respectively taking the first neural network and the second neural network after iterative updating as the first target neural network and the second target neural network; the update operation includes:
inputting sample laser point cloud data corresponding to any randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics; the sample point cloud image features and the first semantics are respectively output by a deconvolution layer and an output layer of the current first neural network;
inputting the sample visual image corresponding to the acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics; the sample visual image features and the second semantics are respectively output by a deconvolution layer and an output layer of the current second neural network;
determining a semantic label corresponding to the acquisition time in the sample sequence as a current semantic label;
updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label;
wherein the stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
4. The method according to any of claims 1-3, further comprising, prior to said extracting visual image features for the current target visual image:
acquiring a currently acquired visual image;
and converting the currently acquired visual image into a target visual image meeting preset shooting conditions.
5. The method according to claim 4, wherein the preset shooting condition comprises a preset illumination and/or a preset shooting angle.
6. The method according to claim 4, wherein the converting the currently acquired visual image into the target visual image satisfying a preset shooting condition comprises:
inputting the currently acquired visual image into a generator trained in advance to obtain a target visual image meeting preset shooting conditions; wherein the generator is obtained by training a generative adversarial network.
7. A positioning device, the device comprising:
the extraction module is used for extracting visual image features aiming at the current target visual image, and the visual image features are extracted by utilizing a pre-trained second target neural network;
the searching module is used for searching the target point cloud image characteristics matched with the visual image characteristics from a pre-constructed characteristic library; the feature library comprises point cloud image features extracted aiming at multiple frames of laser point cloud data, the point cloud image features corresponding to each frame of laser point cloud data and positioning data corresponding to the frame of laser point cloud data are stored in the feature library in an associated mode, the point cloud image features are extracted by utilizing a pre-trained first target neural network, and the first target neural network and the second target neural network meet the following conditions: for any positioning point, the difference between the point cloud image features extracted by the first target neural network aiming at the laser point cloud data corresponding to the positioning point and the visual image features extracted by the second target neural network aiming at the visual image corresponding to the positioning point is within a preset error range, and the second target neural network is a deep convolution neural network with a function of processing a two-dimensional image;
the determining module is used for determining target positioning data associated with the target point cloud image features;
and the positioning module is used for determining current positioning information based on the target positioning data.
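Functionally, the device in claim 7 reduces to a nearest-neighbour lookup: the feature library stores one point cloud image feature per frame together with its associated positioning data, and the visual image feature extracted from the current image is the query. The sketch below models that lookup with a plain NumPy array and Euclidean distance; the feature dimension, distance metric, pose representation and class/function names are assumptions for illustration only.

```python
# Hypothetical sketch of the searching/determining/positioning modules in claim 7.
import numpy as np

class FeatureLibrary:
    def __init__(self, point_cloud_features, positioning_data):
        # point_cloud_features: (N, D) array, one feature vector per point cloud frame
        # positioning_data:     list of N poses, e.g. (x, y, yaw), aligned by index
        self.features = np.asarray(point_cloud_features, dtype=np.float32)
        self.positioning_data = positioning_data

    def locate(self, visual_image_feature):
        """Search the library for the target point cloud image feature that best
        matches the visual image feature and return its associated positioning data."""
        query = np.asarray(visual_image_feature, dtype=np.float32)
        distances = np.linalg.norm(self.features - query, axis=1)
        best = int(np.argmin(distances))                 # target point cloud feature
        return self.positioning_data[best], float(distances[best])

# Usage with dummy data: 1000 library frames with 128-dimensional features.
library = FeatureLibrary(np.random.randn(1000, 128),
                         [(float(i), 0.0, 0.0) for i in range(1000)])
pose, dist = library.locate(np.random.randn(128))
print(pose, dist)
```

In practice the matched distance would also be checked against the preset error range before the associated positioning data is accepted as the current positioning information.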
8. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
9. An unmanned device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1-6.
CN201910629969.8A 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment Active CN110415297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910629969.8A CN110415297B (en) 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment

Publications (2)

Publication Number Publication Date
CN110415297A CN110415297A (en) 2019-11-05
CN110415297B (en) 2021-11-05

Family

ID=68361268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910629969.8A Active CN110415297B (en) 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment

Country Status (1)

Country Link
CN (1) CN110415297B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111765892B (en) * 2020-05-12 2022-04-29 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
CN111649724B (en) * 2020-06-04 2022-09-06 百度在线网络技术(北京)有限公司 Visual positioning method and device based on mobile edge calculation
CN111856441B (en) * 2020-06-09 2023-04-25 北京航空航天大学 Train positioning method based on vision and millimeter wave radar fusion
CN111722245B (en) 2020-06-22 2023-03-10 阿波罗智能技术(北京)有限公司 Positioning method, positioning device and electronic equipment
CN116664812A (en) * 2022-11-30 2023-08-29 荣耀终端有限公司 Visual positioning method, visual positioning system and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574386A (en) * 2014-12-26 2015-04-29 速感科技(北京)有限公司 Indoor positioning method based on three-dimensional environment model matching
CN108416808A (en) * 2018-02-24 2018-08-17 斑马网络技术有限公司 The method and device of vehicle reorientation
CN108406731A (en) * 2018-06-06 2018-08-17 珠海市微半导体有限公司 A kind of positioning device, method and robot based on deep vision
CN109241988A (en) * 2018-07-16 2019-01-18 北京市商汤科技开发有限公司 Feature extracting method and device, electronic equipment, storage medium, program product
CN109658457A (en) * 2018-11-02 2019-04-19 浙江大学 A kind of scaling method of laser and any relative pose relationship of camera
US10275691B2 (en) * 2017-08-22 2019-04-30 Northrop Grumman Systems Corporation Adaptive real-time detection and examination network (ARDEN)
CN109815893A (en) * 2019-01-23 2019-05-28 中山大学 The normalized method in colorized face images illumination domain of confrontation network is generated based on circulation

Also Published As

Publication number Publication date
CN110415297A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110415297B (en) Positioning method and device and unmanned equipment
Akyon et al. Slicing aided hyper inference and fine-tuning for small object detection
CN109753928B (en) Method and device for identifying illegal buildings
CN111222395B (en) Target detection method and device and electronic equipment
Yang et al. Concrete defects inspection and 3D mapping using CityFlyer quadrotor robot
CN110650292B (en) Method and device for assisting user in shooting vehicle video
US20220051058A1 (en) Unmanned driving behavior decision-making and model training
CN110428490B (en) Method and device for constructing model
CN112487899B (en) Target identification method and system based on unmanned aerial vehicle, storage medium and electronic equipment
CN109190504B (en) Automobile image data processing method and device and readable storage medium
CN109377494B (en) Semantic segmentation method and device for image
CN110414399B (en) Signal lamp detection method and device and intelligent driving equipment
CN112287896A (en) Unmanned aerial vehicle aerial image target detection method and system based on deep learning
CN113284144B (en) Tunnel detection method and device based on unmanned aerial vehicle
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
CN113515655A (en) Fault identification method and device based on image classification
CN116469079A (en) Automatic driving BEV task learning method and related device
CN112465856A (en) Unmanned aerial vehicle-based ship track correction method and device and electronic equipment
CN115578326A (en) Road disease identification method, system, equipment and storage medium
CN112668596B (en) Three-dimensional object recognition method and device, recognition model training method and device
Li et al. Driver drowsiness behavior detection and analysis using vision-based multimodal features for driving safety
CN114862796A (en) A unmanned aerial vehicle for fan blade damage detects
CN115494863A (en) Unmanned aerial vehicle cruise control method and device
CN112116534A (en) Ghost eliminating method based on position information
CN110674342B (en) Method and device for inquiring target image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant