CN111798496B - Visual locking method and device - Google Patents

Visual locking method and device

Info

Publication number
CN111798496B
CN111798496B
Authority
CN
China
Prior art keywords
tracking target
camera
module
deep learning
right camera
Prior art date
Legal status
Active
Application number
CN202010542145.XA
Other languages
Chinese (zh)
Other versions
CN111798496A (en)
Inventor
熊明磊
陈龙冬
李鑫海
Current Assignee
Boya Gongdao Beijing Robot Technology Co Ltd
Original Assignee
Boya Gongdao Beijing Robot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Boya Gongdao Beijing Robot Technology Co Ltd filed Critical Boya Gongdao Beijing Robot Technology Co Ltd
Priority to CN202010542145.XA priority Critical patent/CN111798496B/en
Publication of CN111798496A publication Critical patent/CN111798496A/en
Application granted granted Critical
Publication of CN111798496B publication Critical patent/CN111798496B/en

Classifications

    • G06T7/292 Multi-camera tracking (G06T7/00 Image analysis; G06T7/20 Analysis of motion)
    • G06N3/045 Combinations of networks (G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods
    • G06T7/579 Depth or shape recovery from multiple images from motion (G06T7/50 Depth or shape recovery; G06T7/55 Depth or shape recovery from multiple images)
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10016 Video; Image sequence (G06T2207/10 Image acquisition modality)
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20081 Training; Learning (G06T2207/20 Special algorithmic details)
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a visual locking method and device, relates to the technical field of underwater robots, and solves the technical problem that tracking methods in the prior art operate only on two-dimensional images and therefore track poorly. The visual locking method of the invention keeps an unmanned underwater vehicle in six-axis synchronization with a tracking target by acquiring three-dimensional position data of the tracking target and controlling the motion of the unmanned underwater vehicle in real time based on the acquired three-dimensional position data. Under disturbances such as water-current fluctuation, the method can automatically feed back and adjust the position of the unmanned submersible in real time so that it stays synchronized with the tracked target, which makes it convenient for the unmanned submersible to photograph or grasp the tracked target. In other words, the visual locking method of the present invention improves the tracking performance of the unmanned underwater vehicle.

Description

Visual locking method and device
Technical Field
The invention relates to the technical field of underwater robots, in particular to a visual locking method and device.
Background
The ocean covers 71 percent of the Earth's surface and has a volume of about 1.4 billion cubic kilometers; the seabed and the water column contain extremely rich biological and mineral resources, and seabed exploration, much like space exploration, is both highly attractive and highly challenging. The unmanned submersible and its supporting facilities are the product of many modern high technologies and their system integration, and are of special significance to China's marine economy, marine industry, marine development and marine high technology.
Existing unmanned submersibles generally estimate the position of a moving object using correlation filtering (such as KCF) or feature-matching optical flow. These methods operate only on the two-dimensional image, provide only two degrees of freedom, and track extremely poorly under occlusion, motion blur, and the "fog" produced by underwater light absorption and scattering. Therefore, in order to keep the vehicle synchronized with the target, providing a multi-dimensional and precise visual locking method and apparatus is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
One of the purposes of the present invention is to provide a visual locking method and apparatus that solve the technical problem that tracking methods in the prior art operate only on two-dimensional images and track poorly. The various technical effects that the preferred technical solutions of the present invention can produce are described in detail below.
In order to achieve this purpose, the invention provides the following technical solution:
According to the visual locking method of the invention, the unmanned submersible and the tracking target keep six-axis synchronization by acquiring the three-dimensional position data of the tracking target and controlling the motion of the unmanned submersible in real time based on the acquired three-dimensional position data.
According to a preferred embodiment, the visual locking method comprises the following steps:
S1: initializing the equipment to complete the correction of the binocular camera;
S2: acquiring the video stream of the binocular camera, and pushing the acquired video stream to the upper computer and the deep learning module;
S3: selecting a tracking target, and pushing the tracking target image to the deep learning module;
S4: the deep learning module acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target image;
S5: the control module judges whether the tracking target is in the left camera and/or the right camera based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization.
According to a preferred embodiment, in step S1, the correction of the binocular camera is accomplished as follows: after the equipment is initialized, the intrinsic parameters, extrinsic parameters and distortion parameters of the binocular camera are obtained; distortion correction of the binocular camera is completed based on the obtained intrinsic parameters and distortion parameters; and the positional relationship between the two cameras is determined based on the acquired extrinsic parameters.
According to a preferred embodiment, in step S3, the tracking target is selected as follows: a tracking target is manually selected through the upper computer, and/or the tracking target is automatically detected through a target detection system.
According to a preferred embodiment, in step S4, the three-dimensional position data includes an X value, a Y value, and a depth value of the tracking target.
According to a preferred embodiment, the depth value is depth information of the tracking target with respect to the left camera and/or the right camera.
According to a preferred embodiment, in step S4, the deep learning module acquires three-dimensional position data of the tracking target relative to the left camera and/or the right camera by:
S41: extracting backbone features from the tracking target image and the left and right camera views using a residual network;
S42: cascading the feature maps extracted at different depth positions from the tracking target image and the left and/or right camera view, feeding them into a multi-stage twin-network region proposal network, and regressing the predicted position and category of the tracking target in the left and/or right camera view; wherein the predicted position of the tracking target is the X value and the Y value of the tracking target;
S43: calculating a matching cost with a twin network based on the feature maps of the left and right camera views at different depths, restoring the feature maps to the original image size with a convolutional neural network, then optimizing the disparity map in multiple stages with several autoencoders connected in series, and obtaining the disparity diff of the left camera and/or the right camera based on the optimized disparity map;
S44: calculating the depth D of the tracking target relative to the left camera and/or the right camera from the binocular camera baseline B, the focal length F of the left camera and/or the right camera, and the disparity diff, where D = F × B / diff.
According to a preferred embodiment, in step S5, the control module determines whether the tracking target is in the left camera and/or the right camera by: comparing the detected X value and Y value of the tracking target with the visual field range of the left camera and/or the right camera, wherein the visual field range of the left camera and/or the right camera is (width/2 - width/10, width/2 + width/10), (height/2 - height/10, height/2 + height/10), and width and height are respectively the width and height of the image resolution.
According to a preferred embodiment, the control module controls the mode of motion of the unmanned vehicle by: the control module calculates a yaw angle, a pitch angle, and/or a velocity of the unmanned submersible based on a distance between the tracked target and the unmanned submersible, and controls a mode of motion of the unmanned submersible based on the obtained yaw angle, pitch angle, and/or velocity.
The visual locking device of the invention keeps the unmanned submersible and the tracking target in six-axis synchronization by using the visual locking method of any one of the above technical solutions of the invention, and comprises an initialization module, a data acquisition module, a tracking target selection module, a deep learning module and a control module, wherein,
the initialization module is used for initializing equipment and finishing the correction of the binocular camera;
the data acquisition module and the tracking target selection module are connected with the deep learning module, wherein the data acquisition module is used for acquiring a video stream of a binocular camera and pushing the acquired video stream to the deep learning module, the tracking target selection module is used for selecting a tracking target and pushing a tracking target image to the deep learning module, and the deep learning module acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target image;
the deep learning module is connected with the control module, the control module is connected with the unmanned submersible, the deep learning module transmits acquired three-dimensional position data of the tracking target in the left camera and/or the right camera to the control module in real time, the control module judges whether the tracking target is in the left camera and/or the right camera or not based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization.
The visual locking method and the visual locking device provided by the invention at least have the following beneficial technical effects:
according to the vision locking method, the three-dimensional position data of the tracked target is acquired, the running mode of the unmanned submersible is controlled in real time based on the acquired three-dimensional position data, the unmanned submersible and the tracked target keep six-axis synchronization, the position of the unmanned submersible can be automatically fed back and adjusted in real time under the conditions of water flow interference fluctuation and the like, the unmanned submersible and the tracked target keep synchronous, and the unmanned submersible can conveniently carry out shooting or grabbing and other operations on the tracked target.
The vision locking method of the invention ensures that the unmanned submersible and the tracked target keep six-axis synchronization by acquiring the three-dimensional position data of the tracked target and controlling the running mode of the unmanned submersible in real time based on the acquired three-dimensional position data, can improve the tracking effect of the unmanned submersible, and solves the technical problem that the tracking method in the prior art only stays on a two-dimensional image and has poor tracking effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of the steps of a preferred embodiment of the visual locking method of the present invention;
FIG. 2 is a schematic diagram illustrating the steps of the deep learning module for obtaining three-dimensional position data of a tracked target according to an embodiment of the present invention;
fig. 3 is a schematic view of a preferred embodiment of the visual locking apparatus of the present invention.
In the figures: 1. initialization module; 2. data acquisition module; 3. tracking target selection module; 4. deep learning module; 5. control module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
The visual locking method and apparatus of the present embodiment will be described in detail with reference to fig. 1 to 3.
According to the visual locking method, the unmanned underwater vehicle and the tracking target keep six-axis synchronization by acquiring the three-dimensional position data of the tracking target and controlling the operation mode of the unmanned underwater vehicle in real time based on the acquired three-dimensional position data. Preferably, the six axes in this embodiment are the positive X-axis direction, the negative X-axis direction, the positive Y-axis direction, the negative Y-axis direction, the positive Z-axis direction, and the negative Z-axis direction. Specifically, the establishment of the XYZ coordinate system is the same as that in the prior art, and is not described herein again.
According to the visual locking method, three-dimensional position data of the tracked target is acquired and the motion of the unmanned submersible is controlled in real time based on the acquired data, so that the unmanned submersible and the tracked target keep six-axis synchronization. Under disturbances such as water-current fluctuation, the position of the unmanned submersible can be automatically fed back and adjusted in real time so that it stays synchronized with the tracked target, which makes it convenient for the unmanned submersible to photograph or grasp the target. In other words, the visual locking method of this embodiment improves the tracking performance of the unmanned underwater vehicle and solves the technical problem that tracking methods in the prior art operate only on two-dimensional images and track poorly.
As shown in fig. 1, the visual locking method of the preferred embodiment of the present invention includes the following steps:
s1: and initializing equipment to finish the correction of the binocular camera.
S2: and acquiring the video stream of the binocular camera, and pushing the acquired video stream to the upper computer and the deep learning module 4. Preferably, one path of the obtained video stream of the binocular camera is pushed to the upper computer for real-time display, and the other path of the obtained video stream is pushed to the deep learning module 4 for data analysis.
S3: and selecting a tracking target and pushing a tracking target map to the deep learning module 4.
S4: the deep learning module 4 acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target map.
S5: and the three-dimensional position data of the tracking target in the left camera and/or the right camera, which is acquired by the deep learning module 4, is transmitted to the control module 5 in real time, the control module 5 judges whether the tracking target is in the left camera and/or the right camera or not based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization.
According to a preferred embodiment, in step S1, the correction of the binocular camera is accomplished as follows: after the equipment is initialized, the intrinsic parameters, extrinsic parameters and distortion parameters of the binocular camera are obtained; distortion correction of the binocular camera is completed based on the obtained intrinsic parameters and distortion parameters; and the positional relationship between the two cameras is determined based on the acquired extrinsic parameters. Preferably, the methods for obtaining the intrinsic, extrinsic and distortion parameters of the binocular camera, for correcting its distortion, and for determining the positional relationship between the two cameras are the same as those in the prior art, and are not described here again.
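The following is a minimal, illustrative sketch (not the patented implementation) of how the step-S1 correction could be carried out with OpenCV, assuming the intrinsic matrices, distortion coefficients and the rotation/translation between the two cameras have already been obtained by a prior calibration; the calibration file name, variable names and image size are assumptions made for this example.

```python
# Illustrative sketch only: distortion correction and rectification of the binocular
# camera from previously obtained intrinsic, extrinsic and distortion parameters.
import numpy as np
import cv2

calib = np.load("stereo_calibration.npz")     # hypothetical result of a prior calibration
K1, d1 = calib["K1"], calib["d1"]             # left camera intrinsics and distortion
K2, d2 = calib["K2"], calib["d2"]             # right camera intrinsics and distortion
R, T = calib["R"], calib["T"]                 # pose of the right camera relative to the left

image_size = (1280, 720)                      # (width, height), assumed

# Positional relationship between the two cameras -> rectification transforms.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)

# Undistortion / rectification look-up maps for each camera.
map1 = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2 = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)

def rectify_pair(left, right):
    """Return a distortion-corrected, row-aligned stereo pair."""
    left_r = cv2.remap(left, map1[0], map1[1], cv2.INTER_LINEAR)
    right_r = cv2.remap(right, map2[0], map2[1], cv2.INTER_LINEAR)
    return left_r, right_r
```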
According to a preferred embodiment, in step S3, the tracking target is selected as follows: a tracking target is manually selected through the upper computer, and/or the tracking target is automatically detected through a target detection system. That is, the preferred technical solution of this embodiment may select the tracking target manually and/or automatically.
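As an illustration of the manual branch of step S3, the sketch below uses OpenCV's selectROI so an operator can drag a box around the target on a left-camera frame; the image file names are placeholders, and automatic detection by a target detection system is not shown.

```python
# Illustrative sketch of manual tracking-target selection (step S3). The operator drags
# a box around the target; the crop becomes the "tracking target image" pushed to the
# deep learning module. File names are placeholders, not part of the patent.
import cv2

left = cv2.imread("left_frame.png")                    # a frame shown on the upper computer
x, y, w, h = cv2.selectROI("select tracking target", left, showCrosshair=True)
cv2.destroyWindow("select tracking target")

target_image = left[y:y + h, x:x + w]                  # tracking target image
cv2.imwrite("tracking_target.png", target_image)       # hand off to the deep learning module
```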
According to a preferred embodiment, in step S4, the three-dimensional position data includes an X value, a Y value, and a depth value of the tracking target. Preferably, the depth value is depth information of the tracking target with respect to the left camera and/or the right camera. More preferably, the depth value is depth information of the tracking target with respect to the left camera.
According to a preferred embodiment, in step S4, the deep learning module 4 acquires three-dimensional position data of the tracking target relative to the left camera and/or the right camera by:
S41: extracting backbone features from the tracking target image and the left and right camera views using a residual network.
S42: cascading the feature maps extracted at different depth positions from the tracking target image and the left and/or right camera view, feeding them into a multi-stage twin-network region proposal network, and regressing the predicted position and category of the tracking target in the left and/or right camera view. The predicted position of the tracking target is the X value and the Y value of the tracking target.
S43: calculating a matching cost with a twin network based on the feature maps of the left and right camera views at different depths, restoring the feature maps to the original image size with a convolutional neural network, then optimizing the disparity map in multiple stages with several autoencoders connected in series, and obtaining the disparity diff of the left camera and/or the right camera based on the optimized disparity map.
S44: calculating the depth D of the tracking target relative to the left camera and/or the right camera from the binocular camera baseline B, the focal length F of the left camera and/or the right camera, and the disparity diff, where D = F × B / diff.
Fig. 2 is a schematic diagram illustrating a step of acquiring three-dimensional position data of a tracking target by the deep learning module according to a preferred embodiment of the present invention. Preferably, the deep learning module 4 calculates three-dimensional position data of each frame based on the acquired binocular video stream and the tracking target map.
As shown in fig. 2, the method first extracts backbone features from the tracking target image and the left and right camera views using a residual network (ResNet-50). The feature maps extracted at different depth positions from the tracking target image and the left and/or right camera view are then cascaded and fed into a multi-stage twin-network region proposal network, which regresses the predicted position (bbox) and predicted class (cls) of the tracked target in the left camera view. The predicted position of the tracking target is its X and Y values. For the feature maps of the left and right camera views at different depths, a matching cost is computed with a twin (Siamese) network, the feature maps are restored to the original image size with a convolutional neural network, the disparity map is then optimized in multiple stages with several autoencoders connected in series, and the disparity diff of the left camera and/or the right camera is obtained from the optimized disparity map. Finally, the depth D of the tracking target relative to the left camera and/or the right camera is calculated from the binocular camera baseline B, the focal length F of the left camera and/or the right camera, and the disparity diff, where D = F × B / diff.
Preferably, the depth D in this embodiment may be computed with respect to the left camera or with respect to the right camera; the calculation is the same in both cases.
Preferably, the residual network (ResNet-50), the twin (Siamese) network and the convolutional neural network are all prior-art methods, and their specific procedures are not described again.
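To make the step-S44 relation concrete, the following small sketch computes the depth from a disparity value; the baseline, focal length and disparity numbers are example values chosen for illustration, not values taken from the patent.

```python
# Numerical sketch of step S44: depth from disparity, D = F * B / diff.
# The baseline, focal length and disparity below are example values only.
def depth_from_disparity(baseline_m: float, focal_px: float, diff_px: float) -> float:
    """Depth D of the tracked target relative to the camera, D = F * B / diff."""
    if diff_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / diff_px

# Example: 6 cm baseline, 800 px focal length, 20 px disparity -> 2.4 m depth.
D = depth_from_disparity(baseline_m=0.06, focal_px=800.0, diff_px=20.0)
print(f"target depth = {D:.2f} m")
```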
According to a preferred embodiment, in step S5, the control module 5 determines whether the tracking target is in the left camera and/or the right camera by: comparing the detected X value and Y value of the tracking target with the visual field range of the left camera and/or the right camera, wherein the visual field range of the left camera and/or the right camera is (width/2 - width/10, width/2 + width/10), (height/2 - height/10, height/2 + height/10), and width and height are respectively the width and height of the image resolution. Specifically, the deep learning module 4 transmits the calculated three-dimensional position data (X, Y, depth) to the control module 5 in real time, and the control module 5 compares the received position data with the visual field range of the left camera and/or the right camera to determine whether the tracking target is in the left camera and/or the right camera.
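A minimal sketch of this in-view test is given below; the (x, y) positions and the 1280 × 720 resolution are illustrative, and only the central-window comparison described above is implemented.

```python
# Sketch of the step-S5 in-view test: the target is "in" the camera when its pixel
# position lies inside the central window of the view. width and height are the
# image resolution; the example values are illustrative only.
def target_in_view(x: float, y: float, width: int, height: int) -> bool:
    """True if (x, y) lies inside (width/2 ± width/10, height/2 ± height/10)."""
    x_ok = (width / 2 - width / 10) < x < (width / 2 + width / 10)
    y_ok = (height / 2 - height / 10) < y < (height / 2 + height / 10)
    return x_ok and y_ok

# With a 1280 x 720 image the window is x in (512, 768) and y in (288, 432).
print(target_in_view(600, 400, 1280, 720))   # True: target near the view centre
print(target_in_view(100, 400, 1280, 720))   # False: target far to the left
```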
According to a preferred embodiment, the control module 5 controls the motion of the unmanned submersible as follows: the control module 5 calculates a yaw angle, a pitch angle, and/or a speed of the unmanned submersible based on the distance between the tracking target and the unmanned submersible, and controls the motion of the unmanned submersible based on the obtained yaw angle, pitch angle, and/or speed. More specifically, the control module 5 calculates the yaw angle, pitch angle and/or speed of the unmanned submersible based on the judgment result and the distance between the tracking target and the unmanned submersible, and controls the motion of the unmanned submersible accordingly; after the unmanned submersible receives the regulation information from the control module 5, the tracking target is brought back to the center of the visual field of the left camera and/or the right camera, so that the unmanned submersible and the tracking target keep six-axis synchronization.
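The patent does not give explicit formulas for the yaw angle, pitch angle and speed; the sketch below shows one plausible way to derive them from the target's pixel offset and depth, with the atan2-based angles, the proportional speed gain and the hold-off distance all being assumptions for illustration only.

```python
# Hypothetical sketch of the control-module computation: derive yaw, pitch and speed
# commands from the target's pixel position (x, y) and depth. The atan2-based angles,
# the proportional gain k_speed and the hold-off distance are assumptions, not the
# patent's actual control law.
import math

def compute_commands(x, y, depth_m, width, height, focal_px,
                     k_speed=0.5, hold_off_m=1.0):
    """Return (yaw_deg, pitch_deg, speed) that steer the target toward the view centre."""
    dx = x - width / 2                                   # horizontal pixel offset
    dy = y - height / 2                                  # vertical pixel offset
    yaw_deg = math.degrees(math.atan2(dx, focal_px))     # + : target is to the right
    pitch_deg = math.degrees(math.atan2(dy, focal_px))   # + : target is below centre
    speed = k_speed * (depth_m - hold_off_m)             # close in to a hold-off distance
    return yaw_deg, pitch_deg, speed

yaw, pitch, speed = compute_commands(x=700, y=300, depth_m=2.4,
                                     width=1280, height=720, focal_px=800.0)
print(f"yaw={yaw:.1f} deg, pitch={pitch:.1f} deg, speed={speed:.2f} m/s")
```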
The vision locking device of the embodiment utilizes the vision locking method of any technical scheme of the embodiment to enable the unmanned submersible vehicle and the tracking target to keep six-axis synchronization.
Preferably, the visual locking device comprises an initialization module 1, a data acquisition module 2, a tracking target selection module 3, a deep learning module 4 and a control module 5, as shown in fig. 3. The initialization module 1 is used for initializing equipment and finishing correction of the binocular camera. The data acquisition module 2 and the tracking target selection module 3 are connected with the deep learning module 4, wherein the data acquisition module 2 is used for acquiring a video stream of a binocular camera and pushing the acquired video stream to the deep learning module 4, the tracking target selection module 3 is used for selecting a tracking target and pushing a tracking target image to the deep learning module 4, and the deep learning module 4 acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target image. The deep learning module 4 is connected with the control module 5, the control module 5 is connected with the unmanned submersible, the deep learning module 4 transmits acquired three-dimensional position data of the tracking target in the left camera and/or the right camera to the control module 5 in real time, the control module 5 judges whether the tracking target is in the left camera and/or the right camera based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization.
The visual locking device comprises an initialization module 1, a data acquisition module 2, a tracking target selection module 3, a deep learning module 4 and a control module 5. After a tracking target is selected for the remotely operated unmanned submersible, the target is tracked and six-axis synchronization is achieved; that is, under disturbances such as water-current fluctuation, the position of the unmanned submersible can be automatically fed back and adjusted in real time, so that the unmanned submersible remains synchronized with the tracking target, which makes it convenient for the unmanned submersible to photograph or grasp the tracking target.
According to the visual locking method and device, on the one hand, the distance of the tracked target can be accurately inferred, providing depth information and enabling accurate control over six axes of freedom; on the other hand, the method also offers higher-quality feature extraction and can maintain stable tracking under occlusion, multiple similar targets, motion blur and drastic changes in ambient illumination.
It is understood that the same or similar parts in the present embodiment may be mutually referred to, and the same or similar contents in other embodiments may be referred to for the contents which are not described in detail in some embodiments.
It is noted that, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
The term "connection" as used herein may refer to one or more of a data connection, a communication connection, a wired connection, a wireless connection, a connection via a physical connection, and the like, as will be appreciated by those skilled in the art.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A vision locking method for enabling an unmanned underwater vehicle to maintain six-axis synchronization with a tracked target by acquiring three-dimensional position data of the tracked target and controlling an operation mode of the unmanned underwater vehicle in real time based on the acquired three-dimensional position data, comprising the steps of:
S1: initializing the equipment to complete the correction of the binocular camera;
S2: acquiring the video stream of the binocular camera, and pushing the acquired video stream to the upper computer and the deep learning module;
S3: selecting a tracking target, and pushing the tracking target image to the deep learning module;
S4: the deep learning module acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target image;
S5: the three-dimensional position data of the tracking target in the left camera and/or the right camera acquired by the deep learning module is transmitted to the control module in real time, the control module judges whether the tracking target is in the left camera and/or the right camera based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization;
in step S4, the deep learning module acquires three-dimensional position data of a tracking target relative to the left camera and/or the right camera by:
S41: extracting backbone features from the tracking target image and the left and right camera views using a residual network;
S42: cascading the feature maps extracted at different depth positions from the tracking target image and the left and/or right camera view, feeding them into a multi-stage twin-network region proposal network, and regressing the predicted position and category of the tracking target in the left and/or right camera view; wherein the predicted position of the tracking target is the X value and the Y value of the tracking target;
S43: calculating a matching cost with a twin network based on the feature maps of the left and right camera views at different depths, restoring the feature maps to the original image size with a convolutional neural network, then optimizing the disparity map in multiple stages with several autoencoders connected in series, and obtaining the disparity diff of the left camera and/or the right camera based on the optimized disparity map;
S44: calculating the depth D of the tracking target relative to the left camera and/or the right camera from the binocular camera baseline B, the focal length F of the left camera and/or the right camera, and the disparity diff, where D = F × B / diff.
2. The visual locking method according to claim 1, wherein in step S1, the correction of the binocular camera is accomplished by: after the equipment is initialized, obtaining the intrinsic parameters, extrinsic parameters and distortion parameters of the binocular camera, and completing distortion correction of the binocular camera based on the obtained intrinsic parameters and distortion parameters; and determining the positional relationship between the two cameras based on the acquired extrinsic parameters.
3. The visual locking method according to claim 1, wherein in step S3, the tracking target is selected by: manually selecting a tracking target through the upper computer, and/or automatically detecting the tracking target through a target detection system.
4. The visual locking method of claim 1, wherein in step S4, the three-dimensional position data includes an X value, a Y value, and a depth value of a tracking target.
5. The visual locking method of claim 4, wherein the depth value is depth information of a tracking target relative to a left camera and/or a right camera.
6. The visual locking method according to claim 1, wherein in step S5, the control module determines whether the tracking target is in the left camera and/or the right camera by:
comparing the detected X value and Y value of the tracking target with the visual field range of the left camera and/or the right camera, wherein the visual field range of the left camera and/or the right camera is (width/2 - width/10, width/2 + width/10), (height/2 - height/10, height/2 + height/10), and width and height are respectively the width and height of the image resolution.
7. The visual locking method of claim 1, wherein the control module controls the manner of motion of the unmanned vehicle by:
the control module calculates a yaw angle, a pitch angle, and/or a velocity of the unmanned submersible based on a distance between the tracked target and the unmanned submersible, and controls a mode of motion of the unmanned submersible based on the obtained yaw angle, pitch angle, and/or velocity.
8. A visual locking device, characterized in that the unmanned underwater vehicle is kept in six-axis synchronization with a tracking target by the visual locking method as claimed in any one of claims 1 to 7, and
the visual locking device comprises an initialization module, a data acquisition module, a tracking target selection module, a deep learning module and a control module, wherein,
the initialization module is used for initializing equipment and finishing the correction of the binocular camera;
the data acquisition module and the tracking target selection module are connected with the deep learning module, wherein the data acquisition module is used for acquiring a video stream of a binocular camera and pushing the acquired video stream to the deep learning module, the tracking target selection module is used for selecting a tracking target and pushing a tracking target image to the deep learning module, and the deep learning module acquires three-dimensional position data of the tracking target in the left camera and/or the right camera based on the acquired binocular video stream and the tracking target image;
the deep learning module is connected with the control module, the control module is connected with the unmanned submersible, the deep learning module transmits acquired three-dimensional position data of the tracking target in the left camera and/or the right camera to the control module in real time, the control module judges whether the tracking target is in the left camera and/or the right camera or not based on the received three-dimensional position data, and controls the motion mode of the unmanned submersible based on the judgment result, so that the unmanned submersible and the tracking target keep six-axis synchronization.
CN202010542145.XA 2020-06-15 2020-06-15 Visual locking method and device Active CN111798496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010542145.XA CN111798496B (en) 2020-06-15 2020-06-15 Visual locking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010542145.XA CN111798496B (en) 2020-06-15 2020-06-15 Visual locking method and device

Publications (2)

Publication Number Publication Date
CN111798496A CN111798496A (en) 2020-10-20
CN111798496B true CN111798496B (en) 2021-11-02

Family

ID=72804310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010542145.XA Active CN111798496B (en) 2020-06-15 2020-06-15 Visual locking method and device

Country Status (1)

Country Link
CN (1) CN111798496B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114277725A (en) * 2021-12-20 2022-04-05 民航成都电子技术有限责任公司 Airfield runway target foreign matter disposal equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103745460A (en) * 2013-12-30 2014-04-23 深圳市开天源自动化工程有限公司 Method, device and system for tracking water body organisms on basis of three-dimensional image analysis
CN106989730A (en) * 2017-04-27 2017-07-28 上海大学 A kind of system and method that diving under water device control is carried out based on binocular flake panoramic vision
CN107999955A (en) * 2017-12-29 2018-05-08 华南理工大学 A kind of six-shaft industrial robot line laser automatic tracking system and an automatic tracking method
CN108536157A (en) * 2018-05-22 2018-09-14 上海迈陆海洋科技发展有限公司 A kind of Intelligent Underwater Robot and its system, object mark tracking
CN108876855A (en) * 2018-05-28 2018-11-23 哈尔滨工程大学 A kind of sea cucumber detection and binocular visual positioning method based on deep learning
CN108875683A (en) * 2018-06-30 2018-11-23 北京宙心科技有限公司 Robot vision tracking method and system
CN109062229A (en) * 2018-08-03 2018-12-21 北京理工大学 The navigator of underwater robot system based on binocular vision follows formation method
CN110488847A (en) * 2019-08-09 2019-11-22 中国科学院自动化研究所 The bionic underwater robot Hovering control mthods, systems and devices of visual servo
CN110543859A (en) * 2019-09-05 2019-12-06 大连海事大学 sea cucumber autonomous recognition and grabbing method based on deep learning and binocular positioning
CN110539062A (en) * 2019-09-29 2019-12-06 华南理工大学 deep sea pipeline plasma additive manufacturing in-situ repair equipment and method
CN111062990A (en) * 2019-12-13 2020-04-24 哈尔滨工程大学 Binocular vision positioning method for underwater robot target grabbing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090315704A1 (en) * 2008-06-19 2009-12-24 Global Biomedical Development, Llc, A Georgia Limited Liability Company Method and Integrated System for Tracking Luggage

Also Published As

Publication number Publication date
CN111798496A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN108229366B (en) Deep learning vehicle-mounted obstacle detection method based on radar and image data fusion
Beall et al. 3D reconstruction of underwater structures
CN112785702A (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
CN109604777A (en) Welding seam traking system and method based on laser structure light
CN111837144A (en) Enhanced image depth sensing using machine learning
US9940725B2 (en) Method for estimating the speed of movement of a camera
CN109143247B (en) Three-eye underwater detection method for acousto-optic imaging
CN109191504A (en) A kind of unmanned plane target tracking
CN100554877C (en) A kind of real-time binocular vision guidance method towards underwater research vehicle
WO2005033629A2 (en) Multi-camera inspection of underwater structures
CN106780631A (en) A kind of robot closed loop detection method based on deep learning
US20090297036A1 (en) Object detection on a pixel plane in a digital image sequence
CA2870480A1 (en) Hybrid precision tracking
CN110412584A (en) A kind of mobile quick splicing system of underwater Forward-Looking Sonar
Rahman et al. Contour based reconstruction of underwater structures using sonar, visual, inertial, and depth sensor
Wang et al. Monocular visual SLAM algorithm for autonomous vessel sailing in harbor area
CN111798496B (en) Visual locking method and device
Salvi et al. Visual SLAM for 3D large-scale seabed acquisition employing underwater vehicles
Cortés-Pérez et al. A mirror-based active vision system for underwater robots: From the design to active object tracking application
CN115937810A (en) Sensor fusion method based on binocular camera guidance
CN108469729A (en) A kind of human body target identification and follower method based on RGB-D information
Wu et al. Research progress of obstacle detection based on monocular vision
CN112862865A (en) Detection and identification method and device for underwater robot and computer storage medium
Germi et al. Estimation of moving obstacle dynamics with mobile RGB-D camera
CN113534824B (en) Visual positioning and close-range dense formation method for underwater robot clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant